QUALITY OF LIFE IN G20 COUNTRIES

Matteo Biagini

Index:
Introduction
Correlation Matrix
Regression model
Factor Analysis
Cluster Analysis
Conclusion

INTRODUCTION	
  
The aim of this research is to investigate how the quality of life in G20 countries is related to a set of quality-of-life indicators.

By quality of life we mean the general well-being of individuals and societies. The term is used in a wide range of contexts, including international development, healthcare, and politics. Standard indicators of the quality of life include not only wealth and employment, but also the built environment, physical and mental health, education, recreation and leisure time, and social belonging.

From this variety of indicators we have chosen eight.

Life expectancy is a key indicator of the general health of the population. Improvements in overall life expectancy reflect improvements in social and economic conditions, lifestyle, access to health services and medical advances. This indicator uses estimated life expectancy at birth.

CO2 emissions and terrestrial protected areas are indicators of how the natural environment supports its people, economy and culture. As the population grows and economic activity increases, more demands are placed on the natural environment. Environmental issues in turn affect the economy and public health. For this reason another indicator we have chosen is health expenditure per capita, which is closely related to the previous indicators.

Urban population captures how population growth and change in cities affect the relationships people have with others and their sense of belonging to an area.

The concept of community is fundamental to people's overall quality of life and sense of belonging. We have therefore chosen subsidies and other transfers as an indicator of quality of life, because they are an instrument with which a government redistributes wealth among the people of a country.

Public expenditure on education provides an insight into the knowledge and skills of residents and how they can apply these to improve their quality of life. Educational achievement is essential for effective participation in society.

The last indicator is unemployment: a reduction of this indicator helps stimulate further opportunities for economic growth and development within a community or nation.

The countries considered (G20 countries, which are among the richest in the world) are: Canada, France, Germany, Japan, Italy, Russian Federation, United States, United Kingdom, Brazil, China, India, South Africa, Australia, Saudi Arabia, South Korea, Indonesia, Mexico, Turkey, Spain, Netherlands.

The source of the data is the World Bank, section World Development Indicators (WDI).
The year chosen for the data is 2008.
The software used in this project is:
• Gretl (regression)
• R Project (factor and cluster analysis)
• Microsoft Excel (data matrix elaboration, before and after using R)
	
  
We have numbered the variables X1 to X8:

• X1 = CO2 emissions (kg per 2000 US$ of GDP)
• X2 = Urban population
• X3 = Health expenditure per capita (current US$)
• X4 = Life expectancy at birth, total (years)
• X5 = Unemployment, total (% of total labor force)
• X6 = Public spending on education, total (% of GDP)
• X7 = Subsidies and other transfers (% of expense)
• X8 = Terrestrial protected areas (% of total land area)

Correlation matrix

          X1        X2        X3        X4        X5        X6        X7        X8
X1    1.0000    0.4108   -0.6168   -0.7387    0.2370   -0.4123   -0.0290   -0.2151
X2              1.0000   -0.2571   -0.2300   -0.2166   -0.5982   -0.1159   -0.0277
X3                        1.0000    0.6361   -0.2003    0.4932    0.3154    0.1806
X4                                  1.0000   -0.6507    0.2132    0.2230    0.2105
X5                                            1.0000    0.0424   -0.0984   -0.1525
X6                                                      1.0000    0.0872    0.2719
X7                                                                1.0000    0.1855
X8                                                                          1.0000
	
  
	
  
	
  
The values above, computed with R, are Pearson correlation coefficients. The correlations are not very high overall, but some are large enough to justify running a factor analysis.
There is a strong correlation between X1 and X4 (-0.7387). There are moderate correlations between X1 and each of X2 (0.4108), X3 (-0.6168) and X6 (-0.4123); between X2 and X6 (-0.5982); between X3 and each of X4 (0.6361), X6 (0.4932) and X7 (0.3154); and finally between X4 and X5 (-0.6507).
We have considered a correlation strong if |corr| > 0.7 and moderate if 0.3 < |corr| < 0.7.
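
The classification rule can be checked mechanically. The following is a minimal sketch, not the code used in the report (which relied on R): it copies the upper triangle of the correlation matrix above into a dictionary and applies the two thresholds to the absolute value of each coefficient. The function names `classify` and `pairs_by_strength` are illustrative assumptions.

```python
# Pearson correlations copied from the upper triangle of the matrix above.
CORR = {
    ("X1", "X2"): 0.4108, ("X1", "X3"): -0.6168, ("X1", "X4"): -0.7387,
    ("X1", "X5"): 0.2370, ("X1", "X6"): -0.4123, ("X1", "X7"): -0.0290,
    ("X1", "X8"): -0.2151,
    ("X2", "X3"): -0.2571, ("X2", "X4"): -0.2300, ("X2", "X5"): -0.2166,
    ("X2", "X6"): -0.5982, ("X2", "X7"): -0.1159, ("X2", "X8"): -0.0277,
    ("X3", "X4"): 0.6361, ("X3", "X5"): -0.2003, ("X3", "X6"): 0.4932,
    ("X3", "X7"): 0.3154, ("X3", "X8"): 0.1806,
    ("X4", "X5"): -0.6507, ("X4", "X6"): 0.2132, ("X4", "X7"): 0.2230,
    ("X4", "X8"): 0.2105,
    ("X5", "X6"): 0.0424, ("X5", "X7"): -0.0984, ("X5", "X8"): -0.1525,
    ("X6", "X7"): 0.0872, ("X6", "X8"): 0.2719,
    ("X7", "X8"): 0.1855,
}


def classify(r, strong=0.7, moderate=0.3):
    """Label a correlation by the size of its absolute value."""
    size = abs(r)
    if size > strong:
        return "strong"
    if size > moderate:
        return "moderate"
    return "weak"


def pairs_by_strength(corr):
    """Group variable pairs into strong, moderate and weak correlations."""
    groups = {"strong": [], "moderate": [], "weak": []}
    for pair, r in corr.items():
        groups[classify(r)].append(pair)
    return groups


groups = pairs_by_strength(CORR)
print("strong:", groups["strong"])
print("moderate:", groups["moderate"])
```

Working on absolute values matters here because most of the larger coefficients in the matrix are negative.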
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
REGRESSION MODEL

Model 1: OLS, using observations 1-20
Dependent variable: Life expectancy at birth

Variable                                  Coefficient     Std. Error      t-ratio    p-value
Constant                                  88.4781         8.19707         10.7939    <0.00001   ***
CO2 emissions (kg per 2000 US$ of GDP)    -3.18062        1.18728         -2.6789    0.02008    **
Urban population                          -1.19832e-08    8.08775e-09     -1.4817    0.16421
Health expenditure per capita             0.00106495      0.000551237     1.9319     0.07732    *
Unemployment, total                       -0.903724       0.206679        -4.3726    0.00091    ***
Public spending on education              -1.75829        1.13982         -1.5426    0.14888
Subsidies and other transfers             0.0396108       0.0953704       0.4153     0.68523
Terrestrial protected areas               0.026664        0.0893965       0.2983     0.77060

(*** significant at the 1% level, ** at the 5% level, * at the 10% level)

R-squared             0.865092
Adjusted R-squared    0.786395
P-value(F)            0.000221
	
  
	
  
	
  
Using Gretl we ran a regression on our data with the ordinary least squares (OLS) method. The R-squared of 0.865 (adjusted: 0.786) indicates that the model as a whole fits the data well. The P-value(F) is also very low (0.000221), so the model as a whole is significant at every conventional significance level α. The dependent variable is "life expectancy at birth" and the others are independent variables. The independent variables with a significant p-value are CO2 emissions, health expenditure per capita and unemployment.
Since its p-value (0.02008) is smaller than 0.05, we reject the null hypothesis of a zero coefficient and conclude that the regressor CO2 emissions has a significant effect on life expectancy at birth at the 5% level.
Since its p-value (0.07732) is smaller than 0.1, we reject the null hypothesis and conclude that the regressor health expenditure per capita has a significant effect on life expectancy at birth at the 10% level.
Finally, since its p-value (0.00091) is smaller than 0.01, we reject the null hypothesis and conclude that the regressor total unemployment has a significant effect on life expectancy at birth at the 1% level.
The coefficients can be read as follows, holding the other regressors constant. If CO2 emissions increase by 1 kg per 2000 US$ of GDP, life expectancy at birth is estimated to fall by 3.18062 years.
If health expenditure per capita increases by 1 current US$, life expectancy at birth is estimated to rise by 0.00106495 years. Finally, if total unemployment increases by 1 percentage point, life expectancy at birth is estimated to fall by 0.903724 years.
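
Two of the figures in the regression output can be reproduced by hand, which is a useful sanity check on the table. The sketch below is an illustration, not Gretl's own code: it recomputes each t-ratio as coefficient divided by standard error, and the adjusted R-squared from the R-squared, assuming n = 20 observations and k = 7 regressors besides the constant (as listed in the table). The names `t_ratio` and `adjusted_r_squared` are illustrative.

```python
# (coefficient, standard error) pairs copied from the Model 1 table above.
MODEL = {
    "const": (88.4781, 8.19707),
    "CO2 emissions": (-3.18062, 1.18728),
    "Urban population": (-1.19832e-08, 8.08775e-09),
    "Health expenditure per capita": (0.00106495, 0.000551237),
    "Unemployment": (-0.903724, 0.206679),
    "Public spending on education": (-1.75829, 1.13982),
    "Subsidies and other transfers": (0.0396108, 0.0953704),
    "Terrestrial protected areas": (0.026664, 0.0893965),
}
N_OBS = 20            # observations 1-20
R_SQUARED = 0.865092  # from the table


def t_ratio(coefficient, std_error):
    """t-ratio for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared; n_regressors excludes the constant."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


t_ratios = {name: t_ratio(*pair) for name, pair in MODEL.items()}
adj_r2 = adjusted_r_squared(R_SQUARED, N_OBS, len(MODEL) - 1)

for name, value in t_ratios.items():
    print(f"{name:32s} t = {value:8.4f}")
print(f"adjusted R-squared = {adj_r2:.6f}")
```

The recomputed values agree with the reported t-ratios and with the adjusted R-squared of 0.786395 up to rounding.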
  
	
  
	
  
FACTOR ANALYSIS

To run the factor analysis we applied the "principal component method" using R. We obtained the following eigenvalues, proportions of total variance and cumulative proportions of total variance.

Eigenvalue     Proportion of variance (total)    Cumulative proportion of variance (total)
3.13602447     0.3920031                         0.3920031
1.59218446     0.1990231                         0.5910261
1.06125308     0.1326566                         0.7236828
0.88797144     0.1109964                         0.8346792
0.55766918     0.06970865                        0.90438783
0.48900580     0.06112573                        0.96551355
0.19844296     0.02480537                        0.99031892
0.07744861     0.009681076                       1.000000000

To select how many factors to use we applied the Kaiser criterion: we kept the components with eigenvalues greater than 1 and dropped all components with eigenvalues below 1.
An eigenvalue is approximately the equivalent number of variables that the factor represents.
Looking at the table we can see that with the first 3 eigenvalues the factor model explains 72.37% of the total original variability.
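
The Kaiser rule and the variance shares can be reproduced directly from the eigenvalues. The following is a minimal sketch under the assumption that the eight eigenvalues in the table are complete and sum to the number of standardized variables (8); the helper names `variance_table` and `kaiser_retained` are illustrative and not part of the original R session.

```python
# Eigenvalues copied from the table above.
EIGENVALUES = [3.13602447, 1.59218446, 1.06125308, 0.88797144,
               0.55766918, 0.48900580, 0.19844296, 0.07744861]


def variance_table(eigenvalues):
    """Return (eigenvalue, proportion, cumulative proportion) per component."""
    total = sum(eigenvalues)
    rows, running = [], 0.0
    for eigenvalue in eigenvalues:
        share = eigenvalue / total
        running += share
        rows.append((eigenvalue, share, running))
    return rows


def kaiser_retained(eigenvalues, threshold=1.0):
    """Number of components kept by the Kaiser criterion (eigenvalue > 1)."""
    return sum(1 for eigenvalue in eigenvalues if eigenvalue > threshold)


rows = variance_table(EIGENVALUES)
n_factors = kaiser_retained(EIGENVALUES)
print(n_factors, round(rows[n_factors - 1][2], 4))  # prints: 3 0.7237
```

The third eigenvalue (1.061) is only just above the threshold, so the choice of three factors is sensitive to the cut-off value.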
  
	
  
	
  
	
  
SCREE PLOT
We can also look at the results from another point of view using the scree plot. This plot puts the components on the x-axis and the corresponding eigenvalues on the y-axis.

The factor loading l_ij is the covariance between the j-th common factor and the i-th original variable. Because the chosen variables are standardized, it coincides with the correlation between the j-th common factor and the i-th original variable. In this case the minimum value is -1 (perfect negative correlation) and the maximum value is 1 (perfect positive correlation).
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
[Figure: scree plot of the principal components. x-axis: components (Comp.1 to Comp.7 labelled); y-axis: variances, from 0.0 to 3.0]

VARIANCE EXPLAINED BY EACH FACTOR

FACTOR 1    FACTOR 2    FACTOR 3
30.11%      22.34%      8.9%

The proportion of total variability explained by the first factor is 2.409/8 = 30.11% (SS loading divided by total variance). The proportion explained by the second factor is 1.787/8 = 22.34%. The proportion explained by the third factor is 0.712/8 = 8.9%. The total variance explained by the model is 61.35%, which we consider a reasonably good result.
	
  
	
  
FACTOR LOADING MATRIX

Note: in the factor analysis tables the variables are numbered in alphabetical order, so X2 to X8 here do not correspond to the numbering used in the introduction. Blank cells are loadings not shown in the output.

Variable                                         Factor 1    Factor 2    Factor 3
CO2 emissions (X1)                               -0.596      -0.349      -0.460
Health expenditure per capita (X2)               0.532       0.430       0.334
Life expectancy at birth (X3)                    0.923                   0.376
Public spending on education (X4)                0.246       0.955       -0.148
Subsidies and other transfers of expense (X5)    0.188                   0.122
Terrestrial protected areas (X6)                 0.237       0.216
Unemployment (X7)                                -0.869      0.325       0.365
Urban population (X8)                            -0.106      -0.640      -0.274

SS loadings                                      2.409       1.787       0.712
Proportion Var                                   0.301       0.223       0.089
Cumulative Var                                   0.301       0.525       0.614

FINAL ESTIMATION OF THE COMMUNALITIES

Variable                                         Communality    Specific variance
CO2 emissions (X1)                               0.689          0.311
Health expenditure per capita (X2)               0.58           0.42
Life expectancy at birth (X3)                    0.995          0.005
Public spending on education (X4)                0.995          0.005
Subsidies and other transfers of expense (X5)    0.054          0.946
Terrestrial protected areas (X6)                 0.105          0.895
Unemployment (X7)                                0.995          0.005
Urban population (X8)                            0.496          0.504
Total                                            4.909
	
  
	
  
From the final estimation of the communalities we can see that 5 variables are well explained by the model, because their communalities are higher than 50% (X1, X2, X3, X4, X7). The other 3 variables are not explained very well by the model (X5, X6, X8).
Variables with high communality share more in common with the rest of the variables, whereas the specific variance of each observed variable is the portion of that variable that cannot be predicted from the other variables.
So we decided that later, in naming the factors, we will not consider X5 and X6. Since X8 has a communality very near to 50%, we can still consider this variable.
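
The link between loadings and communalities can be made explicit: with standardized variables and uncorrelated factors, the communality of a variable is the sum of its squared loadings, and the specific variance is one minus that sum. The sketch below uses the unrotated loadings reported earlier. Blank cells in the loading matrix are assumed to be 0 here (they are presumably small values hidden in the printout), so the results for those variables are only approximate. The function names are illustrative.

```python
# Unrotated loadings (Factor 1, Factor 2, Factor 3) from the loading matrix.
# Blank cells in the printed matrix are ASSUMED to be 0.0 in this sketch.
LOADINGS = {
    "CO2 emissions": (-0.596, -0.349, -0.460),
    "Health expenditure per capita": (0.532, 0.430, 0.334),
    "Life expectancy at birth": (0.923, 0.0, 0.376),
    "Public spending on education": (0.246, 0.955, -0.148),
    "Subsidies and other transfers": (0.188, 0.0, 0.122),
    "Terrestrial protected areas": (0.237, 0.216, 0.0),
    "Unemployment": (-0.869, 0.325, 0.365),
    "Urban population": (-0.106, -0.640, -0.274),
}


def communality(loadings):
    """Share of a variable's variance explained by the common factors."""
    return sum(value * value for value in loadings)


def specific_variance(loadings):
    """Share of a standardized variable's variance left unexplained."""
    return 1.0 - communality(loadings)


communalities = {name: communality(row) for name, row in LOADINGS.items()}
well_explained = [name for name, h2 in communalities.items() if h2 > 0.5]

for name, h2 in communalities.items():
    print(f"{name:32s} h2 = {h2:.3f}  specific = {1.0 - h2:.3f}")
print("communality > 50%:", well_explained)
```

Summing the squared loadings down each column instead of across each row gives the "SS loadings" line of the loading matrix.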
  
	
  
	
  
	
  
	
  
	
  
Now we can improve the interpretation of the factors by applying a rotation to the factor loading matrix.

ROTATED VARIANCE EXPLAINED BY EACH FACTOR (Total = 61.36%)

FACTOR 1    FACTOR 2    FACTOR 3
26.02%      19.9%       15.44%

ROTATED FACTOR LOADING MATRIX (varimax)

Variable                                         Factor 1    Factor 2    Factor 3
CO2 emissions (X1)                               -0.772      -0.301
Health expenditure per capita (X2)               0.645       0.402
Life expectancy at birth (X3)                    0.890       0.101       -0.439
Public spending on education (X4)                0.154       0.984
Subsidies and other transfers of expense (X5)    0.221
Terrestrial protected areas (X6)                 0.143       0.260       -0.129
Unemployment (X7)                                -0.260                  0.962
Urban population (X8)                            -0.343      -0.537      -0.300

SS loadings                                      2.082       1.592       1.235
Proportion Var                                   0.260       0.199       0.154
Cumulative Var                                   0.260       0.459       0.614

It is clear that after the rotation the variance explained by each factor is more evenly distributed; most notably, factor 3 goes from 8.9% to 15.44%.
Furthermore we want to assign a label to each factor, considering the most significant variables.
In naming the latent variables we gave more weight to the original variables with communality > 50%. The first factor is mainly explained by CO2 emissions, health expenditure per capita, life expectancy at birth and unemployment. We have not considered subsidies and other transfers of expense or terrestrial protected areas, because they have communality < 50%.
The second factor is mainly explained by public spending on education and urban population, but only the first has a communality > 50%.
The third factor is explained by unemployment.
In principal components, the first factor describes most of the variability. After choosing the number of factors to retain, we want to spread the variability among the factors to improve the interpretation. So we consider "rotated factors", whose meanings are more clearly distinguished.
	
  
	
  
	
  
	
            NEW LATENT VARIABLES                 ORIGINAL VARIABLES
FACTOR 1    WELFARE AND WELL-BEING               CO2 emissions (X1)
                                                 Health expenditure per capita (X2)
                                                 Life expectancy at birth (X3)
                                                 Subsidies and other transfers of expense (X5)
FACTOR 2    PUBLIC INTERVENTION ON POPULATION    Public spending on education (X4)
                                                 Terrestrial protected areas (X6)
                                                 Urban population (X8)
FACTOR 3    UNEMPLOYMENT                         Unemployment (X7)
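
The grouping of original variables under each latent variable is consistent with a simple rule: assign every variable to the factor on which its rotated loading is largest in absolute value. This is a sketch of that rule, not necessarily the procedure the report followed. Blank cells of the rotated loading matrix are assumed to be 0, and the function names are illustrative.

```python
# Rotated (varimax) loadings on Factor 1, Factor 2, Factor 3.
# Blank cells in the printed matrix are ASSUMED to be 0.0 in this sketch.
ROTATED = {
    "CO2 emissions (X1)": (-0.772, -0.301, 0.0),
    "Health expenditure per capita (X2)": (0.645, 0.402, 0.0),
    "Life expectancy at birth (X3)": (0.890, 0.101, -0.439),
    "Public spending on education (X4)": (0.154, 0.984, 0.0),
    "Subsidies and other transfers (X5)": (0.221, 0.0, 0.0),
    "Terrestrial protected areas (X6)": (0.143, 0.260, -0.129),
    "Unemployment (X7)": (-0.260, 0.0, 0.962),
    "Urban population (X8)": (-0.343, -0.537, -0.300),
}


def dominant_factor(loadings):
    """1-based index of the factor with the largest absolute loading."""
    index = max(range(len(loadings)), key=lambda j: abs(loadings[j]))
    return index + 1


def group_by_factor(rotated):
    """Map each factor number to the variables that load most on it."""
    groups = {}
    for name, loadings in rotated.items():
        groups.setdefault(dominant_factor(loadings), []).append(name)
    return groups


groups = group_by_factor(ROTATED)
for factor in sorted(groups):
    print(f"Factor {factor}: {', '.join(groups[factor])}")
```

Under this rule the assignment reproduces the table above, although some dominant loadings are small (for example 0.221 for subsidies and 0.260 for terrestrial protected areas), which is why those variables carry little weight in the factor labels.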
  
	
  
	
  
	
  
	
  
	
  
CLUSTER ANALYSIS
Now we want to analyze how the countries can be clustered using the observations of the original variables, in order to obtain a few homogeneous groups.
We compared two methods of clustering:
1. hierarchical method, using Euclidean distance and the Ward method;
2. hierarchical method, using Euclidean distance and the complete linkage method.
	
  
This is the legend of countries:
1. Canada
2. France
3. Germany
4. Japan
5. Italy
6. Russian Federation
7. United States
8. United Kingdom
9. Brazil
10. China
11. India
12. South Africa
13. Australia
14. Saudi Arabia
15. Korea, Rep.
16. Indonesia
17. Mexico
18. Turkey
19. Spain
20. Netherlands

With R we ran an analysis to choose the number of clusters, based on the within-cluster sum of squares. From the resulting graph we see that we could have four clusters.
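
The within-cluster sum of squares used for this choice is the sum, over all clusters, of the squared Euclidean distances between each observation and the centroid of its cluster. The sketch below shows the computation on hypothetical toy data (six made-up two-dimensional points, not the G20 data set); the function names are illustrative assumptions.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def within_ss(clusters):
    """Total within-cluster sum of squared distances to the centroids."""
    total = 0.0
    for points in clusters:
        center = centroid(points)
        for p in points:
            total += sum((p[d] - center[d]) ** 2 for d in range(len(center)))
    return total


# Hypothetical toy data: three visibly separate pairs of points.
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0), (9.0, 1.0), (9.0, 2.0)]

wss_1 = within_ss([points])                                    # one cluster
wss_2 = within_ss([points[:2], points[2:]])                    # two clusters
wss_3 = within_ss([points[:2], points[2:4], points[4:]])       # three clusters
print(wss_1, wss_2, wss_3)
```

Plotting this quantity against the number of clusters and looking for the point where it stops dropping sharply is the usual way to read such a graph. Because squared Euclidean distances are used, variables measured on large scales can dominate the result unless the data are standardized first.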
  
	
  
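In this cluster analysis we used the Ward method with the Euclidean distance. The Ward method is a hierarchical (agglomerative) method based on the ANOVA approach: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares. ANOVA stands for ANalysis Of VAriance.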
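
The Ward merging criterion can be illustrated with a short sketch. For two clusters A and B with sizes n_A and n_B and centroids c_A and c_B, the increase in the total within-cluster sum of squares caused by merging them is n_A n_B / (n_A + n_B) times the squared Euclidean distance between the centroids. The code below uses hypothetical toy points, not the G20 data, and checks the formula against a direct computation; all names are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sum_of_squares(points):
    """Sum of squared Euclidean distances of the points to their centroid."""
    center = centroid(points)
    return sum((x - c) ** 2 for p in points for x, c in zip(p, center))


def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


# Hypothetical toy clusters.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 1.0)]
c = [(0.5, 1.0)]

cost_ab = ward_merge_cost(a, b)
cost_ac = ward_merge_cost(a, c)
direct_ab = sum_of_squares(a + b) - sum_of_squares(a) - sum_of_squares(b)
print(cost_ab, direct_ab, cost_ac)
```

In this toy case merging `a` with `c` is much cheaper than merging `a` with `b`, so a Ward-style procedure would join `a` and `c` first.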
  
The graph suggests that we can use 3 clusters, because China can be treated as an isolated country that has very little in common with the other clusters.
Cluster 1: USA, India. (7-11)
Cluster 2: Brazil, Mexico, Russia, Japan, Indonesia. (9-17-6-4-16)
Cluster 3: Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands. (1-12-20-13-14-19-5-15-8-18-2-3)

These are the means for each variable:

Variable                                                   Cluster 1       Cluster 2       Cluster 3
X1 = CO2 emissions (kg per 2000 US$ of GDP)                7.584599e-01    1.401193e+00    1.765399e+00
X2 = Urban population                                      3.652584e+07    1.153590e+08    4.082957e+08
X3 = Health expenditure per capita (current US$)           3.652584e+07    1.036108e+03    2.639784e+03
X4 = Life expectancy at birth, total (years)               7.691514e+01    7.343499e+01    7.173244e+01
X5 = Unemployment, total (% of total labor force)          7.783333e+00    5.860000e+00    4.833333e+00
X6 = Public spending on education, total (% of GDP)        4.815916e+00    4.147186e+00    3.538987e+00
X7 = Subsidies and other transfers (% of expense)          6.459847e+01    6.140823e+01    6.176835e+01
X8 = Terrestrial protected areas (% of total land area)    1.513201e+01    1.538366e+01    1.134538e+01
	
  
	
  
	
  
Cluster 1 is the one with the highest mean on the largest number of variables. It is composed only of the USA and India. This cluster seems to have higher values of health expenditure, life expectancy, unemployment, public spending on education and subsidies.
The second cluster is the one with the most terrestrial protected areas.
Finally, the third cluster has the highest CO2 emissions and urban population, and it is also the cluster with the largest number of members.

[Figure: Cluster Dendrogram for Solution HClust.10. Method = average; distance = Euclidean. x-axis: observation number in data set; y-axis: height, from 0e+00 to 5e+08. Leaf order: 10, 1, 12, 20, 13, 14, 19, 5, 15, 3, 2, 8, 18, 9, 17, 6, 4, 16, 7, 11]
	
  
	
  
This cluster analysis, with the average method and Euclidean distance, gives us a worse result than the previous analysis. Now 10 (China) is an outlier, and 7 and 11 (U.S. and India) are very different from the other two clusters.

Without 7, 9, 10 and 11 (U.S., Brazil, China, India) we obtain a better cluster analysis, with no outliers. Now we have two clusters. The first is composed of Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands (1-12-20-13-14-19-5-15-8-18-2-3). The second is composed of Mexico, Russia, Japan, Indonesia (17-6-4-16).

CONCLUSION	
  
	
  
The initial aim of this research was to find a possible relationship among the countries belonging to the G20. After the factor and cluster analyses we can say that the results obtained are quite interesting, since the factor analysis suggests 3 new latent variables that summarize the original ones: we went from 8 original variables to 3.
The factor analysis produced a quite satisfactory result. We now have three groups: "welfare and well-being", "public intervention" and "unemployment".
The cluster analysis also produced a satisfactory result. We can find some common characteristics within the clusters. Cluster 2 (Brazil, Mexico, Russia, Japan, Indonesia) is characterized by countries with a large population and, apart from Japan, they are all developing countries.
Cluster 3 (Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands) is the cluster with all the European countries, which means it is the cluster with the highest welfare and equality among people. We can also note that it has the highest urban population but also the highest CO2 emissions.
Cluster 1 is more difficult to discuss because it is formed by 2 very different countries. The U.S. is rich and developed, whereas India has a majority of poor population and is a developing country. But we can also find some common points, such as public spending on education, because both India and the U.S. have a good system of education.
	
  
	
  
	
  

 
Sustainability of Madagascar: COMPAC Project 2015
Sustainability of Madagascar: COMPAC Project 2015Sustainability of Madagascar: COMPAC Project 2015
Sustainability of Madagascar: COMPAC Project 2015Matthew Welsh
 
Government at a Glance 2013, Country Fact Sheet: United Kingdom
Government at a Glance 2013, Country Fact Sheet: United KingdomGovernment at a Glance 2013, Country Fact Sheet: United Kingdom
Government at a Glance 2013, Country Fact Sheet: United KingdomOECD Governance
 

Similar to Business statistcs (20)

Beyond GDP: Measuring well-being and progress of Nations
Beyond GDP: Measuring well-being and progress of NationsBeyond GDP: Measuring well-being and progress of Nations
Beyond GDP: Measuring well-being and progress of Nations
 
Aus Cities Essay
Aus Cities EssayAus Cities Essay
Aus Cities Essay
 
An Empirical Study of the Environmental Kuznets Curve for Environment Quality...
An Empirical Study of the Environmental Kuznets Curve for Environment Quality...An Empirical Study of the Environmental Kuznets Curve for Environment Quality...
An Empirical Study of the Environmental Kuznets Curve for Environment Quality...
 
Why it's time to leave GDP behind
Why it's time to leave GDP behindWhy it's time to leave GDP behind
Why it's time to leave GDP behind
 
41768
4176841768
41768
 
sustainability_indicators,_indices_and_tools,_prarthana_gupta
sustainability_indicators,_indices_and_tools,_prarthana_guptasustainability_indicators,_indices_and_tools,_prarthana_gupta
sustainability_indicators,_indices_and_tools,_prarthana_gupta
 
Final presentation (yw5178)
Final presentation (yw5178)Final presentation (yw5178)
Final presentation (yw5178)
 
An Empirical Study on the Change of Consumption Level of Chinese Residents
An Empirical Study on the Change of Consumption Level of Chinese ResidentsAn Empirical Study on the Change of Consumption Level of Chinese Residents
An Empirical Study on the Change of Consumption Level of Chinese Residents
 
Sdg3 slides
Sdg3 slidesSdg3 slides
Sdg3 slides
 
Sdg3 slides
Sdg3 slidesSdg3 slides
Sdg3 slides
 
Dinámica del Gasto en Salud
Dinámica del Gasto en SaludDinámica del Gasto en Salud
Dinámica del Gasto en Salud
 
BenefitsOfShifFromCarToActiveTransport.pdfTransport Policy.docx
BenefitsOfShifFromCarToActiveTransport.pdfTransport Policy.docxBenefitsOfShifFromCarToActiveTransport.pdfTransport Policy.docx
BenefitsOfShifFromCarToActiveTransport.pdfTransport Policy.docx
 
econometrics report poverty
econometrics report povertyeconometrics report poverty
econometrics report poverty
 
An Application of Tobit Regression on Socio Economic Indicators in Gujarat
An Application of Tobit Regression on Socio Economic Indicators in GujaratAn Application of Tobit Regression on Socio Economic Indicators in Gujarat
An Application of Tobit Regression on Socio Economic Indicators in Gujarat
 
Eva Neitzert: Regional Index of Sustainable Economic Wellbeing
Eva Neitzert: Regional Index of Sustainable Economic WellbeingEva Neitzert: Regional Index of Sustainable Economic Wellbeing
Eva Neitzert: Regional Index of Sustainable Economic Wellbeing
 
NERI Seminar - The Fiscal Implications of Demographic Change in the Health Se...
NERI Seminar - The Fiscal Implications of Demographic Change in the Health Se...NERI Seminar - The Fiscal Implications of Demographic Change in the Health Se...
NERI Seminar - The Fiscal Implications of Demographic Change in the Health Se...
 
OECD and Progress - Beyond GDP
OECD and Progress - Beyond GDPOECD and Progress - Beyond GDP
OECD and Progress - Beyond GDP
 
Sustainability of Madagascar: COMPAC Project 2015
Sustainability of Madagascar: COMPAC Project 2015Sustainability of Madagascar: COMPAC Project 2015
Sustainability of Madagascar: COMPAC Project 2015
 
Government at a Glance 2013, Country Fact Sheet: United Kingdom
Government at a Glance 2013, Country Fact Sheet: United KingdomGovernment at a Glance 2013, Country Fact Sheet: United Kingdom
Government at a Glance 2013, Country Fact Sheet: United Kingdom
 
Well being in regions
Well being in regionsWell being in regions
Well being in regions
 

More from Matteo Biagini

Real estate market in Italy
Real estate market in ItalyReal estate market in Italy
Real estate market in ItalyMatteo Biagini
 
Marchelite business plan
Marchelite business plan Marchelite business plan
Marchelite business plan Matteo Biagini
 
Doing business in russia
Doing business in russiaDoing business in russia
Doing business in russiaMatteo Biagini
 
Doing business in italy
Doing business in italyDoing business in italy
Doing business in italyMatteo Biagini
 
Diagramma business plan
Diagramma business planDiagramma business plan
Diagramma business planMatteo Biagini
 
Bank pekao strategic analysis
Bank pekao strategic analysisBank pekao strategic analysis
Bank pekao strategic analysisMatteo Biagini
 

More from Matteo Biagini (11)

Real estate market in Italy
Real estate market in ItalyReal estate market in Italy
Real estate market in Italy
 
Polish presentation
Polish presentationPolish presentation
Polish presentation
 
Marchelite business plan
Marchelite business plan Marchelite business plan
Marchelite business plan
 
Doing business in russia
Doing business in russiaDoing business in russia
Doing business in russia
 
Doing business in italy
Doing business in italyDoing business in italy
Doing business in italy
 
Diagramma business plan
Diagramma business planDiagramma business plan
Diagramma business plan
 
Bcg matrix bank pekao
Bcg matrix bank pekaoBcg matrix bank pekao
Bcg matrix bank pekao
 
Bank pekao strategic analysis
Bank pekao strategic analysisBank pekao strategic analysis
Bank pekao strategic analysis
 
2020 strategy
2020 strategy 2020 strategy
2020 strategy
 
Business statistics
Business statistics Business statistics
Business statistics
 
Sephora case study
Sephora case studySephora case study
Sephora case study
 

Business statistics

  • 1. QUALITY OF LIFE IN G20 COUNTRIES
    Matteo Biagini
  • 2. Index: p.3 Introduction; p.5 Correlation Matrix; p.6 Regression Model; p.7 Factor Analysis; p.13 Cluster Analysis; p.19 Conclusion
  • 3. QUALITY OF LIFE IN G20 COUNTRIES

INTRODUCTION

The aim of this research is to investigate how the quality of life in G20 countries is related to a set of quality-of-life indicators.

By quality of life we mean the general well-being of individuals and societies. The term is used in a wide range of contexts, including international development, healthcare and politics. Standard indicators of the quality of life include not only wealth and employment but also the built environment, physical and mental health, education, recreation and leisure time, and social belonging.

From this variety of indicators we have chosen eight.

Life expectancy is a key indicator of the general health of the population. Improvements in overall life expectancy reflect improvements in social and economic conditions, lifestyle, access to health services and medical advances. This indicator uses estimated life expectancy at birth.

CO2 emissions and terrestrial protected areas are indicators of how the natural environment supports its people, economy and culture. As the population grows and economic activity increases, more demands are placed on the natural environment. Environmental issues affect economic and public health issues, which is why we have also chosen health expenditure per capita, an indicator closely related to the previous ones.

Urban population captures how population growth and change in cities affect the relationships people have with others and their sense of belonging to an area. The concept of community is fundamental to people's overall quality of life and sense of belonging.

We have chosen subsidies and other transfers as an indicator of quality of life because they are an instrument with which the government redistributes wealth among the people of a country.

Public expenditure on education provides an insight into the knowledge and skills of residents and how they can apply these to improve their quality of life. Educational achievement is essential for effective participation in society.

The last indicator is unemployment: a reduction in unemployment helps stimulate further opportunities for economic growth and development within a community or nation.
  • 4. The countries considered (G20 countries, which are among the richest in the world) are: Canada, France, Germany, Japan, Italy, Russian Federation, United States, United Kingdom, Brazil, China, India, South Africa, Australia, Saudi Arabia, South Korea, Indonesia, Mexico, Turkey, Spain, Netherlands.

The data source is the World Bank's World DataBank, World Development Indicators (WDI) section. The data refer to the year 2008.

The software used in this project is:
· Gretl (regression)
· R-Project (factor and cluster analysis)
· Microsoft Excel (preparation of the data matrix, before and after using R)

The eight variables are labelled X1 to X8:
· X1 = CO2 emissions (kg per 2000 US$ of GDP)
· X2 = Urban population
· X3 = Health expenditure per capita (current US$)
· X4 = Life expectancy at birth, total (years)
· X5 = Unemployment, total (% of total labor force)
· X6 = Public spending on education, total (% of GDP)
· X7 = Subsidies and other transfers (% of expense)
· X8 = Terrestrial protected areas (% of total land area)
  • 5. Correlation matrix

         X1       X2       X3       X4       X5       X6       X7       X8
X1   1.0000   0.4108  -0.6168  -0.7387   0.2370  -0.4123  -0.0290  -0.2151
X2            1.0000  -0.2571  -0.2300  -0.2166  -0.5982  -0.1159  -0.0277
X3                     1.0000   0.6361  -0.2003   0.4932   0.3154   0.1806
X4                              1.0000  -0.6507   0.2132   0.2230   0.2105
X5                                       1.0000   0.0424  -0.0984  -0.1525
X6                                                1.0000   0.0872   0.2719
X7                                                         1.0000   0.1855
X8                                                                  1.0000

The values are Pearson correlation coefficients computed with R. None of the correlations is very high, but several are large enough to justify running a factor analysis.

We consider a correlation strong if |corr| > 0.7 and moderate if 0.3 < |corr| < 0.7. By this rule there is a strong correlation between X1 and X4, and moderate correlations between X1 and each of X2, X3 and X6, between X2 and X6, between X3 and each of X4, X6 and X7 (the last only marginally, at 0.3154), and between X4 and X5.
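The strong/moderate rule can be applied mechanically to the matrix. The following is a minimal sketch, not the R code used in the study: the names `NAMES`, `ROWS` and `classify` are our own, and the thresholds are applied to absolute values.

```python
# Upper triangle of the Pearson correlation matrix reported on this slide.
# Row i starts at the diagonal entry for variable NAMES[i].
NAMES = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"]
ROWS = [
    [1.0000, 0.4108, -0.6168, -0.7387, 0.2370, -0.4123, -0.0290, -0.2151],
    [1.0000, -0.2571, -0.2300, -0.2166, -0.5982, -0.1159, -0.0277],
    [1.0000, 0.6361, -0.2003, 0.4932, 0.3154, 0.1806],
    [1.0000, -0.6507, 0.2132, 0.2230, 0.2105],
    [1.0000, 0.0424, -0.0984, -0.1525],
    [1.0000, 0.0872, 0.2719],
    [1.0000, 0.1855],
    [1.0000],
]


def classify(strong=0.7, moderate=0.3):
    """Group variable pairs by the absolute size of their correlation."""
    result = {"strong": [], "moderate": []}
    for i, row in enumerate(ROWS):
        # Skip the diagonal entry (offset 0) and walk the rest of the row.
        for offset, r in enumerate(row[1:], start=1):
            pair = (NAMES[i], NAMES[i + offset])
            if abs(r) > strong:
                result["strong"].append(pair)
            elif abs(r) > moderate:
                result["moderate"].append(pair)
    return result


groups = classify()
for label, pairs in groups.items():
    print(label, pairs)
```

Working on absolute values matters here because most of the larger correlations in the matrix are negative.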
  • 6. REGRESSION MODEL

Model 1: OLS, using observations 1-20
Dependent variable: Life expectancy at birth

Variable                                  Coefficient     Std. Error    t-ratio   p-value
Constant                                  88.4781         8.19707       10.7939   <0.00001   ***
CO2 emissions (kg per 2000 US$ of GDP)    -3.18062        1.18728       -2.6789   0.02008    **
Urban population                          -1.19832e-08    8.08775e-09   -1.4817   0.16421
Health expenditure per capita             0.00106495      0.000551237   1.9319    0.07732    *
Unemployment, total                       -0.903724       0.206679      -4.3726   0.00091    ***
Public spending on education              -1.75829        1.13982       -1.5426   0.14888
Subsidies and other transfers             0.0396108       0.0953704     0.4153    0.68523
Terrestrial protected areas               0.026664        0.0893965     0.2983    0.77060

R-squared 0.865092   Adjusted R-squared 0.786395   P-value(F) 0.000221
(*** significant at 1%, ** at 5%, * at 10%)

Using Gretl, we ran a regression on our data with the OLS method. The R-squared of 0.865 indicates that the model as a whole fits the data well. P-value(F) is also very low, so the model as a whole is significant at every conventional level of α. The dependent variable is "life expectancy at birth" and the others are independent variables.

  • 7. The independent variables with a significant p-value are CO2 emissions, health expenditure per capita and unemployment.

Since its p-value is smaller than 0.05, we reject the null hypothesis and conclude that the regressor CO2 emissions has a significant effect on life expectancy at birth at the 5% level.

Since its p-value is smaller than 0.1, we reject the null hypothesis and conclude that the regressor health expenditure per capita has a significant effect on life expectancy at birth at the 10% level.

Finally, since its p-value is smaller than 0.01, we reject the null hypothesis and conclude that the regressor total unemployment has a significant effect on life expectancy at birth at the 1% level.

Holding the other regressors constant, if CO2 emissions increase by 1 kg per 2000 US$ of GDP, life expectancy at birth is expected to fall by 3.18062 years. If health expenditure per capita increases by 1 current US$, life expectancy at birth is expected to rise by 0.00106495 years. Finally, if total unemployment increases by 1 percentage point, life expectancy at birth is expected to fall by 0.903724 years.

FACTOR ANALYSIS

To run the factor analysis we first applied the principal component method in R, which gave the following eigenvalues, proportions of total variance and cumulative proportions of total variance.
Eigenvalue    Proportion of variance (total)   Cumulative proportion of variance (total)
3.13602447    0.3920031                        0.3920031
1.59218446    0.1990231                        0.5910261
1.06125308    0.1326566                        0.7236828
0.88797144    0.1109964                        0.8346792
0.55766918    0.06970865                       0.90438783
0.48900580    0.06112573                       0.96551355
0.19844296    0.02480537                       0.99031892
0.07744861    0.009681076                      1.000000000
  • 8. To choose the number of factors we applied the Kaiser criterion: we retained the components with eigenvalue > 1 and dropped all components with eigenvalue below 1.

An eigenvalue is approximately the equivalent number of variables that the factor represents.

The table shows that with the first 3 eigenvalues the factor model explains 72.37% of the total original variability.

SCREE PLOT

The scree plot shows the same results from another point of view. It puts the components on the x-axis and the corresponding eigenvalues on the y-axis.

[Figure: scree plot of the principal components; x-axis: components (Comp.1, Comp.3, Comp.5, Comp.7 labelled), y-axis: variances (0.0 to 3.0)]

The factor loading l_ij is the covariance between the j-th common factor and the i-th original variable. Because the chosen variables are standardized, it coincides with the correlation between the j-th common factor and the i-th original variable. In this case its minimum value is -1 (perfect negative correlation) and its maximum value is 1 (perfect positive correlation).
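The Kaiser criterion and the variance shares can be reproduced from the eigenvalues alone. This is a small sketch rather than the R commands used in the study; the function names `variance_shares` and `kaiser_count` are hypothetical.

```python
# Eigenvalues reported for the principal component method (8 standardized variables).
EIGENVALUES = [3.13602447, 1.59218446, 1.06125308, 0.88797144,
               0.55766918, 0.48900580, 0.19844296, 0.07744861]


def variance_shares(eigenvalues):
    """Return each eigenvalue's share of total variance and the running total."""
    total = sum(eigenvalues)  # about 8, the number of standardized variables
    shares = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


def kaiser_count(eigenvalues, threshold=1.0):
    """Kaiser criterion: count the components whose eigenvalue exceeds 1."""
    return sum(1 for ev in eigenvalues if ev > threshold)


shares, cumulative = variance_shares(EIGENVALUES)
retained = kaiser_count(EIGENVALUES)
print("components retained:", retained)
print("cumulative share of variance:", round(cumulative[retained - 1], 4))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, so each share is simply the eigenvalue divided by 8.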
  • 9. VARIANCE EXPLAINED BY EACH FACTOR

FACTOR 1   FACTOR 2   FACTOR 3
30.11%     22.34%     8.9%

The proportion of total variability explained by each factor is its SS loading divided by the total variance, which is 8 for eight standardized variables: 2.409/8 = 30.11% for the first factor, 1.787/8 = 22.34% for the second and 0.712/8 = 8.9% for the third. The model explains 61.35% of the total variance, which we consider fairly good.

FACTOR LOADING MATRIX

Variable                                           Factor 1   Factor 2   Factor 3
CO2 emissions (X1)                                   -0.596     -0.349     -0.460
Health expenditure per capita (X2)                    0.532      0.430      0.334
Life expectancy at birth (X3)                         0.923                 0.376
Public spending on education (X4)                     0.246      0.955     -0.148
Subsidies and other transfers (% of expense) (X5)     0.188                 0.122
Terrestrial protected areas (X6)                      0.237      0.216
Unemployment (X7)                                    -0.869      0.325      0.365
Urban population (X8)                                -0.106     -0.640     -0.274

SS loadings                                           2.409      1.787      0.712
Proportion Var                                        0.301      0.223      0.089
Cumulative Var                                        0.301      0.525      0.614

In the factor analysis tables the variables are listed in alphabetical order, so the labels X1 to X8 used here differ from the numbering given in the introduction. Blank cells are small loadings omitted from the R output.
  • 10. FINAL ESTIMATION OF THE COMMUNALITIES

Variable                                           Communality   Specific variance
CO2 emissions (X1)                                 0.689         0.311
Health expenditure per capita (X2)                 0.58          0.42
Life expectancy at birth (X3)                      0.995         0.005
Public spending on education (X4)                  0.995         0.005
Subsidies and other transfers (% of expense) (X5)  0.054         0.946
Terrestrial protected areas (X6)                   0.105         0.895
Unemployment (X7)                                  0.995         0.005
Urban population (X8)                              0.496         0.504
Total                                              4.909

The final estimates of the communalities show that five variables are well explained by the model, with communalities above 50% (X1, X2, X3, X4, X7), while three are not (X5, X6, X8).

Variables with a high communality share more with the rest of the variables, whereas the specific variance of an observed variable is the portion of its variance that cannot be explained by the common factors.

We therefore decided not to consider X5 and X6 when naming the factors. X8, however, has a communality very close to 50%, so we do consider it.
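Each communality is the sum of the squared loadings of a variable across the three factors, and the specific variance is one minus that sum. The sketch below recomputes both from the unrotated factor loading matrix. It rests on two assumptions of ours: loadings that are blank in the output are set to 0.0, and the positions of the blanks are inferred from the reported SS loadings. The results therefore agree with the table only to about two decimals.

```python
# Unrotated loadings (Factor 1, Factor 2, Factor 3).
# Assumption: entries left blank in the output are treated as 0.0.
LOADINGS = {
    "CO2 emissions": (-0.596, -0.349, -0.460),
    "Health expenditure per capita": (0.532, 0.430, 0.334),
    "Life expectancy at birth": (0.923, 0.0, 0.376),
    "Public spending on education": (0.246, 0.955, -0.148),
    "Subsidies and other transfers": (0.188, 0.0, 0.122),
    "Terrestrial protected areas": (0.237, 0.216, 0.0),
    "Unemployment": (-0.869, 0.325, 0.365),
    "Urban population": (-0.106, -0.640, -0.274),
}


def communality(row):
    """Share of a standardized variable's variance explained by the factors."""
    return sum(loading ** 2 for loading in row)


communalities = {name: communality(row) for name, row in LOADINGS.items()}
specific = {name: 1.0 - value for name, value in communalities.items()}

for name in LOADINGS:
    print(f"{name:32s} {communalities[name]:.3f} {specific[name]:.3f}")
```

The value recomputed for subsidies and other transfers is about 0.05, which is consistent with a specific variance of 0.946.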
  • 11. We can now improve the interpretation of the factors by applying a rotation to the factor loading matrix.

ROTATED VARIANCE EXPLAINED BY EACH FACTOR (Total = 61.36%)

FACTOR 1   FACTOR 2   FACTOR 3
26.02%     19.9%      15.44%

ROTATED FACTOR LOADING MATRIX (varimax)

Variable                                           Factor 1   Factor 2   Factor 3
CO2 emissions (X1)                                   -0.772     -0.301
Health expenditure per capita (X2)                    0.645      0.402
Life expectancy at birth (X3)                         0.890      0.101     -0.439
Public spending on education (X4)                     0.154      0.984
Subsidies and other transfers (% of expense) (X5)     0.221
Terrestrial protected areas (X6)                      0.143      0.260     -0.129
Unemployment (X7)                                    -0.260                 0.962
Urban population (X8)                                -0.343     -0.537     -0.300

SS loadings                                           2.082      1.592      1.235
Proportion Var                                        0.260      0.199      0.154
Cumulative Var                                        0.260      0.459      0.614
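To illustrate what a varimax rotation does, here is a minimal pure-Python sketch of the classic pairwise (Kaiser) algorithm in its raw form, without Kaiser normalization. It is not the R routine used for the table above: R's `varimax` normalizes the rows by default, and we set the blank loadings to 0.0, so the output need not coincide with the rotated matrix reported here. The names `UNROTATED`, `varimax` and `varimax_criterion` are our own.

```python
import math
from itertools import combinations

# Unrotated loadings; assumption: blanks in the output are treated as 0.0.
UNROTATED = [
    [-0.596, -0.349, -0.460],
    [0.532, 0.430, 0.334],
    [0.923, 0.0, 0.376],
    [0.246, 0.955, -0.148],
    [0.188, 0.0, 0.122],
    [0.237, 0.216, 0.0],
    [-0.869, 0.325, 0.365],
    [-0.106, -0.640, -0.274],
]


def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    p = len(L)
    total = 0.0
    for j in range(len(L[0])):
        sq = [row[j] ** 2 for row in L]
        total += sum(s * s for s in sq) / p - (sum(sq) / p) ** 2
    return total


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Rotate pairs of factors by the angle that maximizes the criterion."""
    L = [list(row) for row in loadings]
    p = len(L)
    for _ in range(max_sweeps):
        largest = 0.0
        for j, m in combinations(range(len(L[0])), 2):
            u = [row[j] ** 2 - row[m] ** 2 for row in L]
            v = [2 * row[j] * row[m] for row in L]
            A, B = sum(u), sum(v)
            C = sum(a * a - b * b for a, b in zip(u, v))
            D = sum(2 * a * b for a, b in zip(u, v))
            # Optimal planar angle: tan(4*phi) = (D - 2AB/p) / (C - (A^2 - B^2)/p)
            phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
            c, s = math.cos(phi), math.sin(phi)
            for row in L:
                x, y = row[j], row[m]
                row[j], row[m] = x * c + y * s, -x * s + y * c
            largest = max(largest, abs(phi))
        if largest < tol:  # no pair needed a noticeable rotation
            break
    return L


rotated = varimax(UNROTATED)
print(round(varimax_criterion(UNROTATED), 4), round(varimax_criterion(rotated), 4))
```

The rotation is orthogonal, so it changes how the explained variance is split among the factors but leaves each variable's communality, and hence the total explained variance, unchanged.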
  • 12. After the rotation the variance explained is spread more evenly across the factors; most notably, factor 3 rises from 8.9% to 15.44%.

We now assign a label to each factor based on its most significant variables. In naming the latent variables we gave more weight to the original variables with communality > 50%.

The first factor is mainly explained by CO2 emissions, health expenditure per capita, life expectancy at birth and, more weakly, unemployment (loading -0.260). We did not consider subsidies and other transfers or terrestrial protected areas because their communality is below 50%.

The second factor is mainly explained by public spending on education and urban population, but only the former has a communality > 50%.

The third factor is explained by unemployment.

In principal components, the first factor describes most of the variability. After choosing the number of factors to retain, we spread the variability among the factors to improve the interpretation, so we use the rotated factors, whose meanings are more clearly distinguished.

NEW LATENT VARIABLE                              ORIGINAL VARIABLES
FACTOR 1: WELFARE AND WELL-BEING                 CO2 emissions (X1)
                                                 Health expenditure per capita (X2)
                                                 Life expectancy at birth (X3)
                                                 Subsidies and other transfers (% of expense) (X5)
FACTOR 2: PUBLIC INTERVENTION ON POPULATION      Public spending on education (X4)
                                                 Terrestrial protected areas (X6)
                                                 Urban population (X8)
FACTOR 3: UNEMPLOYMENT                           Unemployment (X7)
  • 13. CLUSTER ANALYSIS

We now analyze how the countries can be clustered, using the observations on the original variables, in order to obtain a few homogeneous groups.

We compared two clustering methods:
1. hierarchical method, using Euclidean distance and Ward's method;
2. hierarchical method, using Euclidean distance and the complete linkage method.

This is the legend of countries:
1. Canada
2. France
3. Germany
4. Japan
5. Italy
6. Russian Federation
7. United States
8. United Kingdom
9. Brazil
10. China
11. India
12. South Africa
13. Australia
14. Saudi Arabia
15. Korea, Rep.
16. Indonesia
17. Mexico
18. Turkey
19. Spain
20. Netherlands
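The two hierarchical procedures differ only in how the distance between two clusters is defined. As an illustration of the second one, here is a minimal sketch of agglomerative clustering with Euclidean distance and complete linkage. The function `complete_linkage` and the toy points are hypothetical; they are not the country data or the R routine used in the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    Under complete linkage, the distance between two clusters is the
    largest distance between any two of their members.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical observations: two tight pairs and one isolated point.
toy = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 0.5)]
groups = complete_linkage(toy, 3)
print(groups)
```

With Euclidean distance on raw data, variables measured on large scales dominate the distances, so in practice the variables are often standardized first.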
  • 14. Using R, we chose the number of clusters from the within-cluster sum of squares. The resulting graph suggests that the cluster analysis could yield four clusters.
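The within-cluster sum of squares is the total squared distance of the observations from their cluster centroids; plotting it against the number of clusters shows where adding a cluster stops paying off. A minimal sketch follows, with a hypothetical function `within_ss` and made-up one-dimensional data rather than the study's data.

```python
def within_ss(clusters):
    """Total squared distance of every point from its own cluster centroid."""
    total = 0.0
    for cluster in clusters:
        dims = len(cluster[0])
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in cluster for d in range(dims))
    return total


# Hypothetical data and nested partitions into 1, 2 and 3 clusters.
points = [(1.0,), (2.0,), (10.0,), (11.0,), (30.0,)]
partitions = {
    1: [points],
    2: [points[:4], points[4:]],
    3: [points[:2], points[2:4], points[4:]],
}
wss = {k: within_ss(parts) for k, parts in partitions.items()}
for k in sorted(wss):
    print(k, round(wss[k], 2))
```

The sum of squares always falls as clusters are added, so the number of clusters is usually read off at the point where the curve flattens.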
  • 15. In this cluster analysis we used Ward's method with Euclidean distance. Ward's method is a hierarchical (agglomerative) method based on the ANOVA approach: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares. ANOVA stands for ANalysis Of VAriance.

The graph suggests that we can use 3 clusters, treating China as an isolated country because it has very little in common with the other clusters.

Cluster 1: USA, India. (7-11)
Cluster 2: Brazil, Mexico, Russia, Japan, Indonesia. (9-17-6-4-16)
Cluster 3: Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands. (1-12-20-13-14-19-5-15-8-18-2-3)
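The cost that Ward's method assigns to merging two clusters is the increase in the within-cluster sum of squares, which has a closed form in terms of the cluster sizes and centroids. The sketch below checks that closed form against a direct computation. The function names and the toy clusters are hypothetical, not taken from the study.

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    dims = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dims)]


def sse(cluster):
    """Sum of squared distances of the points from their centroid."""
    c = centroid(cluster)
    return sum((p[d] - c[d]) ** 2 for p in cluster for d in range(len(c)))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


# Hypothetical clusters.
a = [(1.0, 2.0), (3.0, 2.0)]
b = [(7.0, 6.0)]
direct = sse(a + b) - sse(a) - sse(b)
print(round(ward_cost(a, b), 4), round(direct, 4))
```

At every step the pair of clusters with the smallest such cost is merged, which is why the method tends to produce compact clusters of similar size.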
  • 16. These are the means of each variable by cluster:

Variable                                              Cluster 1       Cluster 2       Cluster 3
X1 = CO2 emissions (kg per 2000 US$ of GDP)           7.584599e-01    1.401193e+00    1.765399e+00
X2 = Urban population                                 3.652584e+07    1.153590e+08    4.082957e+08
X3 = Health expenditure per capita (current US$)      3.652584e+07    1.036108e+03    2.639784e+03
X4 = Life expectancy at birth, total (years)          7.691514e+01    7.343499e+01    7.173244e+01
X5 = Unemployment, total (% of total labor force)     7.783333e+00    5.860000e+00    4.833333e+00
X6 = Public spending on education, total (% of GDP)   4.815916e+00    4.147186e+00    3.538987e+00
X7 = Subsidies and other transfers (% of expense)     6.459847e+01    6.140823e+01    6.176835e+01
X8 = Terrestrial protected areas (% of total land area) 1.513201e+01  1.538366e+01    1.134538e+01

Cluster 1 is the one that ranks highest on the most variables. It is composed only of the USA and India. It appears to have the highest values of health expenditure, life expectancy, unemployment, public spending on education and subsidies. (The health expenditure mean reported for Cluster 1 is identical to its urban population mean, which suggests a transcription error in the table, so the comparison on health expenditure should be read with caution.)

The second cluster is the one with the largest share of terrestrial protected areas.

Finally, the third cluster has the highest CO2 emissions and urban population; it is also the cluster with the most members.
  • 17. [Figure: Cluster Dendrogram for Solution HClust.10. Method=average; Distance=euclidian. x-axis: Observation Number in Data Set Dataset; y-axis: Height (0e+00 to 5e+08). Leaf order: 10, 1, 12, 20, 13, 14, 19, 5, 15, 3, 2, 8, 18, 9, 17, 6, 4, 16, 7, 11]

This cluster analysis, with the average method and Euclidean distance, gives a worse result than the previous analysis. Observation 10 (China) is now an outlier, and observations 7 and 11 (U.S. and India) are far from the other two clusters.
  • 18. Without observations 7, 9, 10 and 11 (U.S., Brazil, China, India), we obtain a better cluster analysis with no outliers. We now have two clusters. The first is composed of Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands (1-12-20-13-14-19-5-15-8-18-2-3). The second is composed of Mexico, Russia, Japan, Indonesia (17-6-4-16).
  • 19. CONCLUSION

The initial aim of this research was to find a possible relationship between the countries belonging to the G20. After the factor and cluster analyses we can say that the results are quite interesting: the factor analysis suggests 3 new latent variables that summarize the original ones, so we went from 8 original variables to 3.

The factor analysis produced a fairly satisfactory result. We now have three groups: "welfare and well-being", "public intervention" and "unemployment".

The cluster analysis also produced a satisfactory result, and we can find some common characteristics within the clusters.

Cluster 2 (Brazil, Mexico, Russia, Japan, Indonesia) consists of countries with a large population and, apart from Japan, they are all developing countries.

Cluster 3 (Canada, France, Germany, Italy, United Kingdom, South Africa, Australia, Saudi Arabia, South Korea, Turkey, Spain, Netherlands) contains all the European countries, which makes it the cluster with the highest welfare and the greatest equality among people. It also has the highest urban population and the highest CO2 emissions.

Cluster 1 is harder to discuss because it is formed by two very different countries. The U.S. is rich and developed, whereas India has a largely poor population and is a developing country. One common point could be public spending on education, since both India and the U.S. have a good education system.