1. Cost of Living
The Mercer Human Resource Consulting website (www.mercerhr.com) lists prices of
certain items in selected cities around the world. They also report an overall cost-of-living index
for each city compared to the costs of hundreds of items in New York City.
For example, London at 110.6 is 10.6% more expensive than New York. You'll find the
2006 data for 16 cities in the data set Cost_of_living_vs_cost_of_items. Included are
the 2006 cost of living index, cost of a luxury apartment (per month), price of a bus or
subway ride, price of a compact disc, price of an international newspaper, price of a cup of coffee
(including service), and price of a fast-food hamburger meal. All prices are in U.S. dollars.
Examine the relationship between the overall cost of living and the cost of each of
these individual items. Verify the necessary conditions and describe the relationship in
as much detail as possible. (Remember to look at direction, form, and strength.)
Identify any unusual observations.
Based on the correlations and linear regressions, which item would be the best predictor of overall
cost in these cities? Which would be the worst? Are there any surprising relationships? Write a
short report detailing your conclusions.
2. Dataset:
City          Cost of Living  Rent     Public Trans  CD     News  Coffee  Fast Food
London        110.6           1700     2             11.99  1.1   1.9     4.5
Dublin        91.8            824      1.03          14.06  1.37  2.06    4.05
Paris         93.1            1303     0.96          11.65  1.37  1.51    4.12
Rome          89.8            926      0.69          14.58  1.37  1.51    3.91
Amsterdam     83.4            926      1.1           15.08  1.78  1.71    4.46
Berlin        79.2            720      1.44          12.34  1.44  1.71    3.26
Athens        81.1            721      0.55          13.03  1.23  2.88    4.97
Brussels      79.5            652      1.03          13.7   1.37  1.51    3.77
Madrid        81.6            892      0.75          13.72  1.71  1.58    4.18
Prague        82.1            754      0.41          14.44  1.2   2.17    2.89
Warsaw        80.4            754      0.43          13.52  1.8   1.98    2.79
Tokyo         119.1           2352     1.32          12.25  0.74  1.47    2.99
Sydney        91.3            1104     1.06          11.03  1.63  1.49    2.74
New York      100             1998     1.14          10.77  0.93  2.26    3.43
Buenos Aires  54.8            571      0.15          6.88   2.6   0.84    1.58
Vancouver     81.2            804      1.13          10.61  1.88  1.63    2.79
Finding Standard Deviation & Mean
SD            14.54           518.22   0.45          2.05   0.44  0.45    0.87
Mean          87.44           1062.56  0.95          12.48  1.47  1.76    3.53
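The mean and standard deviation rows can be reproduced with a few lines of Python. This is a minimal sketch for two of the columns only; the helper name `summarize` is illustrative and not part of the slides, and it assumes the SD row is the sample standard deviation (dividing by n - 1), which is what matches the reported 14.54 and 518.22.

```python
import statistics

# Cost of living index and monthly rent (USD), in the same city order as the dataset.
cost_of_living = [110.6, 91.8, 93.1, 89.8, 83.4, 79.2, 81.1, 79.5,
                  81.6, 82.1, 80.4, 119.1, 91.3, 100.0, 54.8, 81.2]
rent = [1700, 824, 1303, 926, 926, 720, 721, 652,
        892, 754, 754, 2352, 1104, 1998, 571, 804]


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    # statistics.stdev divides by n - 1 (sample SD), not by n.
    return statistics.mean(values), statistics.stdev(values)


for name, values in [("Cost of Living", cost_of_living), ("Rent", rent)]:
    mean, sd = summarize(values)
    print(f"{name}: mean = {mean:.2f}, SD = {sd:.2f}")
```

The other columns can be checked the same way by passing their values to `summarize`.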
3. City       Cost of Living (y)  Rent (x)  xy        y²        x²
London        110.6               1700      188020    12232.36  2890000
Dublin        91.8                824       75643.2   8427.24   678976
Paris         93.1                1303      121309.3  8667.61   1697809
Rome          89.8                926       83154.8   8064.04   857476
Amsterdam     83.4                926       77228.4   6955.56   857476
Berlin        79.2                720       57024     6272.64   518400
Athens        81.1                721       58473.1   6577.21   519841
Brussels      79.5                652       51834     6320.25   425104
Madrid        81.6                892       72787.2   6658.56   795664
Prague        82.1                754       61903.4   6740.41   568516
Warsaw        80.4                754       60621.6   6464.16   568516
Tokyo         119.1               2352      280123.2  14184.81  5531904
Sydney        91.3                1104      100795.2  8335.69   1218816
New York      100                 1998      199800    10000     3992004
Buenos Aires  54.8                571       31290.8   3003.04   326041
Vancouver     81.2                804       65284.8   6593.44   646416
$n = 16$, $\sum y = 1399$, $\sum x = 17001$, $\sum xy = 1585293$, $\sum y^2 = 125497.02$, $\sum x^2 = 22092959$
Finding Correlation Coefficient:
4. [Scatter plot: Cost of Living vs Rent. x-axis: Rent; y-axis: Cost of Living]
Correlation Coefficient, r= 0.874
Direction= Positive
Form= Fairly Linear
Strength= Very Strong +ve Correlation
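Sample Correlation Coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
or the algebraic equivalent:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}}$$
Where,
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable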
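As a check on the two formulas above, here is a minimal Python sketch that evaluates both forms of r for Cost of Living (y) against Rent (x). The function names `r_deviation` and `r_computational` are illustrative and not from the slides; both forms should agree with each other and with the r of about 0.874 reported for Rent.

```python
import math

# Rent (x, USD per month) and cost of living index (y), in dataset order.
x = [1700, 824, 1303, 926, 926, 720, 721, 652,
     892, 754, 754, 2352, 1104, 1998, 571, 804]
y = [110.6, 91.8, 93.1, 89.8, 83.4, 79.2, 81.1, 79.5,
     81.6, 82.1, 80.4, 119.1, 91.3, 100.0, 54.8, 81.2]


def r_deviation(x, y):
    """Correlation computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_computational(x, y):
    """Correlation computed from the column sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(f"deviation form:     r = {r_deviation(x, y):.4f}")
print(f"computational form: r = {r_computational(x, y):.4f}")
```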
Slope: $b_1 = r \dfrac{s_y}{s_x} = 0.874 \times \dfrac{14.54}{518.22} = 0.0245$
Intercept: $b_0 = \bar{y} - b_1 \bar{x}_1 = 87.44 - 0.0245 \times 1062.56 = 61.41$
$\hat{y} = b_0 + b_1 x_1$
$\hat{y} = 61.41 + 0.0245(\text{Rent})$
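The slope and intercept can also be computed directly from the data rather than from the rounded summary values. The sketch below is illustrative only (the variable names are not from the slides). Because it uses unrounded inputs, the intercept comes out near 61.39 rather than 61.41, which agrees with the 61.3852 shown in the regression output for Rent.

```python
# Least-squares line for Cost of Living (y) on Rent (x).
x = [1700, 824, 1303, 926, 926, 720, 721, 652,
     892, 754, 754, 2352, 1104, 1998, 571, 804]
y = [110.6, 91.8, 93.1, 89.8, 83.4, 79.2, 81.1, 79.5,
     81.6, 82.1, 80.4, 119.1, 91.3, 100.0, 54.8, 81.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = Sxy / Sxx, which is algebraically the same as r * s_y / s_x.
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
b1 = sxy / sxx

# The fitted line passes through the point of means (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")

# Prediction for a hypothetical city with a monthly rent of 1000 USD.
print(f"predicted cost of living at rent 1000: {b0 + b1 * 1000:.1f}")
```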
6. [Scatter plot: Cost of Living vs Compact Disc. x-axis: Compact Disc; y-axis: Cost of Living]
Correlation Coefficient, r= 0.243
Direction= Positive
Form= Linear
Strength= Weak +ve Correlation
$\hat{y} = 65.91 + 1.72(\text{Compact Disc})$
[Scatter plot: Cost of Living vs Newspaper. x-axis: Newspaper; y-axis: Cost of Living]
Correlation Coefficient, r= -0.834
Direction= Negative
Form= Linear
Strength= Very Strong -ve Correlation
$\hat{y} = 128.21 - 27.74(\text{News})$
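The same calculation applies to every item. The following sketch wraps it in a helper, `fit_line` (an illustrative name, not from the slides), and applies it to the Compact Disc and Newspaper columns. Treat it as a check of the reported values under the assumption that the dataset was transcribed correctly.

```python
import math

cost_of_living = [110.6, 91.8, 93.1, 89.8, 83.4, 79.2, 81.1, 79.5,
                  81.6, 82.1, 80.4, 119.1, 91.3, 100.0, 54.8, 81.2]
compact_disc = [11.99, 14.06, 11.65, 14.58, 15.08, 12.34, 13.03, 13.7,
                13.72, 14.44, 13.52, 12.25, 11.03, 10.77, 6.88, 10.61]
news = [1.1, 1.37, 1.37, 1.37, 1.78, 1.44, 1.23, 1.37,
        1.71, 1.2, 1.8, 0.74, 1.63, 0.93, 2.6, 1.88]


def fit_line(x, y):
    """Return (r, b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return r, b0, b1


for name, x in [("Compact Disc", compact_disc), ("News", news)]:
    r, b0, b1 = fit_line(x, cost_of_living)
    print(f"{name}: r = {r:+.3f}, y = {b0:.2f} + ({b1:.2f})({name})")
```

Note that the negative slope for News follows from the negative correlation: cities with a higher cost of living tend to have cheaper international newspapers in this sample.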
7. [Scatter plot: Cost of Living vs Coffee. x-axis: Coffee; y-axis: Cost of Living]
Correlation Coefficient, r= 0.225
Direction= Positive
Form= Linear
Strength= Weak +ve Correlation
$\hat{y} = 74.68 + 7.24(\text{Coffee})$
[Scatter plot: Cost of Living vs Fast Food. x-axis: Fast Food; y-axis: Cost of Living]
Correlation Coefficient, r= 0.358
Direction= Positive
Form= Linear
Strength= Weak +ve Correlation
$\hat{y} = 66.42 + 5.96(\text{Fast Food})$
8. Correlation Analysis:
                Cost of Living  Rent    Public Trans  CD      News    Coffee  Fast Food
Cost of Living  1.000
Rent            +0.874          1.000
Public Trans    +0.696          0.561   1.000
Compact Disc    +0.243          -0.128  0.071         1.000
News            -0.834          -0.675  -0.510        -0.423  1.000
Coffee          +0.225          0.040   0.034         0.438   -0.527  1.000
Fast Food       +0.358          0.089   0.361         0.624   -0.469  0.546   1.000
• Correlation coefficients range from -1 to +1. +1 means a perfect positive relationship, 0 means no linear relationship, and -1 means a perfect negative relationship.
• Correlation measures the direction and strength of a linear relationship between variables.
• The sign of a correlation coefficient does not indicate whether the relationship is stronger or weaker. It only indicates the direction of the relationship.
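A lower-triangular correlation matrix like the one above can be rebuilt from the dataset. The sketch below uses only the standard library; `pearson` is an illustrative helper name, and the output should be read as a cross-check of the table rather than as the tool the slides were produced with.

```python
import math

# Dataset columns, each in the same city order (London ... Vancouver).
data = {
    "Cost of Living": [110.6, 91.8, 93.1, 89.8, 83.4, 79.2, 81.1, 79.5,
                       81.6, 82.1, 80.4, 119.1, 91.3, 100.0, 54.8, 81.2],
    "Rent": [1700, 824, 1303, 926, 926, 720, 721, 652,
             892, 754, 754, 2352, 1104, 1998, 571, 804],
    "Public Trans": [2, 1.03, 0.96, 0.69, 1.1, 1.44, 0.55, 1.03,
                     0.75, 0.41, 0.43, 1.32, 1.06, 1.14, 0.15, 1.13],
    "CD": [11.99, 14.06, 11.65, 14.58, 15.08, 12.34, 13.03, 13.7,
           13.72, 14.44, 13.52, 12.25, 11.03, 10.77, 6.88, 10.61],
    "News": [1.1, 1.37, 1.37, 1.37, 1.78, 1.44, 1.23, 1.37,
             1.71, 1.2, 1.8, 0.74, 1.63, 0.93, 2.6, 1.88],
    "Coffee": [1.9, 2.06, 1.51, 1.51, 1.71, 1.71, 2.88, 1.51,
               1.58, 2.17, 1.98, 1.47, 1.49, 2.26, 0.84, 1.63],
    "Fast Food": [4.5, 4.05, 4.12, 3.91, 4.46, 3.26, 4.97, 3.77,
                  4.18, 2.89, 2.79, 2.99, 2.74, 3.43, 1.58, 2.79],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Print the lower triangle: each row lists correlations with the columns up to itself.
names = list(data)
for i, row in enumerate(names):
    cells = [f"{pearson(data[row], data[col]):+.3f}" for col in names[: i + 1]]
    print(f"{row:<15}" + " ".join(cells))
```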
9. Regression Statistics
Multiple R          0.9671
R Square            0.9352
Adjusted R Square   0.8921
Standard Error      4.7773
Observations (n)    16
ANOVA
                    df  SS         MS        F        Significance F
Regression (k)      6   2966.5576  494.4263  21.6643  7.1736E-05
Residual (n-k-1)    9   205.3999   22.8222
Total               15  3171.9575
CONFIDENCE INTERVAL (Lower 95%, Upper 95%)
                    Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept           48.6419       26.0346         1.8684   0.0945   -10.2525   107.5364
Rent                0.0186        0.0047          3.9958   0.0031   0.0081     0.0291
Public Transport    7.1880        3.7968          1.8932   0.0909   -1.4010    15.7771
Compact Disc        1.7712        0.9707          1.8247   0.1013   -0.4246    3.9670
News                -5.8995       6.6830          -0.8828  0.4003   -21.0174   9.2184
Coffee              -0.0062       4.1614          -0.0015  0.9988   -9.4200    9.4075
Fast Food           -0.3394       2.2114          -0.1535  0.8814   -5.3421    4.6632
Multiple Regression Analysis:
Regression Equation:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + b_6 x_6$
$\widehat{\text{Cost of Living}} = 48.64 + 0.0186(\text{Rent}) + 7.19(\text{Transport}) + 1.77(\text{CD}) - 5.90(\text{News}) - 0.006(\text{Coffee}) - 0.339(\text{Food})$
From the adjusted R square value of 0.892 we can conclude that the model explains about 89.2% of the variation in cost of living (the unadjusted R square is 0.935). The remainder is unexplained error.
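The summary statistics follow from the ANOVA sums of squares. The sketch below recomputes them from the values reported for the six-predictor model; the variable names are illustrative, and the only inputs are numbers already shown in the regression output.

```python
# Values copied from the ANOVA table for the six-predictor model.
n, k = 16, 6                  # observations and number of predictors
ss_regression = 2966.5576     # SSR
ss_residual = 205.3999        # SSE
ss_total = 3171.9575          # SST

# R square: share of the total variation explained by the model.
r_square = ss_regression / ss_total

# Adjusted R square penalizes for the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Standard error of the regression is the square root of the residual mean square.
standard_error = (ss_residual / (n - k - 1)) ** 0.5

# F statistic: regression mean square over residual mean square.
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"R Square          = {r_square:.4f}")
print(f"Adjusted R Square = {adjusted_r_square:.4f}")
print(f"Standard Error    = {standard_error:.4f}")
print(f"F                 = {f_stat:.4f}")
```

With six predictors and only 16 observations the adjustment is noticeable, which is why the adjusted figure is the safer one to quote.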
A coefficient of 0 means no relationship. If zero does not appear in the 95% confidence interval, conclude that x and y have a linear relationship.
A low p-value (< 0.05) indicates that a predictor (independent variable) is significant in the regression analysis.
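The confidence interval rule can be applied mechanically to the coefficient table. In this sketch the coefficients and standard errors are copied from the regression output, while the critical value 2.262157 (two-tailed 5%, 9 degrees of freedom) is hard-coded as an assumption; in Excel it would come from T.INV.2T(0.05, 9). The names `confidence_interval` and `significant` are illustrative.

```python
# (coefficient, standard error) pairs copied from the regression output.
coefficients = {
    "Intercept": (48.6419, 26.0346),
    "Rent": (0.0186, 0.0047),
    "Public Transport": (7.1880, 3.7968),
    "Compact Disc": (1.7712, 0.9707),
    "News": (-5.8995, 6.6830),
    "Coffee": (-0.0062, 4.1614),
    "Fast Food": (-0.3394, 2.2114),
}

# Assumed critical t value for df = n - k - 1 = 16 - 6 - 1 = 9, two-tailed 5%.
T_CRITICAL = 2.262157


def confidence_interval(coef, se, t_crit=T_CRITICAL):
    """Return the (lower, upper) bounds of the interval coef +/- t_crit * se."""
    margin = t_crit * se
    return coef - margin, coef + margin


significant = []
for name, (coef, se) in coefficients.items():
    lower, upper = confidence_interval(coef, se)
    # A predictor is flagged only when its interval does not contain zero.
    excludes_zero = lower > 0 or upper < 0
    if excludes_zero:
        significant.append(name)
    print(f"{name:<17} [{lower:9.4f}, {upper:9.4f}]  excludes zero: {excludes_zero}")

print("Significant at the 5% level:", significant)
```

Small differences from the table in the last decimal place are expected, because the inputs here are already rounded to four decimals.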
10. [Scatter plot with trend line: Cost of Living vs Rent. x-axis: Rent; y-axis: Cost of Living; fitted line y = 0.0245x + 61.385]
Regression Analysis:
Regression Statistics
Multiple R          0.8738
R Square            0.7634
Adjusted R Square   0.7466
Standard Error      7.3209
Observations (n)    16
ANOVA
                    df  SS         MS         F        Significance F
Regression (k)      1   2421.6290  2421.6290  45.1839  9.75269E-06
Residual            14  750.3285   53.5949
Total               15  3171.9575
                    Coefficients  Standard Error  t Stat   P-value   Lower 95%  Upper 95%
Intercept           61.3852       4.2861          14.3218  9.37E-10  52.1924    70.5781
RENT                0.0245        0.0036          6.7219   9.75E-06  0.0167     0.0323
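Plot labels: Centroid, SSR, SSE, SST
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
SST = SSE + SSR
DF = n - k - 1 = 16 - 1 - 1 = 14
$\hat{y} = b_0 + b_1 x_1$
$\hat{y} = 61.39 + 0.0245(\text{RENT})$
Critical t value = T.INV.2T(5%, 14) = 2.14
Adjusted R Square = 0.7466 (the model explains about 74.66% of the variation in cost of living)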
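The decomposition SST = SSR + SSE can be verified directly from the data for the Rent model. This is a minimal sketch with illustrative variable names; it refits the line and then computes the three sums of squares from their definitions.

```python
# Cost of Living (y) regressed on Rent (x).
x = [1700, 824, 1303, 926, 926, 720, 721, 652,
     892, 754, 754, 2352, 1104, 1998, 571, 804]
y = [110.6, 91.8, 93.1, 89.8, 83.4, 79.2, 81.1, 79.5,
     81.6, 82.1, 80.4, 119.1, 91.3, 100.0, 54.8, 81.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]  # fitted values

sst = sum((b - y_bar) ** 2 for b in y)              # total variation
ssr = sum((h - y_bar) ** 2 for h in y_hat)          # explained by the line
sse = sum((b - h) ** 2 for b, h in zip(y, y_hat))   # left in the residuals

r_square = ssr / sst
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"R^2 = SSR/SST = {r_square:.4f}, 1 - SSE/SST = {1 - sse / sst:.4f}")
```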
11. Conclusions
COST OF LIVING     Direction and Strength
RENT               r=+0.874   BEST PREDICTOR
NEWS               r=-0.834
PUBLIC TRANSPORT   r=+0.696
FAST FOOD          r=+0.358
COMPACT DISC       r=+0.243
COFFEE             r=+0.225   WORST PREDICTOR
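The ranking above orders the items by the absolute value of r, since the sign only gives direction. A short sketch of that ordering, using the correlations as reported in the slides (the variable names are illustrative):

```python
# Correlations of each item with the cost of living index, as reported above.
correlations = {
    "Rent": 0.874,
    "News": -0.834,
    "Public Transport": 0.696,
    "Fast Food": 0.358,
    "Compact Disc": 0.243,
    "Coffee": 0.225,
}

# Rank by |r|: strength is judged without regard to sign.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    # r squared is the share of variation a single-predictor line would explain.
    print(f"{name:<17} r = {r:+.3f}   r^2 = {r * r:.3f}")

best, worst = ranked[0][0], ranked[-1][0]
print(f"Best predictor: {best}; worst predictor: {worst}")
```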