Computer Exercise (RStudio)
> data_ksubs<-read.csv("data_ksubs (1).csv")
> View(data_ksubs)
1.
a)
> plot(data_ksubs$inc, data_ksubs$nettfa)
The scatter plot suggests heteroskedasticity. When net financial wealth (nettfa) is plotted against
annual family income (inc), the data points are concentrated in a narrow band at low incomes, and the
spread of nettfa widens as income increases. Because the variation is not constant across income
levels, homoskedasticity cannot be assumed.
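The visual impression of a widening spread can also be checked numerically. The following is a minimal sketch on simulated data, not on data_ksubs: the variables inc and nettfa are generated here purely for illustration, and the error standard deviation is assumed to grow in proportion to income.

```r
# Simulated illustration (NOT data_ksubs): error spread grows with income
set.seed(123)
n      <- 2000
inc    <- runif(n, min = 10, max = 200)
nettfa <- 0.9 * inc + rnorm(n, sd = 0.5 * inc)

# Split income into quartile bins and compare the spread of nettfa per bin
bins   <- cut(inc, breaks = quantile(inc, probs = 0:4 / 4), include.lowest = TRUE)
spread <- tapply(nettfa, bins, sd)
print(spread)
```

Under homoskedasticity the four standard deviations would be of similar size; a clear increase from the lowest to the highest income bin points the same way as the fan shape in the scatter plot.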
b)
> ols.fit <- lm(nettfa ~ inc + age + agesq + male + e401k, data = data_ksubs)
> summary(ols.fit)
Call:
lm(formula = nettfa ~ inc + age + agesq + male + e401k, data = data_ksubs)
Residuals:
Min 1Q Median 3Q Max
-508.60 -18.71 -3.38 9.95 1462.18
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.22945   10.11065   0.517  0.60501
inc          0.94712    0.02634  35.954  < 2e-16 ***
age         -2.38888    0.49058  -4.869 1.14e-06 ***
agesq        0.03975    0.00563   7.061 1.78e-12 ***
male         4.22406    1.50898   2.799  0.00513 **
e401k        6.73346    1.28603   5.236 1.68e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 58.07 on 9269 degrees of freedom
Multiple R-squared: 0.1762, Adjusted R-squared: 0.1757
F-statistic: 396.5 on 5 and 9269 DF, p-value: < 2.2e-16
c)
> N <-dim(data_ksubs)[1]
> part.out.fit <- lm(inc ~ age + agesq + male + e401k, data = data_ksubs)
> std.beta1 <- sqrt((N/(N-5))*sum((residuals(part.out.fit)^2)*(residuals(ols.fit)^2))/
+ (sum(residuals(part.out.fit)^2))^2)
Heteroskedasticity-robust standard error for annual family income, compared with the standard error
from the estimation in b):
> std.beta1
[1] 0.07513404
> coef(summary(ols.fit))[2,2]
[1] 0.02634275
> std.beta1/coef(summary(ols.fit))[2,2]
[1] 2.852172
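Written out, the expression computed in std.beta1 has the following form. The symbols are introduced here for illustration and do not appear in the original: \hat{r}_{ij} denotes the residual from regressing regressor j on all other regressors (part.out.fit), and \hat{u}_i denotes the OLS residual from the full model (ols.fit).

```latex
\widehat{\mathrm{se}}(\hat{\beta}_j)
  = \sqrt{\frac{N}{N-5}\cdot
    \frac{\sum_{i=1}^{N} \hat{r}_{ij}^{2}\,\hat{u}_{i}^{2}}
         {\left(\sum_{i=1}^{N} \hat{r}_{ij}^{2}\right)^{2}}}
```

The factor N/(N-5) is the small-sample scaling used in the code. A common alternative also counts the intercept and scales by N/(N-6); with N = 9275 observations the two differ only negligibly.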
> part.out.fit <- lm(age ~ inc + agesq + male + e401k, data = data_ksubs)
> std.beta2 <- sqrt((N/(N-5))*sum((residuals(part.out.fit)^2)*(residuals(ols.fit)^2))/
+ (sum(residuals(part.out.fit)^2))^2)
Heteroskedasticity-robust standard error for age, compared with the standard error from the
estimation in b):
> std.beta2
[1] 0.5996238
> coef(summary(ols.fit))[3,2]
[1] 0.4905835
> std.beta2/coef(summary(ols.fit))[3,2]
[1] 1.222267
> part.out.fit <- lm(agesq ~ inc + age + male + e401k, data = data_ksubs)
> std.beta3 <- sqrt((N/(N-5))*sum((residuals(part.out.fit)^2)*(residuals(ols.fit)^2))/
+ (sum(residuals(part.out.fit)^2))^2)
Heteroskedasticity-robust standard error for agesq, compared with the standard error from the
estimation in b):
> std.beta3
[1] 0.007472046
> coef(summary(ols.fit))[4,2]
[1] 0.005629612
> std.beta3/coef(summary(ols.fit))[4,2]
[1] 1.327275
> part.out.fit <- lm(male ~ inc + age + agesq + e401k, data = data_ksubs)
> std.beta4 <- sqrt((N/(N-5))*sum((residuals(part.out.fit)^2)*(residuals(ols.fit)^2))/
+ (sum(residuals(part.out.fit)^2))^2)
Heteroskedasticity-robust standard error for male, compared with the standard error from the
estimation in b):
> std.beta4
[1] 1.447424
> coef(summary(ols.fit))[5,2]
[1] 1.50898
> std.beta4/coef(summary(ols.fit))[5,2]
[1] 0.9592073
> part.out.fit <- lm(e401k ~ inc + age + agesq + male, data = data_ksubs)
> std.beta5 <- sqrt((N/(N-5))*sum((residuals(part.out.fit)^2)*(residuals(ols.fit)^2))/
+ (sum(residuals(part.out.fit)^2))^2)
Heteroskedasticity-robust standard error for e401k, compared with the standard error from the
estimation in b):
> std.beta5
[1] 1.494642
> coef(summary(ols.fit))[6,2]
[1] 1.286028
> std.beta5/coef(summary(ols.fit))[6,2]
[1] 1.162216
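The five partialling-out computations above can be cross-checked against the matrix ("sandwich") form of the heteroskedasticity-robust variance. The sketch below runs on simulated data rather than data_ksubs, and the variable names (x1, x2, y, se_partial, se_sandwich) are hypothetical. It omits the small-sample scaling factor, so it corresponds to the unscaled (HC0) version of the estimator.

```r
# Simulated illustration (NOT data_ksubs): error variance depends on x1
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = exp(0.5 * x1))

fit <- lm(y ~ x1 + x2)
u2  <- residuals(fit)^2      # squared OLS residuals
X   <- model.matrix(fit)     # design matrix including the intercept

# Partialling-out form: regress each regressor on all other columns of X
se_partial <- sapply(colnames(X)[-1], function(v) {
  r <- lm.fit(X[, colnames(X) != v, drop = FALSE], X[, v])$residuals
  sqrt(sum(r^2 * u2) / sum(r^2)^2)
})

# Sandwich form: (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
XtX_inv     <- solve(crossprod(X))
V           <- XtX_inv %*% crossprod(X * sqrt(u2)) %*% XtX_inv
se_sandwich <- sqrt(diag(V))[-1]

print(rbind(se_partial, se_sandwich))
```

Both rows should agree up to floating-point error, since the two expressions are algebraically identical. Multiplying either result by sqrt(N/(N-5)) would reproduce the scaling used in the exercise.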
d)
> ci.homo <- ols.fit$coefficients[2] + c(-qnorm(0.975),qnorm(0.975))*coef(summary(ols.fit))[2,2]
> ci.heter <- ols.fit$coefficients[2] + c(-qnorm(0.975),qnorm(0.975))*std.beta1
> ci.homo
[1] 0.8954945 0.9987561
> ci.heter
[1] 0.7998653 1.0943853
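Interpretation: The 95% confidence interval for β1 based on the heteroskedasticity-robust standard
error, [0.800, 1.094], is considerably wider than the interval based on the homoskedastic standard
error, [0.895, 0.999]. Both intervals are centered on the same point estimate, so the difference comes
entirely from the standard error, which is about 2.85 times larger in the robust case. Ignoring
heteroskedasticity would therefore overstate the precision of the estimated income effect.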
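The relationship between the two intervals can be reproduced from the printed numbers alone. This is a minimal sketch: wald_ci is a hypothetical helper written for this illustration, and the coefficient is entered as rounded in the output of b), so the endpoints may differ from ci.homo and ci.heter in the last digits.

```r
# Wald-type interval: estimate +/- z * se (normal critical value, as in part d)
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est - z * se, upper = est + z * se)
}

est       <- 0.94712      # coefficient on inc, rounded as printed in b)
se_homo   <- 0.02634275   # conventional standard error from b)
se_robust <- 0.07513404   # robust standard error from c)

ci_homo   <- wald_ci(est, se_homo)
ci_robust <- wald_ci(est, se_robust)

# The ratio of interval widths equals the ratio of the standard errors
width_ratio <- unname(diff(ci_robust) / diff(ci_homo))
print(width_ratio)
```

Because both intervals use the same critical value, the width ratio is simply the ratio of the two standard errors reported in c).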
Additional Information:
Test for heteroskedasticity (Breusch-Pagan)
> test.lin <- lm(I(residuals(ols.fit)^2) ~ inc + age + agesq + male + e401k, data=data_ksubs)
> summary(test.lin)
Call:
lm(formula = I(residuals(ols.fit)^2) ~ inc + age + agesq + male +
e401k, data = data_ksubs)
Residuals:
Min 1Q Median 3Q Max
-43169 -5271 -1235 2159 2121900
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  7251.699   8151.187   0.890  0.37368
inc           280.903     21.237  13.227  < 2e-16 ***
age          -895.444    395.507  -2.264  0.02359 *
agesq          12.383      4.539   2.728  0.00638 **
male         1310.332   1216.537   1.077  0.28146
e401k       -1534.153   1036.793  -1.480  0.13898
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 46820 on 9269 degrees of freedom
Multiple R-squared: 0.02158, Adjusted R-squared: 0.02105
F-statistic: 40.88 on 5 and 9269 DF, p-value: < 2.2e-16
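The auxiliary regression already rejects homoskedasticity through its F-statistic. As a complementary check, the LM form of the test (sample size times the R-squared of the auxiliary regression, compared with a chi-squared distribution) can be sketched from the printed values. The sample size of 9275 is inferred here from the 9269 residual degrees of freedom plus 6 estimated coefficients, and the variable names are hypothetical.

```r
n  <- 9275      # 9269 residual degrees of freedom + 6 estimated coefficients
r2 <- 0.02158   # R-squared of the auxiliary regression, as printed above
k  <- 5         # regressors in the auxiliary regression (excluding intercept)

lm_stat <- n * r2                                        # about 200.15
p_value <- pchisq(lm_stat, df = k, lower.tail = FALSE)   # upper-tail probability

print(c(lm_stat = lm_stat, p_value = p_value))
```

A statistic of roughly 200 on 5 degrees of freedom lies far in the upper tail, which agrees with the F-test and with the visual evidence from part a).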
