Sheet1
state  violent  murder  metro  white  hsgrad  poverty  snglpar     viofit      viores
AK         761       9   41.8   75.2    86.6      9.1     14.3    715.139   0.2929492
AL         780    11.6   67.4   73.5    66.9     17.4     11.5   691.5632   0.4937207
AR         593    10.2   44.7   82.9    66.3       20     10.7   453.8885   0.7989318
AZ         715     8.6   84.7   88.6    78.7     15.4     12.1   871.0881   -0.872175
CA        1078    13.1   96.7   79.3    76.2     18.2     12.5   1067.503   0.0599926
CO         567     5.8   81.8   92.5    84.4      9.9     12.1   751.1429   -1.043932
CT         456     6.3   95.7     89    79.2      8.5     10.1   570.3966  -0.6560233
DE         686       5   82.7   79.4    77.5     10.2     11.4   670.8073   0.0854869
FL        1206     8.9     93   83.5    74.4     17.8     10.6   779.8888    2.470016
GA         723    11.4   67.7   70.8    70.9     13.5       13   823.5709  -0.5653531
HI         261     3.8   74.7   40.9    80.1        8      9.1   264.7408  -0.0212222
IA         326     2.3   43.8   96.6    80.1     10.3        9   50.25043    1.564434
ID         282     2.9     30   96.7    79.7     13.1      9.5   57.91985    1.284314
IL         960    11.4     84     81    76.2     13.6     11.5   754.3386    1.147723
IN         489     7.5   71.6   90.6    75.6     12.2     10.8   539.8218  -0.2825857
KS         496     6.4   54.6   90.9    81.3     13.1      9.9   303.4748    1.074898
KY         463     6.6   48.5   91.8    64.6     20.4     10.6   477.4698  -0.0833644
LA        1062    20.3     75   66.7    68.3     26.4     14.9   1360.373   -1.793716
MA         805     3.9   96.2   91.1      80     10.7     10.9    719.134   0.4875998
MD         998    12.7   92.8   68.9    78.4      9.7       12   820.4843    1.012708
ME         126     1.6   35.7   98.5    78.8     10.7     10.6   205.7611  -0.4580154
MI         792     9.8   82.7   83.1    76.8     15.4       13   974.5974   -1.022228
MN         327     3.4   69.3     94    82.4     11.6      9.9   392.0398  -0.3628658
MO         744    11.3   68.3   87.6    73.9     16.1     10.9   596.1801   0.8240324
MS         434    13.5   30.7   63.3    64.3     24.7     14.7   957.0128   -3.193796
MT         178       3     24   92.6      81     14.9     10.8   214.9012  -0.2136206
NC         679    11.3   66.3   75.2      70     14.4     11.1   576.9474   0.5662267
ND          82     1.7   41.6   94.2    76.7     11.2      8.4   -30.5059   0.6415154
NE         339     3.9   50.6   94.3    81.8     10.3      9.4   156.4503    1.028228
NH         138       2   59.4     98    82.2      9.9      9.2   191.7913  -0.3023897
NJ         627     5.3    100   80.8    76.7     10.9      9.6   580.2896   0.2696506
NM         930       8     56   87.1    75.1     17.4     13.8   906.8519    0.131264
NV         875    10.4   84.8   86.7    78.8      9.8     12.4    812.584   0.3557735
NY        1074    13.3   91.7   77.2    74.8     16.4     12.7   1023.016   0.2872973
OH         504       6   81.3   87.5    75.7       13     11.4   709.3514   -1.144457
OK         635     8.4   60.1   82.5    74.6     19.9     11.1   625.6494   0.0531835
OR         503     4.6     70   93.2    81.5     11.8     11.3   586.4274  -0.4646936
PA         418     6.8   84.8   88.7    74.7     13.2      9.6   501.9544  -0.4756588
RI         402     3.9   93.6   92.6      72     11.2     10.8   694.3781   -1.653521
SC        1023    10.3   69.8   68.6    68.3     18.7     12.3   839.2634    1.029668
SD         208     3.4   32.6   90.2    77.1     14.2      9.4   84.48248   0.7058436
TN         766    10.2   67.7   82.8    67.1     19.6     11.2   693.0859   0.4139353
TX         762    11.9   83.9   85.1    72.1     17.4     11.8    860.463  -0.5538287
UT         301     3.1   77.5   94.8    85.1     10.7       10   453.5657  -0.8545464
VA         372     8.3   77.5   77.1    75.2      9.7     10.3   475.6079  -0.5816147
VT         114     3.6     27   98.4    80.8       10       11   178.2364   -0.379625
WA         515     5.2     83   89.4    83.8     12.1     11.7   746.4708   -1.294313
WI         264     4.4   68.1   92.1    78.6     12.6     10.4   466.5293   -1.126045
WV         208     6.9   41.8   96.3      66     22.2      9.4   297.9507  -0.5456529
WY         286     3.4   29.7   95.9      83     13.3     10.8   231.2377   0.3144581
For question 6, use the full dataset with all of the observations. To compare the observed and fitted values for those 3 observations, use the 'list state metro.... if abs(viores)>2' command on the second page. Then, to see whether those observations have unusual explanatory or outcome values, use the 'summ' command, also on the second page.
For question 7, first run the 'regr violent metro poverty snglpar' regression model on the full dataset and extract the information the question asks for from the output (R^2, root MSE, coefficient estimates, standard errors). Then drop the DC observation using the command on the second page, rerun the regression model, and extract the same information from the output.
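. regr violent metro poverty snglpar

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(3, 47)      =   82.16
       Model |  8170480.21     3   2723493.4           Prob > F      =  0.0000
    Residual |  1557994.53    47  33148.8199           R-squared     =  0.8399
-------------+------------------------------           Adj R-squared =  0.8296
       Total |  9728474.75    50  194569.495           Root MSE      =  182.07

------------------------------------------------------------------------------
     violent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       metro |   7.828935   1.254699     6.24   0.000     5.304806    10.35306
     poverty |   17.68024    6.94093     2.55   0.014     3.716893     31.6436
     snglpar |   132.4081   15.50322     8.54   0.000     101.2196    163.5965
       _cons |  -1666.436    147.852   -11.27   0.000    -1963.876   -1368.996
------------------------------------------------------------------------------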
. test white hsgrad

 ( 1)  white = 0
 ( 2)  hsgrad = 0

       F(  2,    45) =    1.50
            Prob > F =    0.2349
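. regr violent metro white hsgrad poverty snglpar

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(5, 45)      =   50.93
       Model |  8267618.21     5  1653523.64           Prob > F      =  0.0000
    Residual |  1460856.54    45  32463.4787           R-squared     =  0.8498
-------------+------------------------------           Adj R-squared =  0.8332
       Total |  9728474.75    50  194569.495           Root MSE      =  180.18

------------------------------------------------------------------------------
     violent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       metro |   7.608808   1.295273     5.87   0.000     4.999995    10.21762
       white |  -4.482907   2.779073    -1.61   0.114    -10.08025    1.114434
      hsgrad |   8.646443   7.826016     1.10   0.275    -7.115962    24.40885
     poverty |   26.24416   11.08327     2.37   0.022     3.921304    48.56702
     snglpar |   109.4666   20.35989     5.38   0.000     68.45967    150.4735
       _cons |  -1795.904   668.7885    -2.69   0.010    -3142.914   -448.8953
------------------------------------------------------------------------------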
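. regr violent hsgrad

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(1, 49)      =    3.44
       Model |    637824.5     1    637824.5           Prob > F      =  0.0697
    Residual |  9090650.24    49  185523.474           R-squared     =  0.0656
-------------+------------------------------           Adj R-squared =  0.0465
       Total |  9728474.75    50  194569.495           Root MSE      =  430.72

------------------------------------------------------------------------------
     violent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hsgrad |  -20.19723   10.89283    -1.85   0.070    -42.08718    1.692727
       _cons |   2152.347   832.4773     2.59   0.013     479.4211    3825.273
------------------------------------------------------------------------------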
Public Health 141
regression case study 2
This exercise uses the crime data from Agresti and Finlay, from
the Statistical Abstract of the US for a recent year. There are
51 observations, one for each state and the District of
Columbia.
The dataset, crime.dta, is in bCourses.
Here is a brief description of the variables:
. desc

Contains data from C:PH142BCRIME.DTA
  obs:            51                          Agresti and Finlay crime data
 vars:             8                          14 Sep 1997 20:55
 size:         1,785 (86.0% of memory free)

  1. state     str3   %9s
  2. violent   float  %9.0g      violent crime per 100,000
  3. murder    float  %9.0g      murders per 100,000
  4. metro     float  %9.0g      percent of pop living in metro
  5. white     float  %9.0g      percent white
  6. hsgrad    float  %9.0g      percent high school grad or mor
  7. poverty   float  %9.0g      percent of families in poverty
  8. snglpar   float  %9.0g      percent of singleparent famili
In class, we are using the poverty rate as an outcome variable;
for this lab, use the violent crime rate as the outcome.
Use the examples in the reader as models for the commands.
Be sure to read all the questions, as there are some Stata
commands you need to plan on your own.
Fit the following regression models:
regr violent metro white hsgrad poverty snglpar
regr violent metro poverty white snglpar
regr violent metro poverty snglpar
regr violent poverty
regr violent white
regr violent hsgrad
regr violent metro
regr metro poverty
regr metro snglpar
Continue to explore the association between some of the X
variables:
regr poverty hsgrad
regr poverty white
regr white metro poverty snglpar hsgrad
regr hsgrad metro poverty snglpar white
Refit the model
regr violent metro poverty snglpar
and use Stata's predict command to calculate the fitted values
(call them viofit) and the standardized residuals (call them
viores) so that you can check the model assumptions.
Do the assumption checking for question 1 at this point, before
you drop any observations!
See question 1 to help plan your commands now!
List the observations with large standardized residuals:
list state metro poverty snglpar violent viofit viores if
abs(viores) > 2
And get a summary of the variables in this model to help
explore the outliers
summ violent metro poverty snglpar, detail
Just to see what influence these 3 observations have on the
conclusions:
drop if state=="DC"
regr violent metro poverty snglpar
drop if abs(viores) > 2
regr violent metro poverty snglpar
Note: Once you have dropped an observation, it’s gone.
You may need to reopen the dataset to do the assumption
checking for the model with all 51 observations.
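One way to avoid reopening the dataset by hand is sketched below. It assumes crime.dta is in the current working directory and uses Stata's preserve and restore commands, which the handout itself does not mention; reopening the file with "use crime, clear" works just as well.

```stata
* Sketch only: assumes crime.dta is in the current working directory.
use crime, clear
regress violent metro poverty snglpar
predict viofit
predict viores, rstandard

* Take a snapshot of the data with all 51 observations.
preserve
drop if state == "DC"
regress violent metro poverty snglpar
* viores still holds the residuals from the 51-observation model.
drop if abs(viores) > 2
regress violent metro poverty snglpar
* Bring back all 51 observations for the assumption checking.
restore
count
```

After restore, count should report 51 again, so the plots and tests for question 1 can still be run on the full data.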
1. Using the results for all the states and DC, discuss the
assumptions for the model:
regr violent metro poverty snglpar
Use the residual vs. fitted plot to discuss the functional form
and the assumption of constant variance.
Use the box plot to look for outliers and to assess symmetry,
and the qnorm plot and the Shapiro-Wilk test to discuss the
normality assumption.
No matter what you conclude here, interpret the tests with
caution in the following questions.
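The checks named in question 1 could be run roughly as follows. This is a sketch, assuming crime.dta is in the working directory and that viofit and viores do not already exist in the dataset; the plot options are a matter of taste.

```stata
* Sketch only: fit the model on all 51 observations first.
use crime, clear
regress violent metro poverty snglpar
predict viofit
predict viores, rstandard

* Residual vs. fitted plot: functional form and constant variance.
scatter viores viofit, yline(0)
* Box plot: outliers and symmetry.
graph box viores
* Normal quantile plot and Shapiro-Wilk test: normality.
qnorm viores
swilk viores
```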
Questions 2, 3, 4, and 5 all use the models with DC included.
2. In the full model regr violent metro white hsgrad poverty
snglpar
Interpret the t test for the variable white.
Interpret the t test for the variable poverty.
3. Set up and carry out the restricted vs. full F test to compare
these two models
regr violent metro white hsgrad poverty snglpar
regr violent metro poverty snglpar
Be sure to state the hypotheses, show the calculation of the
F statistic from the SS residuals,
give the numerator and denominator degrees of freedom,
use Stata's Ftail function to find the P value, and state your
conclusion in words.
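The arithmetic for question 3 can be sketched in Stata as below. The SS residual values and degrees of freedom are copied from the two regression outputs for these models; the scalar names rss_r, rss_f and Fstat are made up for this illustration.

```stata
* Sketch only: numbers copied from the regression outputs.
* Restricted model (metro poverty snglpar): SS residual, 47 df.
scalar rss_r = 1557994.53
* Full model (adds white and hsgrad): SS residual, 45 df.
scalar rss_f = 1460856.54
* F = [(SSres restricted - SSres full) / 2] / [SSres full / 45]
scalar Fstat = ((rss_r - rss_f) / (47 - 45)) / (rss_f / 45)
display "F(2, 45) = " Fstat
display "P value  = " Ftail(2, 45, Fstat)
```

The result should agree with the output of "test white hsgrad" run after the full model (F = 1.50, Prob > F = 0.2349), which is a convenient check on the hand calculation.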
4. Compare your conclusions about the association between
percent high school graduates and violent crime from the
models
regr violent hsgrad
regr violent metro white hsgrad poverty snglpar
(Note: this question is not asking for a test to compare the 2
models!)
Use the regression of hsgrad on the other predictor variables
metro white poverty snglpar to explain why these models lead to
different conclusions about the association between percent
hsgrad and violent crime. (This is an example of collinearity.)
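A minimal sketch of the regression asked for in question 4, assuming crime.dta is in the working directory. The last line computes a variance inflation factor from the R-squared; that quantity is not part of the handout and is shown only as one common way to summarize how strongly hsgrad is predicted by the other X variables.

```stata
* Sketch only: regress hsgrad on the other predictors.
use crime, clear
regress hsgrad metro white poverty snglpar
display "R-squared for hsgrad on the other X variables = " e(r2)
* Optional summary, not in the handout: VIF = 1 / (1 - R-squared).
display "Implied variance inflation factor = " 1 / (1 - e(r2))
```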
5. Take a look at the models
regr violent metro
regr violent metro poverty snglpar
Metro is significant in both models.
Verify that the differences in the point estimate, standard error,
and confidence interval
for metro are relatively large. This is an example of
confounding.
For this to happen, snglpar and/or poverty
must be associated both with metro and with violent.
Check that this is the case.
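One possible way to carry out this check is sketched here, assuming crime.dta is in the working directory. The correlate command is an addition for a quick overview; the simple regressions are the ones already listed among the models to fit.

```stata
* Sketch only: look at the pairwise associations.
use crime, clear
correlate violent metro poverty snglpar
* Are poverty and snglpar associated with metro?
regress metro poverty
regress metro snglpar
* Are poverty and snglpar associated with violent?
regress violent poverty
regress violent snglpar
```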
6. Compare the fitted and observed values for the District of
Columbia (DC), Mississippi (MS) and Florida (FL) for the
model using all the observations. Do these states have unusual
values on the X variables? on the outcome variable?
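For question 6, the three observations can also be picked out by name rather than by residual size. This is a sketch under the same assumptions as before (crime.dta in the working directory, viofit and viores not yet defined); inlist() is a standard Stata function but is not used elsewhere in the handout.

```stata
* Sketch only: fitted values and residuals from the 51-observation model.
use crime, clear
regress violent metro poverty snglpar
predict viofit
predict viores, rstandard
* Observed vs. fitted values for DC, MS and FL.
list state violent viofit viores metro poverty snglpar if inlist(state, "DC", "MS", "FL")
* Compare their X and outcome values with the percentiles here.
summarize violent metro poverty snglpar, detail
```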
7. Make a table of the estimated coefficients and standard errors
for the model
regr violent metro poverty snglpar
for all 51 observations, for the 50 states with DC dropped,
for the 48 states with the 2 outliers and DC dropped.
Also make a table of the R^2 values and the root MSEs for the
3 models and compare them.
Which coefficients are sensitive to the points that did not fit
well, and which are not?
(That is, which variables have coefficient estimates that are
similar for all 3 sets of states,
and which variables have coefficient estimates that are
different?)
What changes do you see in the standard errors?
(Notice that with DC omitted, the SS total is much smaller,
which is why the R^2 value is actually smaller for the model
with DC dropped.)
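The three fits for question 7 can be collected into one table with Stata's estimates store and estimates table commands, as sketched below. The handout does not use these commands, and the stored names all51, noDC and no3 are made up for this illustration; copying the numbers from the three outputs by hand gives the same table.

```stata
* Sketch only: assumes crime.dta is in the current working directory.
use crime, clear
regress violent metro poverty snglpar
estimates store all51
predict viores, rstandard

drop if state == "DC"
regress violent metro poverty snglpar
estimates store noDC

* viores comes from the 51-observation model.
drop if abs(viores) > 2
regress violent metro poverty snglpar
estimates store no3

* Coefficients, standard errors, N, R-squared and root MSE side by side.
estimates table all51 noDC no3, b(%9.3f) se(%9.3f) stats(N r2 rmse)
```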
Sheet 1: Fit the following regression models:
desc
regr violent metro white hsgrad poverty snglpar
regr violent metro poverty white snglpar
regr violent metro poverty snglpar
regr violent poverty
regr violent white
regr violent hsgrad
regr violent metro
regr violent snglpar
regr metro poverty
regr metro snglpar
regr poverty hsgrad
regr poverty white
regr white metro poverty snglpar hsgrad
regr hsgrad metro poverty snglpar white
Sheet 2: Refit the model
regr violent metro poverty snglpar
predict viofit
predict viores, rstandard
scatter viores viofit, title(viores and viofit) yline(0)
swilk viores
list state metro poverty snglpar violent viofit viores if
abs(viores) > 2
summ violent metro poverty snglpar, detail
drop if state=="DC"
(1 observation deleted)
regr violent metro poverty snglpar
drop if abs(viores) > 2
(2 observations deleted)
regr violent metro poverty snglpar
. desc

Contains data from E:Regresstion Case Study 2STATA.dta
  obs:            51                          Agresti and Finlay crime data
 vars:            12                          6 Aug 2015 11:03
 size:         2,397

              storage   display    value
variable name   type    format     label      variable label
state           str3    %9s
violent         float   %9.0g                 violent crime per 100,000
murder          float   %9.0g                 murders per 100,000
metro           float   %9.0g                 percent of pop living in metro
white           float   %9.0g                 percent white
hsgrad          float   %9.0g                 percent high school grad or mor
poverty
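. regr violent metro poverty white snglpar

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(4, 46)      =   63.06
       Model |  8227991.45     4  2056997.86           Prob > F      =  0.0000
    Residual |   1500483.3    46  32619.2022           R-squared     =  0.8458
-------------+------------------------------           Adj R-squared =  0.8324
       Total |  9728474.75    50  194569.495           Root MSE      =  180.61

------------------------------------------------------------------------------
     violent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       metro |   7.404345   1.285055     5.76   0.000     4.817663    9.991027
     poverty |   16.67072   6.927109     2.41   0.020     2.727174    30.61427
       white |  -3.507233   2.641344    -1.33   0.191    -8.823982    1.809516
     snglpar |   120.3584   17.85667     6.74   0.000     84.41482     156.302
       _cons |  -1191.974   386.2523    -3.09   0.003     -1969.46   -414.4888
------------------------------------------------------------------------------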
. regr violent poverty

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(1, 49)      =   17.18
       Model |  2525496.21     1  2525496.21           Prob > F      =  0.0001
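. regr violent white

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(1, 49)      =   41.53
       Model |  4462950.06     1  4462950.06           Prob > F      =  0.0000
    Residual |  5265524.68    49  107459.687           R-squared     =  0.4588
-------------+------------------------------           Adj R-squared =  0.4477
       Total |  9728474.75    50  194569.495           Root MSE      =  327.81

------------------------------------------------------------------------------
     violent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       white |  -22.54337   3.498087    -6.44   0.000    -29.57304    -15.5137
       _cons |   2508.917   297.7758     8.43   0.000     1910.514     3107.32
------------------------------------------------------------------------------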