Sheet1stateviolentmurdermetrowhitehsgradpovertysnglparviofitviores.docx

Sheet1
state  violent  murder  metro  white  hsgrad  poverty  snglpar  viofit     viores
AK     761      9       41.8   75.2   86.6    9.1      14.3     715.139    0.2929492
AL     780      11.6    67.4   73.5   66.9    17.4     11.5     691.5632   0.4937207
AR     593      10.2    44.7   82.9   66.3    20       10.7     453.8885   0.7989318
AZ     715      8.6     84.7   88.6   78.7    15.4     12.1     871.0881   -0.872175
CA     1078     13.1    96.7   79.3   76.2    18.2     12.5     1067.503   0.0599926
CO     567      5.8     81.8   92.5   84.4    9.9      12.1     751.1429   -1.043932
CT     456      6.3     95.7   89     79.2    8.5      10.1     570.3966   -0.6560233
DE     686      5       82.7   79.4   77.5    10.2     11.4     670.8073   0.0854869
FL     1206     8.9     93     83.5   74.4    17.8     10.6     779.8888   2.470016
GA     723      11.4    67.7   70.8   70.9    13.5     13       823.5709   -0.5653531
HI     261      3.8     74.7   40.9   80.1    8        9.1      264.7408   -0.0212222
IA     326      2.3     43.8   96.6   80.1    10.3     9        50.25043   1.564434
ID     282      2.9     30     96.7   79.7    13.1     9.5      57.91985   1.284314
IL     960      11.4    84     81     76.2    13.6     11.5     754.3386   1.147723
IN     489      7.5     71.6   90.6   75.6    12.2     10.8     539.8218   -0.2825857
KS     496      6.4     54.6   90.9   81.3    13.1     9.9      303.4748   1.074898
KY     463      6.6     48.5   91.8   64.6    20.4     10.6     477.4698   -0.0833644
LA     1062     20.3    75     66.7   68.3    26.4     14.9     1360.373   -1.793716
MA     805      3.9     96.2   91.1   80      10.7     10.9     719.134    0.4875998
MD     998      12.7    92.8   68.9   78.4    9.7      12       820.4843   1.012708
ME     126      1.6     35.7   98.5   78.8    10.7     10.6     205.7611   -0.4580154
MI     792      9.8     82.7   83.1   76.8    15.4     13       974.5974   -1.022228
MN     327      3.4     69.3   94     82.4    11.6     9.9      392.0398   -0.3628658
MO     744      11.3    68.3   87.6   73.9    16.1     10.9     596.1801   0.8240324
MS     434      13.5    30.7   63.3   64.3    24.7     14.7     957.0128   -3.193796
MT     178      3       24     92.6   81      14.9     10.8     214.9012   -0.2136206
NC     679      11.3    66.3   75.2   70      14.4     11.1     576.9474   0.5662267
ND     82       1.7     41.6   94.2   76.7    11.2     8.4      -30.5059   0.6415154
NE     339      3.9     50.6   94.3   81.8    10.3     9.4      156.4503   1.028228
NH     138      2       59.4   98     82.2    9.9      9.2      191.7913   -0.3023897
NJ     627      5.3     100    80.8   76.7    10.9     9.6      580.2896   0.2696506
NM     930      8       56     87.1   75.1    17.4     13.8     906.8519   0.131264
NV     875      10.4    84.8   86.7   78.8    9.8      12.4     812.584    0.3557735
NY     1074     13.3    91.7   77.2   74.8    16.4     12.7     1023.016   0.2872973
OH     504      6       81.3   87.5   75.7    13       11.4     709.3514   -1.144457
OK     635      8.4     60.1   82.5   74.6    19.9     11.1     625.6494   0.0531835
OR     503      4.6     70     93.2   81.5    11.8     11.3     586.4274   -0.4646936
PA     418      6.8     84.8   88.7   74.7    13.2     9.6      501.9544   -0.4756588
RI     402      3.9     93.6   92.6   72      11.2     10.8     694.3781   -1.653521
SC     1023     10.3    69.8   68.6   68.3    18.7     12.3     839.2634   1.029668
SD     208      3.4     32.6   90.2   77.1    14.2     9.4      84.48248   0.7058436
TN     766      10.2    67.7   82.8   67.1    19.6     11.2     693.0859   0.4139353
TX     762      11.9    83.9   85.1   72.1    17.4     11.8     860.463    -0.5538287
UT     301      3.1     77.5   94.8   85.1    10.7     10       453.5657   -0.8545464
VA     372      8.3     77.5   77.1   75.2    9.7      10.3     475.6079   -0.5816147
VT     114      3.6     27     98.4   80.8    10       11       178.2364   -0.379625
WA     515      5.2     83     89.4   83.8    12.1     11.7     746.4708   -1.294313
WI     264      4.4     68.1   92.1   78.6    12.6     10.4     466.5293   -1.126045
WV     208      6.9     41.8   96.3   66      22.2     9.4      297.9507   -0.5456529
WY     286      3.4     29.7   95.9   83      13.3     10.8     231.2377   0.3144581
For question 6, use the full dataset with all of the
observations. To compare the observed and fitted values for
those 3 observations, use the 'list state metro.... if
abs(viores)>2' command on the second page. Then, to see whether
those observations have unusual explanatory or outcome values,
use the 'summ' command, also on the second page.
For question 7, first run the 'regr violent metro poverty
snglpar' regression model on the full dataset and extract the
information the question asks for from the output (R^2, root
MSE, coefficient estimates, standard errors). Then drop the DC
observation using the command on the 2nd page, rerun the
regression model, and extract the same information from the
output. Repeat the process once more after dropping FL and MS.
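
The question 7 workflow described above can be sketched in Stata as follows. This is a minimal sketch, not an official solution: it assumes crime.dta sits in the current working directory, and the stored-estimate names m51, m50, and m48 are made up for illustration.

```stata
* Assumption: crime.dta is in the current working directory
use crime.dta, clear

* Fit on all 51 observations and save the diagnostics before dropping anything
regr violent metro poverty snglpar
predict viofit
predict viores, rstandard
estimates store m51

* Drop DC and refit on the 50 states
drop if state == "DC"
regr violent metro poverty snglpar
estimates store m50

* Drop the remaining observations with |standardized residual| > 2 (FL and MS)
drop if abs(viores) > 2
regr violent metro poverty snglpar
estimates store m48

* Coefficients, standard errors, N, R-squared, and root MSE side by side
estimates table m51 m50 m48, b(%9.3f) se(%9.3f) stats(N r2 rmse)
```

The estimates table call is only one way to line up the three fits; copying the numbers from each regression output by hand works just as well.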
Question #6

Question #7
. regr violent metro poverty snglpar

Source      SS            df   MS             Number of obs =     51
Model       8170480.21     3   2723493.4      F(3, 47)      =  82.16
Residual    1557994.53    47   33148.8199     Prob > F      = 0.0000
Total       9728474.75    50   194569.495     R-squared     = 0.8399
                                              Adj R-squared = 0.8296
                                              Root MSE      = 182.07

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       7.828935    1.254699    6.24     0.000   5.304806    10.35306
poverty     17.68024    6.94093     2.55     0.014   3.716893    31.6436
snglpar     132.4081    15.50322    8.54     0.000   101.2196    163.5965
_cons       -1666.436   147.852     -11.27   0.000   -1963.876   -1368.996
. drop if state=="DC"
(1 observation deleted)

Source      SS            df   MS             Number of obs =     50
Model       3098767.11     3   1032922.37     F(3, 46)      =  39.90
Total       4289625.22    49   87543.3718     Prob > F      = 0.0000
                                              R-squared     = 0.7224
                                              Root MSE      =  160.9

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       7.712334    1.109241    6.95     0.000   5.479547    9.94512
poverty     18.28265    6.135958    2.98     0.005   5.931611    30.6337
snglpar     89.40078    17.83621    5.01     0.000   53.49836    125.3032
_cons       -1197.538   180.4874    -6.64    0.000   -1560.84    -834.2358
. drop if abs(viores) > 2
(2 observations deleted)

Source      SS            df   MS             Number of obs =     48
Model       3049737.19     3   1016579.06     F(3, 44)      =  55.35
Total       3857922.48    47   82083.457      Prob > F      = 0.0000
                                              R-squared     = 0.7905
                                              Root MSE      = 135.53

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       6.099092    .9993893    6.10     0.000   4.084955    8.113229
poverty     17.50214    5.382295    3.25     0.002   6.65484     28.34945
snglpar     113.4601    15.98783    7.10     0.000   81.23872    145.6814
_cons       -1345.475   158.4841    -8.49    0.000   -1664.879   -1026.071

Source      SS            df   MS             Number of obs =     51
Model       8170480.21     3   2723493.4      F(3, 47)      =  82.16
Total       9728474.75    50   194569.495     Prob > F      = 0.0000
                                              R-squared     = 0.8399
                                              Root MSE      = 182.07

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       7.828935    1.254699    6.24     0.000   5.304806    10.35306
poverty     17.68024    6.94093     2.55     0.014   3.716893    31.6436
snglpar     132.4081    15.50322    8.54     0.000   101.2196    163.5965
_cons       -1666.436   147.852     -11.27   0.000   -1963.876   -1368.996

. predict viofit
(option xb assumed; fitted values)

. predict viores, rstandard

. list state metro poverty snglpar violent viofit viores if abs(viores) > 2

      state   metro   poverty   snglpar   violent   viofit     viores
  9.  FL      93      17.8      10.6      1206      779.8888   2.470016
 25.  MS      30.7    24.7      14.7      434       957.0128   -3.193796
 51.  DC      100     26.4      22.1      2922      2509.434   3.327972
. summ violent metro poverty snglpar, detail

violent crime per 100,000
      Percentiles   Smallest
 1%   82            82
 5%   126           114
10%   208           126        Obs           51
25%   326           138        Sum of Wgt.   51
50%   515                      Mean          612.8431
                    Largest
75%   780           1074
90%   1023          1078       Variance      194569.5
95%   1078          1206       Skewness      2.834805
99%   2922          2922       Kurtosis      15.73678

percent of pop living in metro
      Percentiles   Smallest
 1%   24            24
 5%   29.7          27
10%   32.6          29.7       Obs           51
25%   48.5          30         Sum of Wgt.   51
50%   69.8                     Mean          67.3902
                    Largest
75%   84            96.2
90%   93.6          96.7       Variance      482.1157
95%   96.7          100        Skewness      -.413938
99%   100           100        Kurtosis      2.044647

percent of families in poverty
      Percentiles   Smallest
 1%   8             8
 5%   9.1           8.5
10%   9.8           9.1        Obs           51
25%   10.7          9.7        Sum of Wgt.   51
50%   13.1                     Mean          14.25882
                    Largest    Std. Dev.     4.584242
75%   17.4          22.2
90%   20            24.7       Variance      21.01527
95%   24.7          26.4       Skewness      .9845979
99%   26.4          26.4       Kurtosis      3.330975

percent of single-parent famili
      Percentiles   Smallest
 1%   8.4           8.4
 5%   9.1           9
10%   9.4           9.1        Obs           51
25%   10            9.2        Sum of Wgt.   51
50%   10.9                     Mean          11.32549
                    Largest
75%   12.1          14.3
90%   13            14.7       Variance      4.500737
95%   14.7          14.9       Skewness      2.715258
99%   22.1          22.1       Kurtosis      14.24446
[Figure: normal quantile plot (y-axis: Fitted values; x-axis: Inverse Normal)]
[Figure: single-axis plot (axis: Fitted values, 0 to 2,500)]
[Figure: "viores and viofit" (y-axis: Fitted values; x-axis: Fitted values)]
[Figure: single-axis plot (axis: Standardized residuals, -4 to 4)]
[Figure: normal quantile plot (y-axis: Standardized residuals; x-axis: Inverse Normal)]
[Figure: "viores and viofit" (y-axis: Standardized residuals; x-axis: Fitted values)]
Question #1
Question #3

. display Ftail(2,45,1.4961)
.23493176
Question #4
Question #5
Question #7

. regr violent metro white hsgrad poverty snglpar

Source      SS            df   MS             Number of obs =     51
Model       8267618.21     5   1653523.64     F(5, 45)      =  50.93
Total       9728474.75    50   194569.495     Prob > F      = 0.0000
                                              R-squared     = 0.8498
                                              Root MSE      = 180.18

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       7.608808    1.295273    5.87     0.000   4.999995    10.21762
white       -4.482907   2.779073    -1.61    0.114   -10.08025   1.114434
hsgrad      8.646443    7.826016    1.10     0.275   -7.115962   24.40885
poverty     26.24416    11.08327    2.37     0.022   3.921304    48.56702
snglpar     109.4666    20.35989    5.38     0.000   68.45967    150.4735
_cons       -1795.904   668.7885    -2.69    0.010   -3142.914   -448.8953
. regr violent metro

Source      SS            df   MS             Number of obs =     51
Model       2879416.88     1   2879416.88     F(1, 49)      =  20.60
Total       9728474.75    50   194569.495     Prob > F      = 0.0000
                                              R-squared     = 0.2960
                                              Root MSE      = 373.87

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       10.92928    2.408001    4.54     0.000   6.090222    15.76834
_cons       -123.6833   170.5113    -0.73    0.472   -466.3387   218.972
. test white hsgrad

 ( 1)  white = 0
 ( 2)  hsgrad = 0

       F(  2,    45) =    1.50
            Prob > F =    0.2349
. regr violent hsgrad

Source      SS            df   MS             Number of obs =     51
Model       637824.5       1   637824.5       F(1, 49)      =   3.44
Total       9728474.75    50   194569.495     Prob > F      = 0.0697
                                              R-squared     = 0.0656
                                              Root MSE      = 430.72

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
hsgrad      -20.19723   10.89283    -1.85    0.070   -42.08718   1.692727
_cons       2152.347    832.4773    2.59     0.013   479.4211    3825.273
Public Health 141
regression case study 2
This exercise uses the crime data from Agresti and Finlay, from
the Statistical Abstract of the US for a recent year. There are
51 observations, one for each state and the District of
Columbia.
The dataset, crime.dta, is in bCourses.
Here is a brief description of the variables:

. desc

Contains data from C:PH142BCRIME.DTA
  obs:    51                  Agresti and Finlay crime data
 vars:     8                  14 Sep 1997 20:55
 size: 1,785 (86.0% of memory free)

  1. state     str3    %9s
  2. violent   float   %9.0g   violent crime per 100,000
  3. murder    float   %9.0g   murders per 100,000
  4. metro     float   %9.0g   percent of pop living in metro
  5. white     float   %9.0g   percent white
  6. hsgrad    float   %9.0g   percent high school grad or mor
  7. poverty   float   %9.0g   percent of families in poverty
  8. snglpar   float   %9.0g   percent of single-parent famili
In class, we are using the poverty rate as an outcome variable;
for this lab, use the violent crime rate as the outcome.
Use the examples in the reader as models for the commands.
Be sure to read all the questions, as there are some Stata
commands you need to plan on your own.
Fit the following regression models:
regr violent metro white hsgrad poverty snglpar

regr violent metro poverty white snglpar
regr violent metro poverty snglpar
regr violent poverty
regr violent white
regr violent hsgrad
regr violent metro
regr metro poverty
regr metro snglpar
Continue to explore the association between some of the X
variables:
regr poverty hsgrad
regr poverty white
regr white metro poverty snglpar hsgrad
regr hsgrad metro poverty snglpar white
Refit the model
and use Stata's predict command to calculate the fitted values
(call them viofit) and the standardized residuals (call them
viores) so that you can check the model assumptions.

Do the assumption checking for question 1 at this point; before
you drop any observations!
See question 1 to help plan your commands now!
List the observations with large standardized residuals:
list state metro poverty snglpar violent viofit viores if
abs(viores) > 2
And get a summary of the variables in this model to help
explore the outliers
summ violent metro poverty snglpar, detail
Just to see what influence these 3 observations have on the
conclusions:
drop if state=="DC"
drop if abs(viores) > 2
Note: Once you have dropped an observation, it’s gone.
You may need to reopen the dataset to do the assumption
checking for the model with all 51 observations.
1. Using the results for all the states and DC, discuss the
assumptions for the model:
Use the residual vs. fitted plot to discuss the functional form
and the assumption of constant variance.
Use the box plot to look for outliers and to assess symmetry,
and the qnorm plot and the Shapiro-Wilk test to discuss the
normality assumption.
No matter what you conclude here, interpret the tests with
caution in the following questions.
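
One possible set of commands for the checks in question 1 is sketched below. It assumes the model being checked is regr violent metro poverty snglpar fitted to all 51 observations and that crime.dta is in the working directory. The handout asks you to plan these commands yourself, so treat this as one option rather than the expected answer.

```stata
* Assumption: crime.dta is in the current working directory
use crime.dta, clear

* Fit the model on all 51 observations (do not drop anything first)
regr violent metro poverty snglpar

* Fitted values and standardized residuals
predict viofit
predict viores, rstandard

* Residual vs. fitted plot: functional form and constant variance
scatter viores viofit, yline(0) title("viores and viofit")

* Box plot: outliers and symmetry
graph box viores

* Normal quantile plot and Shapiro-Wilk test: normality
qnorm viores
swilk viores
```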
Questions 2, 3, 4, and 5 all use the models with DC included.
2. In the full model regr violent metro white hsgrad poverty
snglpar
Interpret the t test for the variable white.
Interpret the t test for the variable poverty.
3. Set up and carry out the restricted vs. full F test to compare
these two models
Be sure to state the hypotheses, show the calculation of the
F statistic from the SS residuals,
give the numerator and denominator degrees of freedom,
use Stata's Ftail function to find the P value, and state your
conclusion in words.
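
The calculation asked for here can be sketched with Stata scalars. The sketch assumes the two models being compared are the full model (regr violent metro white hsgrad poverty snglpar) and the restricted model (regr violent metro poverty snglpar), so that the null hypothesis is that the coefficients of white and hsgrad are both zero. The sums of squares are copied from the regression output for those two fits, and the scalar names are made up for illustration.

```stata
* Residual SS of the restricted model (metro poverty snglpar), 47 df
scalar ssr_restricted = 1557994.53

* Residual SS of the full model, 45 df, computed as Total SS minus Model SS
scalar ssr_full = 9728474.75 - 8267618.21

* Numerator df = 2 (coefficients set to zero); denominator df = 45
scalar Fstat = ((ssr_restricted - ssr_full) / 2) / (ssr_full / 45)
display Fstat

* Upper-tail probability of the F(2, 45) distribution
display Ftail(2, 45, Fstat)
```

Under these assumptions the statistic comes out near 1.496 with a P value near 0.235, which agrees with the display Ftail(2,45,1.4961) result and the test white hsgrad output recorded in this document.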
4. Compare your conclusions about the association between
percent high school graduates and violent crime from the
models
regr violent hsgrad
(Note: this question is not asking for a test to compare the 2
models!)
Use the regression of hsgrad on the other predictor variables
(metro, white, poverty, snglpar) to explain why these models lead to
different conclusions about the association between percent
hsgrad and violent crime. (This is an example of collinearity.)
5. Take a look at the models
regr violent metro
Metro is significant in both models.
Verify that the differences in the point estimate, standard error,
and confidence interval
for metro are relatively large. This is an example of
confounding.
For this to happen, snglpar and/or poverty
must be associated both with metro and with violent.

Check that this is the case.
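
A possible sequence of commands for this check is sketched below. It assumes the adjusted model is regr violent metro poverty snglpar and that crime.dta is in the working directory. The correlate line is an optional extra that the handout does not ask for.

```stata
* Assumption: crime.dta is in the current working directory
use crime.dta, clear

* Unadjusted and adjusted estimates for metro
regr violent metro
regr violent metro poverty snglpar

* Are the potential confounders associated with metro?
regr metro poverty
regr metro snglpar

* Are they associated with the outcome?
regr violent poverty
regr violent snglpar

* Optional: all pairwise correlations at a glance
correlate violent metro poverty snglpar
```

In the output recorded in this document, the metro coefficient is 10.93 (standard error 2.41) when metro is the only predictor and 7.83 (standard error 1.25) once poverty and snglpar are included.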
6. Compare the fitted and observed values for the District of
Columbia (DC), Mississippi (MS) and Florida (FL) for the
model using all the observations. Do these states have unusual
values on the X variables? on the outcome variable?
7. Make a table of the estimated coefficients and standard errors
for the model
for all 51 observations, for the 50 states with DC dropped,
for the 48 states with the 2 outliers and DC dropped.
Also make a table of the R^2 values and the root MSEs for the
3 models and compare them.
Which coefficients are sensitive to the points that did not fit
well, and which are not?
(That is, which variables have coefficient estimates that are
similar for all 3 sets of states,
and which variables have coefficient estimates that are
different?)
What changes do you see in the standard errors?
(Notice that with DC omitted, the SS total is much smaller,
which is why the R^2 value is actually smaller for the model
with DC dropped.)

Sheet 1: Find the following regression models:
desc
regr violent white
regr violent hsgrad
regr violent metro
regr violent snglpar
regr metro poverty
regr metro snglpar
regr poverty hsgrad
regr poverty white
regr hsgrad metro poverty snglpar white

Sheet 2: Refit the model
predict viofit
predict viores, rstandard
scatter viores viofit, title(viores and viofit) yline(0)
swilk viores
abs(viores) > 2
drop if state=="DC"

. regr violent snglpar

Source      SS            df   MS             Number of obs =     51
Model       6846033.64     1   6846033.64     F(1, 49)      = 116.38
Residual    2882441.1     49   58825.3286     Prob > F      = 0.0000
Total       9728474.75    50   194569.495     R-squared     = 0.7037
                                              Root MSE      = 242.54

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
snglpar     174.4186    16.16796    10.79    0.000   141.9278    206.9093
_cons       -1362.532   186.2331    -7.32    0.000   -1736.782   -988.2831
. regr metro poverty

Source      SS            df   MS             Number of obs =      51
Model       88.3455837     1   88.3455837     F(1, 49)      =    0.18
Total       24105.7848    50   482.115696     Prob > F      =  0.6730
                                              R-squared     =  0.0037
                                              Adj R-squared = -0.0167
                                              Root MSE      =  22.139

metro       Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
poverty     -.2899612   .6829876    -0.42    0.673   -1.662476   1.082554
_cons       71.5247     10.22013    7.00     0.000   50.98657    92.06283
. regr metro snglpar

Source      SS            df   MS             Number of obs =     51
Model       1627.17159     1   1627.17159     F(1, 49)      =   3.55
Total       24105.7848    50   482.115696     Prob > F      = 0.0656
                                              R-squared     = 0.0675
                                              Root MSE      = 21.418

metro       Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
snglpar     2.688994    1.427775    1.88     0.066   -.1802279   5.558216
_cons       36.93602    16.44604    2.25     0.029   3.886467    69.98557
. regr poverty hsgrad

Source      SS            df   MS             Number of obs =     51
Model       581.538845     1   581.538845     F(1, 49)      =  60.73
Total       1050.76354    50   21.0152707     Prob > F      = 0.0000
                                              R-squared     = 0.5534
                                              Root MSE      = 3.0945

poverty     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
hsgrad      -.6098605   .0782589    -7.79    0.000   -.7671275   -.4525934
_cons       60.74454    5.980884    10.16    0.000   48.7255     72.76358
. regr poverty white

Source      SS            df   MS             Number of obs =     51
Model       159.112608     1   159.112608     F(1, 49)      =   8.74
Total       1050.76354    50   21.0152707     Prob > F      = 0.0048
                                              R-squared     = 0.1514
                                              Root MSE      = 4.2658

poverty     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
white       -.1346047   .0455205    -2.96    0.005   -.2260816   -.0431278
_cons       25.58013    3.874949    6.60     0.000   17.79313    33.36713
. regr white metro poverty snglpar hsgrad

Source      SS            df   MS             Number of obs =     51
Model       4578.4722      4   1144.61805     F(4, 46)      =  12.53
Total       8781.81687    50   175.636337     Prob > F      = 0.0000
                                              R-squared     = 0.5214
                                              Root MSE      = 9.5591

white       Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       -.0876765   .067493     -1.30    0.200   -.2235329   .0481799
poverty     .7320098    .578026     1.27     0.212   -.4314962   1.895516
snglpar     -4.215963   .8833982    -4.77    0.000   -5.994151   -2.437774
hsgrad      .8948473    .3936838    2.27     0.028   .102403     1.687292
_cons       59.1182     34.39484    1.72     0.092   -10.11502   128.3514
. regr hsgrad metro poverty snglpar white

Source      SS            df   MS             Number of obs =     51
Model       1033.52551     4   258.381379     F(4, 46)      =  22.42
Total       1563.57162    50   31.2714324     Prob > F      = 0.0000
                                              R-squared     = 0.6610
                                              Root MSE      = 3.3945

hsgrad      Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       -.023647    .0241526    -0.98    0.333   -.0722636   .0249696
poverty     -1.107212   .1301947    -8.50    0.000   -1.36928    -.8451434
snglpar     1.259689    .3356152    3.75     0.000   .5841301    1.935247
white       .1128412    .0496439    2.27     0.028   .0129131    .2127692
_cons       69.84722    7.259594    9.62     0.000   55.23442    84.46003
. swilk viores

Shapiro-Wilk W test for normal data
Variable    Obs    W         V       z       Prob>z
viores      51     0.96859   1.500   0.866   0.19324
. list state metro poverty snglpar violent viofit viores if abs(viores) > 2

      state   metro   poverty   snglpar   violent   viofit     viores
  1.  AK      41.8    9.1       14.3      761       715.139    715.139
  2.  AL      67.4    17.4      11.5      780       691.5632   691.5632
  3.  AR      44.7    20        10.7      593       453.8885   453.8885
  4.  AZ      84.7    15.4      12.1      715       871.0881   871.0881
  5.  CA      96.7    18.2      12.5      1078      1067.503   1067.503
  6.  CO      81.8    9.9       12.1      567       751.1429   751.1429
  7.  CT      95.7    8.5       10.1      456       570.3966   570.3966
  8.  DE      82.7    10.2      11.4      686       670.8073   670.8073
  9.  FL      93      17.8      10.6      1206      779.8888   779.8888
 10.  GA      67.7    13.5      13        723       823.5709   823.5709
 11.  HI      74.7    8         9.1       261       264.7408   264.7408
 12.  IA      43.8    10.3      9         326       50.25043   50.25043
 13.  ID      30      13.1      9.5       282       57.91985   57.91985
 14.  IL      84      13.6      11.5      960       754.3386   754.3386
 15.  IN      71.6    12.2      10.8      489       539.8218   539.8218
 16.  KS      54.6    13.1      9.9       496       303.4748   303.4748
 17.  KY      48.5    20.4      10.6      463       477.4698   477.4698
 18.  LA      75      26.4      14.9      1062      1360.373   1360.373
 19.  MA      96.2    10.7      10.9      805       719.134    719.134
 20.  MD      92.8    9.7       12        998       820.4843   820.4843
 21.  ME      35.7    10.7      10.6      126       205.7611   205.7611
 22.  MI      82.7    15.4      13        792       974.5974   974.5974
 23.  MN      69.3    11.6      9.9       327       392.0398   392.0398
 24.  MO      68.3    16.1      10.9      744       596.1801   596.1801
 25.  MS      30.7    24.7      14.7      434       957.0128   957.0128
 26.  MT      24      14.9      10.8      178       214.9012   214.9012
 27.  NC      66.3    14.4      11.1      679       576.9474   576.9474
 28.  ND      41.6    11.2      8.4       82        -30.5059   -30.5059
 29.  NE      50.6    10.3      9.4       339       156.4503   156.4503
 30.  NH      59.4    9.9       9.2       138       191.7913   191.7913
 31.  NJ      100     10.9      9.6       627       580.2896   580.2896
 32.  NM      56      17.4      13.8      930       906.8519   906.8519
 33.  NV      84.8    9.8       12.4      875       812.584    812.584
 34.  NY      91.7    16.4      12.7      1074      1023.016   1023.016
 35.  OH      81.3    13        11.4      504       709.3514   709.3514
 36.  OK      60.1    19.9      11.1      635       625.6494   625.6494
 37.  OR      70      11.8      11.3      503       586.4274   586.4274
 38.  PA      84.8    13.2      9.6       418       501.9544   501.9544
 39.  RI      93.6    11.2      10.8      402       694.3781   694.3781
 40.  SC      69.8    18.7      12.3      1023      839.2634   839.2634
 41.  SD      32.6    14.2      9.4       208       84.48248   84.48248
 42.  TN      67.7    19.6      11.2      766       693.0859   693.0859
 43.  TX      83.9    17.4      11.8      762       860.463    860.463
 44.  UT      77.5    10.7      10        301       453.5657   453.5657
 45.  VA      77.5    9.7       10.3      372       475.6079   475.6079
 46.  VT      27      10        11        114       178.2364   178.2364
 47.  WA      83      12.1      11.7      515       746.4708   746.4708
 48.  WI      68.1    12.6      10.4      264       466.5293   466.5293
 49.  WV      41.8    22.2      9.4       208       297.9507   297.9507
 50.  WY      29.7    13.3      10.8      286       231.2377   231.2377
 51.  DC      100     26.4      22.1      2922      2509.434   2509.434
. desc

Contains data from E:Regresstion Case Study 2STATA.dta
  obs:    51                  Agresti and Finlay crime data
 vars:    12                  6 Aug 2015 11:03
 size: 2,397

variable name   storage type   display format   value label   variable label
state           str3           %9s
violent         float          %9.0g                          violent crime per 100,000
murder          float          %9.0g                          murders per 100,000
metro           float          %9.0g                          percent of pop living in metro
white           float          %9.0g                          percent white
hsgrad          float          %9.0g                          percent high school grad or mor
poverty         float          %9.0g                          percent of families in poverty
snglpar         float          %9.0g                          percent of single-parent famili
viofit_fit      float          %9.0g                          Fitted values
viores_res      float          %9.0g                          Standardized residuals
viofit          float          %9.0g                          Fitted values
viores          float          %9.0g                          Fitted values

Sorted by:
. regr violent metro poverty white snglpar

Source      SS            df   MS             Number of obs =     51
Model       8227991.45     4   2056997.86     F(4, 46)      =  63.06
Total       9728474.75    50   194569.495     Prob > F      = 0.0000
                                              R-squared     = 0.8458
                                              Root MSE      = 180.61

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
metro       7.404345    1.285055    5.76     0.000   4.817663    9.991027
poverty     16.67072    6.927109    2.41     0.020   2.727174    30.61427
white       -3.507233   2.641344    -1.33    0.191   -8.823982   1.809516
snglpar     120.3584    17.85667    6.74     0.000   84.41482    156.302
_cons       -1191.974   386.2523    -3.09    0.003   -1969.46    -414.4888
. regr violent poverty

Source      SS            df   MS             Number of obs =     51
Model       2525496.21     1   2525496.21     F(1, 49)      =  17.18
Total       9728474.75    50   194569.495     Prob > F      = 0.0001
                                              R-squared     = 0.2596
                                              Root MSE      = 383.41

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
poverty     49.02537    11.82784    4.14     0.000   25.25643    72.79431
_cons       -86.20093   176.9902    -0.49    0.628   -441.8761   269.4743
. regr violent white

Source      SS            df   MS             Number of obs =     51
Model       4462950.06     1   4462950.06     F(1, 49)      =  41.53
Total       9728474.75    50   194569.495     Prob > F      = 0.0000
                                              R-squared     = 0.4588
                                              Root MSE      = 327.81

violent     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
white       -22.54337   3.498087    -6.44    0.000   -29.57304   -15.5137
_cons       2508.917    297.7758    8.43     0.000   1910.514    3107.32
