Analysis of Dangerous by Design

Addressing the argument that Florida is unfairly represented in the Dangerous by Design reports due to the census misrepresenting actual walking rates.

Transcript

  • 1. HYPOTHESIS TESTING John-Mark Palacios URP6200 – Planimetrics Dr. Prosperi
  • 2. TABLE OF CONTENTS
    Section                                          Page
    I.   INTRODUCTION                                1
    II.  ANALYSIS                                    1
    III. T - TEST                                    2
    IV.  F - TEST                                    3
    V.   SCATTERPLOT / CORRELATION / REGRESSION      6
    VI.  CONCLUSIONS                                 11
  • 3. HYPOTHESIS TESTING Page 1
    I. INTRODUCTION
    The Dangerous by Design report, which highlighted cities across the United States that were the most dangerous for pedestrians, rocked the state of Florida by ranking its four largest metropolitan areas as the top 4 cities for pedestrian fatalities. Analytical readers of this report might wonder whether the state was really doing such a horrible job of protecting pedestrians, or whether other factors such as demographics were at play. Besides the number of pedestrians killed and the population, the report looked only at the American Community Survey's Journey to Work data as a proxy for pedestrian presence on the roadways. While few people walk to work, many more people walk for leisure or to run errands. We pulled in data from other sources to see how walkability, density, and non-work walking trips compare and correlate with the pedestrian fatality data.
    Walkscore.com is known for its ability to calculate a score for individual addresses, but it has performed only limited analyses of larger areas such as neighborhoods and cities. The Dangerous by Design report used metropolitan areas as the cases. Since Walkscore did not respond to our request for a Walkscore for metropolitan areas, we used their listing of Walkscore by city. The largest city in each metropolitan area was used to represent the Walkscore for that metropolitan area. While this may not be entirely representative of the overall area, most American cities follow similar patterns of development, and any discrepancy might be expected to be consistent between cities. Walkscore's list of cities includes a top ten list of most walkable cities, so we used this to create a dichotomous variable indicating whether a city was on this list.
    In order to provide a variable that encompassed the larger metropolitan area, we used population density obtained from the 2010 census for urbanized areas. The pedestrian fatality report may have been performed at the metropolitan statistical area level, but that level uses county lines as boundaries and generally includes vast swaths of rural land. Density is more accurately measured within the contiguous urbanized area, which excludes the low-density census blocks that go from suburban to rural.
    The Centers for Disease Control collected data on physical activity for various metropolitan areas across the country. While these data are not as extensive as the American Community Survey's, they do provide a more absolute measure of people walking in these areas. These data were available for fewer metropolitan areas, so a subset of the cases was used when comparing with the physical activity data.
    II. ANALYSIS
    The original Dangerous by Design report included the following variables:
    1. Ranking, based on Pedestrian Danger Index
    2. Metro Area
    3. Total Pedestrian Deaths between 2000 and 2009
  • 4. HYPOTHESIS TESTING Page 2
    4. Average annual pedestrian deaths per 100,000 residents (2000-2009)
    5. Percent of workers walking to work between 2005 and 2009
    6. Pedestrian Danger Index (PDI), calculated with the above variables, #4 x 100,000 / #5
    We have added the following:
    7. Walkscore for central city, a value between 0 and 100
    8. Population Density of the urbanized area
    9. Whether central city is on the "Most Walkable" list (yes/no represented as 1/0)
    10. Name of Central City
    11. State of Central City
    The above data had 52 cases. The physical activity data, available for a subset of only 36 urban areas out of the 52, included the following variables:
    12. Percent of respondents who reported walking as one of the two most frequent leisure time activities they participated in within the past month.
    13. Percent of respondents walking at least 5 times a week, 30 minutes per session, within the past month.
    The CDC physical activity data were based on surveys with at least 500 respondents.
    The following tests were conducted to analyze these variables:
    - T-test looking for significance between the central city being on the "Most Walkable" list and the average annual pedestrian deaths
    - F-test looking at differences in annual pedestrian deaths by state of central city
    - F-test looking at differences in any walking by state of central city
    - Scatterplot and correlation coefficient between annual pedestrian deaths and Walkscore
    - Correlation coefficients among PDI, Walkscore, annual pedestrian deaths, population density, percent walking any time, and percent walking 5 times a week
    - Scatterplot for Walkscore vs. annual pedestrian deaths
    - Scatterplot for percent of people walking any time vs. annual pedestrian deaths
    - Regression analysis using those variables with the highest correlation coefficients
    III. T - TESTS - Tests of difference
    "Most Walkable" List and the Average Annual Pedestrian Deaths Per 100,000
    1. Ho: There is no difference between the average annual pedestrian deaths per 100,000 for those cities on the "Most Walkable" cities list and those not on the list.
  • 5. HYPOTHESIS TESTING Page 3
    H1: There is a difference between the average annual pedestrian deaths per 100,000 for those cities on the "Most Walkable" cities list and those not on the list.
    2. t, with 50 df.
    3. Significance level: .05
    4. t = -0.376, Pr(t) = .708
    5. Since Pr(t) is > .05, we fail to reject the null hypothesis.
    * The average annual deaths per 100,000 for those cities on the "Most Walkable" list is 1.59, compared to 1.67 for those off the list. The difference is very small and there is no evidence to show that it is significant.
    T test                          on list         not on list
    Mean                            1.588888889     1.672093023
    Variance                        0.361111111     0.364916944
    Observations                    9               43
    Pooled Variance                 0.36430801
    Hypothesized Mean Difference    0
    Observed Mean Difference        -0.083204134
    Df                              50
    t Stat                          -0.376066248
    P (T<=t) one-tail               0.354229321
    t Critical one-tail             1.675905025
    P (T<=t) two-tail               0.708458641
    t Critical two-tail             2.008559112
    Table 1. T-test.
    IV. F - TEST - Test of difference
    Test 1: Annual Pedestrian Deaths among cities in different states
    1. Ho: There is no difference in the average number of annual pedestrian deaths among the cities of different states.
       H1: At least one state's average number of annual pedestrian deaths differs from the others.
    2. F, with 30 and 21 df.
    3. Significance level: .05
    4. F = 6.141; Pr(F) = .0000306
  • 6. HYPOTHESIS TESTING Page 4
    5. Since Pr(F) is < .05, we reject the null hypothesis and accept the alternative hypothesis.
    * At least one of the means is different. Inspection of the averages shows that Florida has a very high number of pedestrian deaths, averaging over 3 deaths per year per 100,000 people.
    Table 2. Anova: Single Factor (Test 1)
    SUMMARY
    Groups    Count    Sum     Average        Variance
    AL        1        1.2     1.2            n/a
    AZ        2        4.6     2.3            0
    CA        6        11.7    1.95           0.115
    CO        1        1.6     1.6            n/a
    CT        1        1.2     1.2            n/a
    DC        1        1.7     1.7            n/a
    FL        4        12.2    3.05           0.096666667
    GA        1        1.6     1.6            n/a
    IL        1        1.4     1.4            n/a
    IN        1        1.1     1.1            n/a
    KY        1        1.6     1.6            n/a
    LA        1        2.4     2.4            n/a
    MA        1        1.1     1.1            n/a
    MD        1        1.8     1.8            n/a
    MI        1        1.8     1.8            n/a
    MN        1        0.8     0.8            n/a
    MO        2        2.6     1.3            0.02
    NC        2        3.1     1.55           0.045
    NV        1        2.5     2.5            n/a
    NY        3        4.5     1.5            0.13
    OH        3        2.5     0.833333333    0.023333333
    OK        1        1.4     1.4            n/a
    OR        1        1.2     1.2            n/a
    PA        2        2.8     1.4            0.18
    RI        1        1.2     1.2            n/a
    TN        2        3.5     1.75           0.245
    TX        4        7.1     1.775          0.0425
    UT        1        1.3     1.3            n/a
    VA        2        2.4     1.2            0.08
    WA        1        1.2     1.2            n/a
    WI        1        1.1     1.1            n/a
    n/a: the variance is undefined for a state represented by a single city (shown as #DIV/0! in the spreadsheet output).
  • 7. HYPOTHESIS TESTING Page 5
    ANOVA
    Source of Variation    SS             df    MS             F              P-value       F critical
    Between Groups         16.39775641    30    0.54659188     6.140934188    0.00003058    2.0102483
    Within Groups          1.869166667    21    0.089007937
    Total                  18.26692308    51
    Test 2: Differences in Walking for Physical Activity by State
    1. Ho: There is no difference in the average percentage of people walking among the cities of different states.
       H1: At least one state's average percentage of people walking differs from the others.
    2. F, with 28 and 7 df.
    3. Significance level: .05
    4. F = 0.927; Pr(F) = .597
    5. Since Pr(F) is > .05, we fail to reject the null hypothesis.
    * There is no evidence to show that there is a difference in the average percentage of people walking among different states.
  • 8. HYPOTHESIS TESTING Page 6
    Table 3. Anova: Single Factor (Test 2)
    SUMMARY
    Groups    Count    Sum     Average        Variance
    AZ        2        70.7    35.35          10.125
    CA        1        38.5    38.5           n/a
    CO        1        42.1    42.1           n/a
    CT        1        40.1    40.1           n/a
    DC        1        40.2    40.2           n/a
    FL        3        101     33.66666667    40.82333333
    GA        1        41      41             n/a
    IL        1        36.3    36.3           n/a
    IN        1        41.9    41.9           n/a
    KY        1        36.3    36.3           n/a
    LA        1        32.9    32.9           n/a
    MA        1        41.8    41.8           n/a
    MD        1        39.3    39.3           n/a
    MI        1        40.6    40.6           n/a
    MN        1        37.7    37.7           n/a
    MO        2        75.9    37.95          0.005
    NC        1        41      41             n/a
    NV        1        37.5    37.5           n/a
    NY        1        37.8    37.8           n/a
    OH        1        41.4    41.4           n/a
    OK        1        38.5    38.5           n/a
    OR        1        45.1    45.1           n/a
    PA        2        87.9    43.95          0.125
    RI        1        40.1    40.1           n/a
    TN        2        79.9    39.95          25.205
    TX        2        78.1    39.05          4.805
    UT        1        45.1    45.1           n/a
    WA        1        48.5    48.5           n/a
    WI        1        44.2    44.2           n/a
    n/a: the variance is undefined for a state represented by a single city (shown as #DIV/0! in the spreadsheet output).
    ANOVA
    Source of Variation    SS             df    MS             F              P-value        F critical
    Between Groups         452.0983333    28    16.14636905    0.927102274    0.597200543    3.385786974
    Within Groups          121.9116667    7     17.41595238
    Total                  574.01         35
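The t and F statistics reported in Sections III and IV can be re-derived from the published summary figures alone. The following is a minimal sketch, not the spreadsheet procedure the report used: the function names `pooled_t` and `anova_f` are hypothetical, and the inputs are the means, variances, sums of squares, and critical values copied from Tables 1 to 3. It assumes equal variances for the t-test, which is what the "Pooled Variance" row in Table 1 implies.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic using a pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df, pooled_var


def anova_f(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F statistic: mean square between over mean square within."""
    return (ss_between / df_between) / (ss_within / df_within)


# Table 1: cities on the "Most Walkable" list vs. cities not on the list.
t_stat, t_df, pooled_var = pooled_t(1.588888889, 0.361111111, 9,
                                    1.672093023, 0.364916944, 43)

# Tables 2 and 3: sums of squares and degrees of freedom from the ANOVA output.
f_test1 = anova_f(16.39775641, 30, 1.869166667, 21)   # deaths by state
f_test2 = anova_f(452.0983333, 28, 121.9116667, 7)    # walking by state

# Decisions at the .05 level, using the critical values printed in the tables.
reject_t = abs(t_stat) > 2.008559112    # two-tailed critical t, 50 df
reject_f1 = f_test1 > 2.0102483         # critical F, (30, 21) df
reject_f2 = f_test2 > 3.385786974       # critical F, (28, 7) df

print(f"t = {t_stat:.3f} (df = {t_df}), reject Ho: {reject_t}")
print(f"F (Test 1) = {f_test1:.3f}, reject Ho: {reject_f1}")
print(f"F (Test 2) = {f_test2:.3f}, reject Ho: {reject_f2}")
```

Comparing each statistic against its tabulated critical value reaches the same decisions as the p-value comparisons in the text, without needing a statistics library.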
  • 9. HYPOTHESIS TESTING Page 7
    V. SCATTERPLOT / CORRELATION / REGRESSION - Test of relationship
    Scatterplot – Graph
    Figure 1. Plot of Walkscore for central city (X axis) vs. Annual Deaths per 100,000 (Y axis)
    Since Figure 1 tilts slightly down to the right, it appears that deaths decrease as walkability increases.
    Figure 2. Plot of % any walking in the past month (X axis) vs. Annual Deaths per 100,000 (Y axis)
  • 10. HYPOTHESIS TESTING Page 8
    Figure 2 tilts downward to the right, implying that there is a tendency for deaths to increase as walking decreases.
    CORRELATION:
    Correlations (columns numbered as the rows)
                                                                    (1)             (2)             (3)             (4)            (5)
    (1) Avg. annual pedestrian deaths per 100,000 (2000-2009)       1
    (2) Percent of workers walking to work (2005-2009)              -0.224210565    1
    (3) PDI                                                         0.820076533     -0.653196727    1
    (4) Walkscore for central city                                  -0.167115655    0.774097237     -0.519648515    1
    (5) Population Density                                          0.280170762     0.383542089     -0.089631027    0.443313978    1
    N = 52, 50 df. R value required for a two-tailed test with 0.05 significance: 0.273
    Table 4. Correlation among variables.
    Correlations (columns numbered as the rows)
    (1) Total pedestrian deaths (2000-2009)
    (2) Avg. annual pedestrian deaths per 100,000 (2000-2009)
    (3) Percent of workers walking to work (2005-2009)
    (4) PDI
    (5) Walkscore for central city
    (6) Population Density
    (7) % any walking in the past month
    (8) % walk at least 5 times per week, 30 min.
           (1)             (2)             (3)
    (1)    1
    (2)    0.33079027      1
    (3)    0.433803879     -0.218693502    1
  • 11. HYPOTHESIS TESTING Page 9
    Correlations (columns numbered as the rows)
    (1) Total pedestrian deaths (2000-2009)
    (2) Avg. annual pedestrian deaths per 100,000 (2000-2009)
    (3) Percent of workers walking to work (2005-2009)
    (4) PDI
    (5) Walkscore for central city
    (6) Population Density
    (7) % any walking in the past month
    (8) % walk at least 5 times per week, 30 min.
           (1)             (2)             (3)            (4)             (5)            (6)             (7)            (8)
    (4)    0.041281599     0.823614244     -0.670029555   1
    (5)    0.427692706     -0.197427318    0.768320164    -0.585401773    1
    (6)    0.739550019     0.321139316     0.321555001    -0.025273485    0.42224739     1
    (7)    -0.300887411    -0.617188715    0.276923962    -0.56131522     0.120225571    -0.244147474    1
    (8)    -0.001619811    -0.152417931    0.475813507    -0.374771483    0.439175416    0.034473505     0.458083911    1
    N = 36, 34 df. R value required for a two-tailed test with 0.05 significance: 0.33
    Table 5. Correlation among variables in the subset of the data.
    REGRESSION:
    The Walkscore variable was removed from the final regression model because its p-value was 0.47, greater than 0.05. Table 6 shows one regression model that endeavors to account for pedestrian deaths. It takes the form
    Annual Deaths = 1.566 - 23.394 x (% walk to work) + 0.000217 x (Population Density).
    Note that the R^2 value (0.207) is rather low for this model, however, implying that it does not account well for the variability.
    SUMMARY OUTPUT
    Response Variable: Avg. annual pedestrian deaths per 100,000 (2000-2009)
    Regression Statistics
    Multiple R        0.455491226
    R^2               0.207472257
    Standard Error    0.543553
    Adjusted R^2      0.175124186
  • 12. HYPOTHESIS TESTING Page 10
    Observations      52
    ANOVA
                  df    SS             MS             F              Significance of F
    Regression    2     3.789879765    1.894939882    6.413744314    0.003356252
    Residual      49    14.47704331    0.295449864
    Total         51    18.26692308
                                                      Coefficients    Standard Error    t-Statistics    p-Value        Lower 95%       Upper 95%
    Intercept                                         1.566014419     0.23349281        6.706906403     1.89E-08       1.096793051     2.035235787
    Percent of workers walking to work (2005-2009)    -23.39400507    8.284347583       -2.823880196    0.006841321    -40.04202483    -6.745985316
    Population Density                                0.000217314     6.97E-05          3.117594818     0.003049027    7.72E-05        (not shown)
    Table 6. Regression model using the full dataset.
    Using the subset of the data with fewer metropolitan areas, which also included a percent walking for leisure variable, we were able to create a model that accounted for about 42% of the variability in annual deaths per 100,000 (adjusted R^2 = 0.417). This time we kept Walkscore, because removing it reduces the R value. See Table 7. The form of the relationship is
    Annual deaths = 4.85 - 0.0769 x (% walking) + 0.000170 x (Population Density) - 0.011 x (Walkscore)
    SUMMARY OUTPUT
    Response Variable: Avg. annual pedestrian deaths per 100,000 (2000-2009)
    Regression Statistics
    Multiple R        0.683032844
    R^2               0.466533865
    Standard Error    0.467002493
    Adjusted R^2      0.416521415
    Observations      36
    ANOVA
                  df    SS             MS             F              Significance of F
    Regression    3     6.103299702    2.034433234    9.328354528    0.000140149
    Residual      32    6.97892252     0.218091329
    Total         35    13.08222222
  • 13. HYPOTHESIS TESTING Page 11
                                       Coefficients    Standard Error    t-Statistics    p-Value        Lower 95%       Upper 95%
    Intercept                          4.854104449     0.873203664       5.558960236     3.91E-06       3.075446789     6.632762108
    % any walking in the past month    -0.076929247    0.020782218       -3.701686094    0.000803202    -0.11926124     -0.034597254
    Population Density                 0.000169983     8.28E-05          2.052129648     0.048413212    1.26E-06        0.000338707
    Walkscore for central city         -0.011052871    0.006100955       -1.811662484    0.079433331    -0.023480109    0.001374367
    Table 7. Regression model using the smaller data set.
    VI. CONCLUSIONS
    The following discusses the conclusions of each test:
    1. T – Test
       Results of the T-test show no evidence of a difference in annual deaths between the more walkable and the less walkable cities.
    2. F – Test
       Results of the first F-test show that Florida has a disproportionately high number of annual pedestrian deaths. The second F-test shows that the percentage of people walking does not appear to change significantly from state to state.
    3. Scatterplot / Correlation / Regression
       Scatterplot, Correlation, and Regression tests show that Walkscore, Population Density, and the proportion of walking (commuters or residents) all relate to annual pedestrian deaths. The second regression model seems to have the better fit, exchanging % walking to work for % walking in general, and utilizing population density and Walkscore. It is surprising that Population Density has a positive coefficient, however. Our expectation, especially since density contributes to a higher Walkscore, was that higher densities would bring fewer pedestrian fatalities. In fact, the estimated effect of density is very slightly positive.
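The correlation thresholds and the two fitted models from Section V can be written out as small functions. This is a sketch under stated assumptions, not the report's own calculation. The function names are hypothetical. The critical t for 34 df (2.0322) is a standard table value, not a figure from the report. The full-model walking variable is assumed to be stored as a fraction rather than a percentage, which the size of its coefficient suggests but the report does not state. The example inputs are made up purely for illustration.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| that is significant, given the critical t for df = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def deaths_full_model(walk_to_work: float, density: float) -> float:
    """Table 6 model (52 cases). walk_to_work is ASSUMED to be a fraction."""
    return 1.566014419 - 23.39400507 * walk_to_work + 0.000217314 * density


def deaths_subset_model(pct_any_walking: float, density: float,
                        walkscore: float) -> float:
    """Table 7 model (36 cases), using the coefficient signs shown in Table 7."""
    return (4.854104449
            - 0.076929247 * pct_any_walking
            + 0.000169983 * density
            - 0.011052871 * walkscore)


# Thresholds quoted under Tables 4 and 5 (two-tailed, 0.05 significance).
r_crit_52 = critical_r(2.008559112, 50)  # critical t taken from Table 1
r_crit_36 = critical_r(2.0322, 34)       # critical t from a standard t table

# Hypothetical city: 40% report any walking, density 3000, Walkscore 50.
example = deaths_subset_model(40.0, 3000.0, 50.0)

print(f"critical r: {r_crit_52:.3f} (n = 52), {r_crit_36:.2f} (n = 36)")
print(f"predicted annual deaths per 100,000: {example:.2f}")
```

Holding density and Walkscore fixed, each additional percentage point of residents who report walking lowers the predicted death rate by about 0.077 per 100,000 in the subset model.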