Crime Analysis using Regression and ANOVA

A statistical analysis of damage to property using a predictive regression model. Also an investigation to ascertain possible differences in reported divisional burglary rates using ANOVA.

Published in: Data & Analytics

1. CA2 Stats Project. Tom Donoghue, 11 December 2016. MSCDAD Statistics.
2. CA2 Statistics, Tom Donoghue, v1.0

Table of Contents

Background
Regression
    Output from SPSS
    Results
    An Example using the model
ANOVA
    Output from SPSS
    Results
3. Background

Using the CSO databases, we took a dataset that provides reported crime figures for the Dublin Garda Divisions: CJQ03 Recorded Crime Offences by Garda Division, Type of Offence and Quarter (2003Q1-2016Q2), modified on 28/09/16 at 11:02. We were asked to conduct two pieces of statistical analysis on the dataset. The following sections describe the analysis conducted.

Regression

We want to see whether we can build a model to predict damage to property crime rates from the other types of crime reported across the 6 Garda Divisions. Our continuous dependent (outcome) variable is the number of damage to property crimes. Our independent (predictor) variables are also continuous and comprise Burglary, Sexual offences, and Weapons and Explosives offences. As an initial check, we ran scatterplots to examine the relationships between the outcome variable and the predictors, which provided the following output:
4. We can see that there is a plausible linear relationship between each predictor variable and the outcome variable. The output below provides a further preliminary check for multicollinearity and for the relationships between the predictors and the outcome variable. It shows that no predictors are too highly correlated with each other (i.e. r > 0.8), which suggests there is no multicollinearity in the data. Taking only the predictor variables into account, the highest correlation is between Burglary and Sexual offences (r = .275, p < .05). The predictor with the highest correlation with our outcome variable is Burglary (r = .546, p < .001).

Output from SPSS
5. The model summary shows a single model because we entered all the variables simultaneously in one block using Forced Entry. The rationale is that we have no research available that would indicate a particular order in which to enter the predictor variables.

The R value shows the multiple correlation between the predictors and the outcome, at .812. R² shows how much of the variability in the outcome is accounted for by the predictors; its value of .66 means that the predictors account for 66% of the variance in damage to property crime rates. To get an idea of how well our model generalises, we look at the difference between R² and Adjusted R²: .660 - .642 = .018, or 1.8%. This indicates that if the model were derived from the population rather than the sample, it would account for approximately 1.8% less variance in the outcome.

The Durbin-Watson statistic indicates whether the assumption of independent errors is tenable. The result above, 1.341, is greater than 1 and less than 3 (the conservative rule), so the assumption has been met.

The ANOVA indicates that our model is a significant fit to the data overall, F(3, 56) = 36.299, p < 0.001. This tells us that by using the model we are significantly better at predicting values of the outcome than by using the mean.

b-values (unstandardised coefficients, each interpreted with the other predictors held constant): Burglary, b = 0.456, indicates that as burglary increases by 1 unit, Damage to property increases by 0.456 units. Sexual offences, b = 1.67, indicates that as sexual offences increase by 1 unit, Damage to property increases by 1.67 units.
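The shrinkage between R² and Adjusted R² quoted above can be checked by hand. The sketch below is illustrative and not part of the original SPSS output: it applies the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - k - 1), to the values quoted in the text. The function name is an assumption made for this example, and n = 60 is inferred from the residual degrees of freedom in F(3, 56).

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.660      # R^2 quoted in the model summary
k = 3           # predictors: Burglary, Sexual offences, Weapons and Explosives
n = 56 + k + 1  # residual df of 56 in F(3, 56) implies n = 60 cases

adj = adjusted_r2(r2, n, k)
shrinkage = r2 - adj

# Report adjusted R^2 and the shrinkage both as a proportion and a percentage.
print(f"adjusted R^2 = {adj:.3f}, shrinkage = {shrinkage:.3f} ({shrinkage:.1%})")
```

With these inputs the adjusted value comes out at roughly .642 and the shrinkage at roughly 1.8%, which agrees with the figures reported in the model summary.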
6. Weapons and Explosives, b = 4.07, indicates that as Weapons and Explosives offences increase by 1 unit, Damage to property increases by 4.07 units. The standardised beta values allow direct comparison of the predictors in the model: Burglary = 0.55, Weapons and Explosives = 0.53 and Sexual offences = 0.193.

Examining the t-tests, we can see that both Burglary, t(56) = 6.68, p < 0.001, and Weapons and Explosives, t(56) = 6.53, p < 0.001, make a significant contribution to the model. Sexual offences, t(56) = 2.29, p < 0.05, also makes a significant contribution, but a smaller one than the other two predictors.

Checking for multicollinearity, the VIF values are all less than 10, which indicates that there is probably no cause for concern. The average VIF is 1.12, which is close to 1, again indicating no probable cause for concern. As an additional multicollinearity check, the table below shows that none of the predictors have high variance proportions on the same small eigenvalue (i.e. Dimension 4).

The casewise diagnostics show 3 cases that are treated as outliers (we lowered the residual threshold from 3 to 2 standard deviations when selecting casewise diagnostics). In an ordinary sample we would expect 95% of the standardised residuals to lie within ±2. In our sample of 60 we see 3 cases, or 5%, with standardised residuals outside these limits. As 99% of the cases should lie within ±2.5, we would expect to see 1% outside these limits. We have a single case, case 51, with a standardised residual of 3.126, which we may wish to investigate further. Other than that, the diagnostics provide no cause for concern.
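The outlier reasoning above is simple arithmetic on the sample size. As a rough sketch (the variable names are assumptions made for this example, and the 5% and 1% shares are the rule-of-thumb values used in the text), the expected number of cases beyond each cutoff in a sample of 60 is:

```python
n_cases = 60  # sample size quoted in the text

# Approximate share of standardised residuals expected beyond each cutoff
# in an ordinary sample (rule-of-thumb values used in the text).
expected_share = {2.0: 0.05, 2.5: 0.01}

# Expected number of cases beyond each cutoff for this sample size.
expected_counts = {cutoff: share * n_cases for cutoff, share in expected_share.items()}

for cutoff, count in expected_counts.items():
    print(f"|z| > {cutoff}: expect about {count:.1f} of {n_cases} cases")

observed_beyond_2 = 3  # cases flagged by the casewise diagnostics
```

About 3 cases beyond ±2 are expected, matching the 3 observed, while fewer than 1 case is expected beyond ±2.5, which is why the single case at 3.126 stands out.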
7. Charts

The histogram and P-P plot below are used to check the normality of the residuals. There is some deviation in the P-P plot and the histogram has a few gaps, which could be improved by increasing the sample size. In the scatterplots below we check for homoscedasticity and linearity. The zpred vs. zresid plot and the partial plots are as follows:
8. This random scatter pattern indicates that the assumptions of linearity and homoscedasticity have been met.

This scatter pattern indicates that the assumptions of linearity (a positive relationship with Damage to property) and homoscedasticity (the dots are well spaced out with no outliers) have been met.
9. This scatter pattern shows a positive relationship with Damage to property (although slightly less linear than for the other predictors), and the assumption of homoscedasticity (the dots are well spaced out with no outliers) has been met.

This scatter pattern shows a positive linear relationship with Damage to property, so the assumptions of linearity and homoscedasticity (the dots are well spaced out with no outliers) have been met.
10. Results

Linear model of predictors of Damage to Property crime rates (95% confidence intervals for B in parentheses).

Step 1                            B (95% CI)                 SE B     β      p
(Constant)                        -10.285 (-99.83, 79.26)    44.70
Burglary                          0.456 (0.32, 0.59)         .07      .55    p < 0.001
Sexual Offences                   1.67 (0.21, 3.13)          .73      .19    p < 0.05
Weapons and Explosives Offences   4.07 (2.82, 5.31)          .62      .53    p < 0.001

Note: R² = .66

An Example using the model

Damage to property = -10.285 + (0.456 × Burglary) + (1.67 × Sexual offences) + (4.07 × Weapons and Explosives offences)

As an example using the equation, with Burglary = 383, Sexual offences = 20 and Weapons and Explosives offences = 60:

Damage to property = -10.285 + (0.456 × 383) + (1.67 × 20) + (4.07 × 60) = -10.285 + 174.65 + 33.4 + 244.2 ≈ 442

ANOVA

We are investigating whether there is a difference in reported burglary rates between the 6 Dublin Garda Divisions. There is one categorical independent variable, a factor with 6 levels (representing the 6 Garda Divisions). There is one continuous dependent variable, burglary and related offences. Sample sizes are n = 10 for each Garda Division. We assume that the observations are independent. Preliminary tests were conducted to check for a normal distribution using an SPSS histogram and Q-Q plot, as seen below:

Fig 1. Histogram
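Returning to the regression example earlier on this slide, the worked prediction can be reproduced in a few lines. This is a minimal sketch using the unstandardised coefficients from the results table; the function and variable names are assumptions made for this example and do not come from the SPSS output.

```python
# Unstandardised coefficients taken from the regression results table.
INTERCEPT = -10.285
B_BURGLARY = 0.456
B_SEXUAL_OFFENCES = 1.67
B_WEAPONS_EXPLOSIVES = 4.07


def predict_damage_to_property(burglary: float,
                               sexual_offences: float,
                               weapons_explosives: float) -> float:
    """Predicted number of damage to property offences from the fitted model."""
    return (INTERCEPT
            + B_BURGLARY * burglary
            + B_SEXUAL_OFFENCES * sexual_offences
            + B_WEAPONS_EXPLOSIVES * weapons_explosives)


# Inputs from the worked example in the text.
prediction = predict_damage_to_property(383, 20, 60)
print(round(prediction))
```

The unrounded prediction is about 441.96, which rounds to the 442 given in the worked example.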
11. Fig. 2 Q-Q Plot

The histogram is symmetrical and more or less bell-shaped, indicating normality. The Q-Q plot also indicates normality, with the dots lying along the diagonal line.

Our Null Hypothesis is that there is no difference between the mean burglary rates of the 6 Garda Divisions. Our Alternative Hypothesis is that at least one of the means is different. Because of our relatively small sample sizes, we set an alpha value of 0.01 to reduce the risk of a Type I error.
12. Output from SPSS

The Descriptives give us a sanity check and confirm our k groups and n sample sizes. Levene's test for homogeneity of variance was not significant (p > 0.05), so the assumption of equal variances is tenable. The omnibus result indicates that the groups are significantly different, F(5, 54) = 11.445, p < .001, but we need to examine the Post Hoc Tests to discover which of the comparisons differ.

The Post Hoc test used was Tukey HSD, which compares each group with every other group and so indicates whether there is a significant difference between each pair of means. A Bonferroni post hoc test was also included in the test options, but it may be too strict: we have already lowered the alpha level to 0.01, so we could be risking Type II errors. For the purposes of this analysis we will use Tukey.
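To see why Bonferroni may be too strict here, it helps to count the pairwise comparisons. The sketch below is illustrative only and assumes the textbook form of the correction, in which the overall alpha is divided by the number of comparisons; the variable names are assumptions made for this example.

```python
from math import comb

k_groups = 6   # Garda Divisions
alpha = 0.01   # overall alpha chosen in the analysis

# Number of distinct pairs of groups compared in the post hoc tests.
n_comparisons = comb(k_groups, 2)

# Textbook Bonferroni: each comparison is tested at alpha / number of comparisons.
per_comparison_alpha = alpha / n_comparisons

print(f"{n_comparisons} comparisons, per-comparison alpha = {per_comparison_alpha:.5f}")
```

With 15 pairwise comparisons, each test would have to reach roughly p < 0.00067, which illustrates the added risk of Type II errors mentioned above.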
14. Results

A one-way between-groups ANOVA was carried out to investigate reported burglary crimes across the Dublin Garda Divisions. The Garda Division factor comprised 6 groups (61, D.M.R. South Central; 62, D.M.R. North Central; 63, D.M.R. Northern; 64, D.M.R. Southern; 65, D.M.R. Eastern; 66, D.M.R. Western). There was a statistically significant difference at the p < 0.01 level in the burglary crimes reported for the 6 groups: F(5, 54) = 11.445, p < .001.

Post Hoc comparisons using the Tukey HSD test indicated that the mean score for Group 61 (M = 391, SD = 73.5) did not differ significantly from any of the remaining groups. However, Group 62 (M = 236, SD = 46.9) was significantly different from Groups 63 (M = 542.2, SD = 175.2), 64 (M = 561, SD = 164.6), 65 (M = 564, SD = 133.2) and 66 (M = 564, SD = 133.2). As a result, we reject the Null Hypothesis.
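For readers without SPSS, the omnibus F statistic reported above is the ratio of between-group to within-group mean squares. The following is a minimal pure-Python sketch of that calculation. The function name is an assumption made for this example, and the toy data are hypothetical values chosen only to show the mechanics; they are not the CSO burglary figures.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data (NOT the CSO figures): three groups of three observations.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")

# Degrees of freedom for the design in the text: 6 divisions, n = 10 each.
design_df = (6 - 1, 6 * 10 - 6)
print(design_df)
```

The degrees of freedom for the design in the text work out to (5, 54), matching the reported F(5, 54).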