Crime Analysis using Regression and ANOVA

CA2 Stats Project
Tom Donoghue
11 December 2016
MSCDAD
Statistics

CA2 Statistics
Tom Donoghue v1.0 Page 1
Table of Contents
Background
Regression
    Output from SPSS
    Results
    An Example using the model
ANOVA
    Output from SPSS
    Results

Background
Using the CSO databases, we took a dataset that provides reported crime figures for the Dublin Garda
Divisions:
CJQ03 Recorded Crime Offences by Garda Division, Type of Offence and Quarter (2003Q1-2016Q2),
modified on 28/09/16 at 11:02
We were asked to conduct two pieces of statistical analysis on the dataset. The following sections
describe the analyses conducted.
Regression
We want to see whether we can build a model to predict damage to property crime rates from the
other types of crime reported across the 6 Dublin Garda Divisions.
Our dependent (outcome) variable is the number of damage to property crimes, which is continuous.
Our independent (predictor) variables are also continuous and comprise Burglary, Sexual offences,
and Weapons and Explosives offences.
As an initial check, we ran scatterplots to examine the relationships between the outcome variable
and the predictors, which produced the following output:

We can see that there is a plausible linear relationship between each predictor variable and the
outcome variable. The output below provides a further preliminary check for multicollinearity and of
the relationships between the predictors and the outcome variable.
This shows that no pair of predictors is too highly correlated (i.e. r > 0.8), so there is no
evidence of multicollinearity in the data.
Among the predictor variables themselves, the highest correlation is between Burglary and
Sexual offences (r = .275, p < .05). The predictor with the highest correlation with our outcome
variable is Burglary (r = .546, p < .001).
Output from SPSS

The model summary shows a single model because we entered all the variables simultaneously in 1
block using Forced Entry. The rationale is that we have no research available to us that would
indicate a particular order in which the predictor variables should be entered.
The R value shows the multiple correlation between the predictors and the outcome, at .812. R² shows
how much of the variability in the outcome is accounted for by the predictors. Its value of .66 means
the predictors account for 66% of the variance in damage to property crime rates.
To get an idea of how well our model generalises, we look at the difference between R² and Adjusted
R². The difference is .660 - .642 = .018, or 1.8%. This indicates that if the model were derived from
the population rather than the sample, it would account for approximately 1.8% less variance in the
outcome.
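The shrinkage figure can be checked by hand. The sketch below assumes the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - k - 1), with n = 60 cases and k = 3 predictors as reported in this analysis. The function name is illustrative only.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.660  # R^2 from the model summary
n = 60      # number of cases in the sample
k = 3       # Burglary, Sexual offences, Weapons and Explosives

adj = adjusted_r_squared(r2, n, k)
shrinkage = r2 - adj

print(f"Adjusted R^2 = {adj:.3f}")
print(f"Shrinkage    = {shrinkage:.3f} ({shrinkage:.1%})")
```

With these inputs the adjusted value rounds to .642 and the shrinkage to about 1.8%.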
The Durbin-Watson statistic indicates whether the assumption of independent errors is tenable. The
result above is 1.341, which is greater than 1 and less than 3 (the conservative rule of thumb), so
the assumption has been met.
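For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below is a minimal illustration, not the SPSS routine. The residual lists are hypothetical toy values, not the residuals from this model; only the 1.341 figure comes from the output above.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def errors_look_independent(dw, low=1.0, high=3.0):
    """Conservative rule of thumb used in the text: 1 < DW < 3."""
    return low < dw < high


# Hypothetical residuals that alternate in sign push DW towards its maximum of 4.
alternating = durbin_watson([1, -1, 1, -1])
# Hypothetical residuals that never change give DW = 0.
constant = durbin_watson([1, 1, 1, 1])

print(alternating, constant)
print(errors_look_independent(1.341))  # the value reported by SPSS
```

Values near 2 suggest uncorrelated errors, which is why the rule of thumb brackets that value.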
The ANOVA indicates that our model is a significant fit to the data overall, F(3, 56) = 36.299,
p < 0.001. This tells us that by using the model we are significantly better at predicting values of
the outcome than by using the mean.
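The F ratio and its degrees of freedom can be approximated from R² alone. This is a sketch under the standard relationship F = (R²/k) / ((1 - R²)/(n - k - 1)), using the rounded R² of .660. Because of that rounding, the result lands close to, but not exactly on, the 36.299 reported by SPSS.

```python
def f_from_r_squared(r2: float, n: int, k: int):
    """Return (F, df_model, df_residual) for a regression with k predictors."""
    df_model = k
    df_residual = n - k - 1
    f = (r2 / df_model) / ((1 - r2) / df_residual)
    return f, df_model, df_residual


f, df1, df2 = f_from_r_squared(r2=0.660, n=60, k=3)
print(f"F({df1}, {df2}) = {f:.2f}")
```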
b values (unstandardised coefficients): Burglary b = 0.456 indicates that as burglary increases by 1
unit, Damage to property will increase by 0.456 units, holding the other predictors constant.
Sexual offences b = 1.67 indicates that as sexual offences increase by 1 unit, Damage to property will
increase by 1.67 units.

Weapons and Explosives b = 4.07 indicates that as Weapons and Explosives offences increase by
1 unit, Damage to property will increase by 4.07 units.
The standardised beta values allow direct comparison of the predictors in the model: Burglary = 0.55,
Weapons and Explosives = 0.53 and Sexual offences = 0.19.
Examining the t-test section, we can see that both Burglary, t(56) = 6.68, p < 0.001, and Weapons and
Explosives, t(56) = 6.53, p < 0.001, make a significant contribution to the model. Sexual offences,
t(56) = 2.29, p < 0.05, also makes a significant contribution, but a smaller one than the other two
predictors.
Checking for multicollinearity, the VIF values are all less than 10 which indicates that there is probably
no cause for concern. The average VIF = 1.12 which is close to 1, again indicating no probable cause
for concern.
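To put the VIF thresholds in context: a predictor's VIF is 1 / (1 - R²_j), where R²_j is the variance in that predictor explained by the other predictors. The sketch below simply inverts that relationship. The function names are illustrative, and the only figures taken from this analysis are the threshold of 10 and the average VIF of 1.12.

```python
def vif_from_r_squared(r2_j: float) -> float:
    """VIF for a predictor whose R^2 against the other predictors is r2_j."""
    return 1 / (1 - r2_j)


def r_squared_from_vif(vif: float) -> float:
    """Share of a predictor's variance explained by the other predictors."""
    return 1 - 1 / vif


# A VIF of 10 means 90% of the predictor's variance is shared with the others.
print(f"{r_squared_from_vif(10):.3f}")
# An average VIF of 1.12 corresponds to roughly 11% shared variance.
print(f"{r_squared_from_vif(1.12):.3f}")
```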
As an additional multicollinearity check, the table below shows that no two predictors have high
variance proportions on the same small Eigenvalue (i.e. Dimension 4).
The Casewise diagnostics show that we have 3 cases that are treated as outliers (we lowered the
threshold from 3 to 2 standard deviations when selecting casewise diagnostics). In an ordinary sample
we would expect 95% of the standardised residuals to lie between ±2. In our sample of 60 we see 3
cases, or 5%, that have standardised residuals outside these limits. As 99% of the cases should lie
within ±2.5, we would expect to see 1% outside these limits. We have a single case, case 51, with a
standardised residual of 3.126, which we may wish to investigate further. Other than that, the
diagnostics provide no other cause for concern.
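The 95% and 99% figures are rounded rules of thumb. As a rough check, the sketch below uses the standard normal distribution from Python's standard library to estimate how many of 60 standardised residuals should fall outside a given limit, assuming the residuals are normally distributed. The function name is illustrative.

```python
from statistics import NormalDist


def expected_outside(n: int, z: float) -> float:
    """Expected number of n standard-normal residuals with |value| > z."""
    p_outside = 2 * (1 - NormalDist().cdf(z))
    return n * p_outside


n_cases = 60
for limit in (2.0, 2.5):
    print(f"beyond ±{limit}: about {expected_outside(n_cases, limit):.2f} cases expected")
```

Under that assumption roughly 2.7 cases are expected beyond ±2 and fewer than 1 beyond ±2.5, which is in line with the 3 flagged cases discussed above.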

Charts
The histogram and P-P plot below are used to check the normality of the residuals. There is some
deviation in the P-P plot and the histogram has a few gaps, which could be improved by increasing the
sample size.
In the scatterplots below we check for homoscedasticity and linearity. The zpred v zresid plot and
the partial plots are as follows:

This random scatter pattern indicates that the assumptions of linearity and homoscedasticity
have been met.
This scatter pattern indicates that the assumptions of linearity (positive relationship to
Damage to property) and homoscedasticity (dots are well spaced out with no outliers) have
been met.

This scatter pattern shows a positive relationship to Damage to property (although slightly
less linear than for the other predictors) and homoscedasticity (dots are well spaced out with
no outliers), so both assumptions have been met.
This scatter pattern shows a positive relationship to Damage to property, so the assumptions
of linearity and homoscedasticity (dots are well spaced out with no outliers) have been met.

Results
Linear model of predictors of Damage to Property crime rates, with 95% confidence intervals in
parentheses.

Step 1                            B (95% CI)                 SE B     β      p
(Constant)                        -10.285 (-99.83, 79.26)    44.70
Burglary                          0.456 (0.32, 0.59)         .07      .55    p < 0.001
Sexual Offences                   1.67 (0.21, 3.13)          .73      .19    p < 0.05
Weapons and Explosives Offences   4.07 (2.82, 5.31)          .62      .53    p < 0.001

Note: R² = .66
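The confidence intervals in the table can be approximately reconstructed as B ± t × SE B. The sketch below hard-codes the two-tailed 95% critical t value for 56 degrees of freedom as 2.003. That value is an assumption taken from standard t tables, not from the SPSS output. Because the tabled B and SE B values are rounded, the limits may differ from the table in the second decimal place.

```python
T_CRIT_95_DF56 = 2.003  # assumed critical value: two-tailed, alpha = .05, df = 56

# (B, SE B) pairs as printed in the results table
coefficients = {
    "(Constant)": (-10.285, 44.70),
    "Burglary": (0.456, 0.07),
    "Sexual Offences": (1.67, 0.73),
    "Weapons and Explosives Offences": (4.07, 0.62),
}


def confidence_interval(b: float, se: float, t_crit: float = T_CRIT_95_DF56):
    """Return the (lower, upper) limits of b +/- t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin


for name, (b, se) in coefficients.items():
    lower, upper = confidence_interval(b, se)
    print(f"{name}: {b} ({lower:.2f}, {upper:.2f})")
```

An interval that excludes zero, as all three predictor intervals do, is consistent with the predictor being significant at the .05 level.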
An Example using the model
Damage to property = -10.285 + (0.456 * Burglary) + (1.67 * Sexual offences) + (4.07 * Weapons and Explosives offences)
As an example using the equation, with:
Burglary = 383
Sexual offences = 20
Weapons and Explosives offences = 60
Damage to property = -10.285 + (0.456 * 383) + (1.67 * 20) + (4.07 * 60)
= -10.285 + 174.65 + 33.4 + 244.2
≈ 442
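The same calculation can be wrapped in a small function so that other combinations of predictor values can be tried. This is a minimal sketch using the coefficients reported above. The function and argument names are illustrative.

```python
INTERCEPT = -10.285
B_BURGLARY = 0.456
B_SEXUAL_OFFENCES = 1.67
B_WEAPONS_EXPLOSIVES = 4.07


def predict_damage_to_property(burglary, sexual_offences, weapons_explosives):
    """Predicted number of damage to property offences from the fitted model."""
    return (
        INTERCEPT
        + B_BURGLARY * burglary
        + B_SEXUAL_OFFENCES * sexual_offences
        + B_WEAPONS_EXPLOSIVES * weapons_explosives
    )


prediction = predict_damage_to_property(
    burglary=383, sexual_offences=20, weapons_explosives=60
)
print(round(prediction))  # 442
```

Predictions are only trustworthy for inputs within the range of values observed in the sample.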
ANOVA
We are investigating whether there is a difference in reported burglary rates between the 6 Dublin
Garda Divisions.
There is one categorical independent variable with 6 levels of the factor (representing the 6 Garda
divisions).
There is one continuous dependent variable, which is burglary and related offences. Sample sizes are
n = 10 for each Garda division. We are assuming that the observations are independent.
Preliminary tests were conducted to check for a normal distribution using the SPSS histogram and Q-Q
plot, as seen below:
Fig 1. Histogram

Fig. 2 Q-Q Plot
The histogram is symmetrical and more or less bell shaped indicating normality. The Q-Q plot also
indicates normality with the dots lying along the diagonal line.
Our Null Hypothesis is that there is no difference between the mean burglary rates of the 6 Garda
Divisions. Our Alternative Hypothesis is that at least one of the means is different. Due to our
relatively small sample sizes we decided to set an alpha value of 0.01 to reduce the risk of Type I error.

Output from SPSS
The Descriptives give us a sanity check and confirm our k groups and n sample sizes. The Levene
statistic for homogeneity of variance was not significant (p > 0.05), so the assumption of equal
variances is tenable.
The omnibus results indicate that the groups are significantly different, F(5, 54) = 11.445, p < .001,
but we need to examine the Post Hoc Tests to discover which of the comparisons are different.
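To show where the omnibus F and its degrees of freedom come from, the sketch below computes a one-way between-groups ANOVA by hand. The three small groups are hypothetical toy data chosen to make the arithmetic easy. They are not the Garda Division figures, and the function name is illustrative.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-groups sum of squares: observations around their own group mean
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical toy data: 3 groups of 3 observations
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_b, df_w = one_way_anova(toy_groups)
print(f"F({df_b}, {df_w}) = {f:.2f}")

# Degrees of freedom for the design in this report: 6 divisions, n = 10 each
k, n_per_group = 6, 10
print(k - 1, k * n_per_group - k)  # 5 54
```

The last line reproduces the (5, 54) degrees of freedom reported for the burglary analysis.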
The Post Hoc tests used Tukey HSD, which compares each group to all remaining groups, hence
indicating whether there is a significant difference between the means. A Bonferroni post hoc
test was included in the test options but may be too strict, as we have already lowered the alpha
level to 0.01 and we could be risking Type II errors. For the purposes of this analysis we will use
Tukey.

Results
A one-way between-groups ANOVA was carried out to investigate reported burglary crimes between
groups of Dublin Garda Divisions. The Garda Division factor comprised 6 groups (61: D.M.R. South
Central, 62: D.M.R. North Central, 63: D.M.R. Northern, 64: D.M.R. Southern, 65: D.M.R. Eastern,
66: D.M.R. Western). There was a statistically significant difference at the p < 0.01 level in
the burglary crimes reported for the 6 groups: F(5, 54) = 11.445, p < .001. Post Hoc comparisons using
the Tukey HSD test indicated that the mean score for Group 61 (M = 391, SD = 73.5) did not differ
significantly from any of the remaining groups. However, Group 62 (M = 236, SD = 46.9) was
significantly different from Groups 63 (M = 542.2, SD = 175.2), 64 (M = 561, SD = 164.6), 65 (M = 564,
SD = 133.2) and 66 (M = 564, SD = 133.2). As a result we reject the Null Hypothesis.
