A Geodemographic Classification of
Ireland at Garda Sub-district Level
A Tool for Comparing Sub-districts of An Garda Síochána
Gary Russell
M.Sc. Geocomputation
National Centre for Geocomputation,
Maynooth University
2016
Professor Christopher Brunsdon – Head of Department
Martin Charlton – Programme Coordinator and Thesis Supervisor
Abstract
This thesis uses demographic data from the 2011 census to classify the catchment
areas of Garda stations in the Republic of Ireland (referred to as Garda Sub-districts). This
was accomplished by subjecting the data at a Garda Sub-district level to principal
component analysis and then using clustering techniques on the resulting principal
components. Similar (geodemographically speaking) Garda station areas were then
visualised using mapping techniques based on Central Statistics Office boundary shape
files. The clusters of Garda Sub-districts were named and described using the distribution
of the characteristic variables compared to the global mean of each variable. As an
example of the use for such data manipulation a mini Atlas of Garda Sub-districts was
created by comparing the crime figures of Garda Sub-districts within the resulting clusters
and visualising the results.
Acknowledgements
I would like to acknowledge and thank Professor Chris Brunsdon, Martin Charlton
and Dr Ronan Foley, as well as all the staff and researchers in the Maynooth University
Departments of Geography and Computer Science, the National Centre for
Geocomputation and the All Ireland Research Observatory, for their inspiration and
assistance in completing this thesis. I would also like to thank my three classmates for
their support and for putting up with my using them as sounding boards over the past year.
Lastly, I would like to thank my wife, Siobhan, and our two young children, Aoibhínn and
Joshua, for their unwavering support and belief that I could complete my B.A. and M.Sc.
despite the odd grumpy daddy moment.
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Maps
List of Tables
Introduction
Literature Review
General
Theory
Modifiable Areal Unit Problem
Variables
Selection
Correlation
Units
Clustering
Methods
Cluster Description
Data
Building the Classification
Variables
Principal Components Analysis
Clustering
Cluster Description and Naming
Results
Cluster One: Comfortable home owning young families in semi rural areas
Cluster Two: Young, mobile, affluent, multicultural singles
Cluster Three: Struggling labouring communities
Cluster Four: Young labouring families in outer commuter areas
Cluster Five: Settled older rural communities
Cluster Six: Urban city peripheral communities
Cluster Seven: Struggling rural aging communities
Cluster Eight: Rural farming communities
Cluster Nine: Small rural Townlands
Cluster Ten: Labouring rural communities in older housing stock
Cluster Eleven: Young educated commuter families
Cluster Twelve: Comfortable rural farming communities
Cluster Thirteen: Affluent professional commuters in larger homes
Cluster Fourteen: Semi rural periphery manufacturing communities
The Urban Rural Divide
Crime Atlas
Set Up
Theft
Assault
Burglary
Damage to Property or the Environment
Dangerous Acts
Drugs
Fraud
Kidnapping
Public Order
Robbery
Weapons
Offences against the State, Justice or Organised Crime
Crime Atlas Comments
Issues with Data and Analysis
Conclusion
Bibliography
Appendix 1: R Code Used Throughout the Thesis
Appendix 2: Table of Census Themes
Appendix 3: Garda Sub-district Look-Up Tables
List of Figures
Figure 1: Portion of Charles Booth’s classification map of London
Figure 2: Cumulative variance explained by components 1:40
Figure 3: Scree plot of (WCSS) for k = [1:100]
Figure 4: Heatmap of clusters where k=14
Figure 5: Radar plot of Cluster One
Figure 6: Radar plot of Cluster Two
Figure 7: Radar plot of Cluster Three
Figure 8: Radar plot of Cluster Four
Figure 9: Radar plot of Cluster Five
Figure 10: Radar plot of Cluster Six
Figure 11: Radar plot of Cluster Seven
Figure 12: Radar plot of Cluster Eight
Figure 13: Radar plot of Cluster Nine
Figure 14: Radar plot of Cluster Ten
Figure 15: Radar plot of Cluster Eleven
Figure 16: Radar plot of Cluster Twelve
Figure 17: Radar plot of Cluster Thirteen
Figure 18: Radar plot of Cluster Fourteen
Figure 19: Box plots by Cluster of theft related crime
Figure 20: Box plots by Cluster of assault related crime
Figure 21: Box plots by Cluster of burglary related crime
Figure 22: Box plots by Cluster of damage related crime
Figure 23: Box plots by Cluster of crimes relating to dangerous acts
Figure 24: Box plots by Cluster of drug related crime
Figure 25: Box plots by Cluster of fraud related crime
Figure 26: Box plots by Cluster of kidnapping related crime
Figure 27: Box plots by Cluster of public order and social code crime
Figure 28: Box plots by Cluster of robbery related crime
Figure 29: Box plots by Cluster of weapons related crime
Figure 30: Box plots by Cluster of offences against the State, justice or organised crime
Figure 31: Variance in crime rates explained by the clustering classification
List of Maps
Map 1: Garda Sub-districts by division
Map 2: Garda Sub-districts by cluster
Map 3: Cluster One
Map 4: Cluster Two
Map 5: Cluster Three
Map 6: Cluster Four
Map 7: Cluster Five
Map 8: Cluster Six
Map 9: Cluster Seven
Map 10: Cluster Eight
Map 11: Cluster Nine
Map 12: Cluster Ten
Map 13: Cluster Eleven
Map 14: Cluster Twelve
Map 15: Cluster Thirteen
Map 16: Cluster Fourteen
Map 17: The Urban Rural Divide
Map 18: Theft
Map 19: Assault
Map 20: Burglary
Map 21: Damage to property or the environment
Map 22: Dangerous Acts
Map 23: Drugs
Map 24: Fraud
Map 25: Kidnapping
Map 26: Public Order
Map 27: Robbery
Map 28: Weapons
Map 29: State, Justice or Organised Crime
List of Tables
Table 1: Derived variables used in the classification
Table 2: Extreme z scores for variables within clusters
Table 3: Sub-districts in Cluster One
Table 4: Sub-districts in Cluster Two
Table 5: Sub-districts in Cluster Three
Table 6: Sub-districts in Cluster Four
Table 7: Sub-districts in Cluster Five
Table 8: Sub-districts in Cluster Six
Table 9: Sub-districts in Cluster Seven
Table 10: Sub-districts in Cluster Eight
Table 11: Sub-districts in Cluster Nine
Table 12: Sub-districts in Cluster Ten
Table 13: Sub-districts in Cluster Eleven
Table 14: Sub-districts in Cluster Twelve
Table 15: Sub-districts in Cluster Thirteen
Table 16: Sub-districts in Cluster Fourteen
Introduction
This thesis has two main parts. Part one focuses on the statistical methods used
in geodemographics to classify Garda Sub-districts, using principal
components analysis and clustering techniques to create the classification. Part two
consists of a crime atlas of Ireland at Garda Sub-district level; this atlas also
contains information regarding the performance of the clustering exercise when used with
real crime data.
The census of Ireland provides a vast amount of data at many geographic levels
that are disseminated by the Central Statistics Office (CSO). For example, the smallest
geography at which the census is reported is the Small Area (Central Statistics Office,
2014). There are 18,488 Small Areas in the Republic of Ireland, and the census reports 764
variables for each of them, which gives 14,124,832 (18,488 × 764) individual
data points.
This thesis concentrates on the 563 catchment areas of Garda stations as defined
by An Garda Síochána and reported in the census as Garda Sub Divisions or Garda
Sub-districts (Central Statistics Office, 2014). Even though this gives only 563 areas to work with,
that still equates to 430,132 data points, not including crime statistics for the 13 years
available. Making sense of that amount of data requires it to be summarised so that the
end user is not overwhelmed.
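The data point totals quoted above can be checked with a few lines of arithmetic. The sketch below is written in Python purely for illustration, and it assumes that every one of the 764 census variables is reported for every area; the constant names are invented for this example.

```python
# Counts quoted in the text (Central Statistics Office, 2014).
SMALL_AREAS = 18_488
GARDA_SUB_DISTRICTS = 563
CENSUS_VARIABLES = 764  # assumed to be reported for every area

# One data point per (area, variable) pair.
small_area_points = SMALL_AREAS * CENSUS_VARIABLES
sub_district_points = GARDA_SUB_DISTRICTS * CENSUS_VARIABLES

print(f"Small Area data points:         {small_area_points:,}")
print(f"Garda Sub-district data points: {sub_district_points:,}")
```

Moving from Small Areas to Garda Sub-districts reduces the number of rows by a factor of roughly 33, but the number of columns is unchanged, which is why further summarisation is still needed.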
Traditionally crime statistics are reported by county or region of the country as
can be seen in newspapers and websites when reporting on crime; examples include The
Independent (MacCarthaigh & Phelan, 2014) and the Irish Mirror (Jordan, 2015). While it
makes intuitive sense to amalgamate Garda stations by county for the purposes of crime
statistics reporting, nuances and spatial variation within counties are lost. Another
approach could be to report crimes by station; however, this is arguably problematic.
While reporting by station would show differences within counties and nationally,
without knowing something of the characteristics of the individual stations, comparing
them is of little value. The first law of geography that ‘everything is related to everything
else, but near things are more related than distant things’ (Tobler, 1970) is generally
accepted to hold. However, just because two Garda station areas are next to each other
does not automatically mean that the characteristics of the areas are the same. There is
little point in comparing Ballina in County Mayo with Killala, for example. The catchments
are neighbours, but Ballina is a generally urban environment with 14,329 people at the
last census in 2011, whereas Killala is a rural, coastal area that is physically larger than
the Ballina catchment but with only 3,766 people (Central Statistics Office, 2014).
This thesis advocates comparing stations not only on their location in the An Garda
Síochána hierarchy, but also on their similarities, specifically the underlying
geodemographics. Perhaps Killala has more in common with Kilrush in Clare or
Duncannon in Wexford and comparing it to its peers is a more sensible approach than
comparing it to its neighbours. That being said, it is expected that the classification
carried out in this work will show clusters of clusters throughout Ireland in support of
Tobler’s Law.
There is much literature on the area of geodemographic classification, particularly
for marketing in the United Kingdom and America. Brunsdon et al. (2014) have created a
geodemographic classification of Ireland at the Small Area scale. Gale et al. (2015) have
used geodemographics in London relating to crime; however there does not appear to be a
classification at the national level for Ireland relating to Garda station catchments. It is
felt that this thesis may address a gap in the literature and provide a new tool for
comparing policing in Ireland.
The first aim is to create the classification. The steps will be described in detail
throughout this thesis; however, in short, census data will be used to create the
classification. Relevant variables will be chosen and transformed for use. Methods of
reducing the data to manageable proportions will be used and then the data will be
subjected to a clustering algorithm. The resulting clusters of similar Garda Sub-districts
will then be named and described. The second part of the thesis will then use these
clusters to map various crime statistics on a national and cluster-by-cluster basis, allowing
comparisons to be made between similar Garda stations.
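The sequence of steps described above (standardise the variables, reduce them with principal components analysis, then cluster the component scores) can be sketched in a few lines. The example below is a minimal illustration only and is not the code used in this thesis: it is written in Python using the standard library alone, the six areas and their three percentage variables are invented, only the first principal component is kept, and k-means with k = 2 is assumed as the clustering algorithm. All function and variable names are hypothetical.

```python
import math

# Invented census-style percentages for six hypothetical areas (not thesis data).
# Columns: owner occupied, aged 65 and over, third-level educated.
areas = [
    [45.0, 8.0, 40.0],   # urban-like
    [80.0, 18.0, 20.0],  # rural-like
    [50.0, 9.0, 38.0],   # urban-like
    [85.0, 20.0, 18.0],  # rural-like
    [48.0, 7.0, 42.0],   # urban-like
    [82.0, 19.0, 22.0],  # rural-like
]

def standardise(rows):
    """Convert each column to z-scores (mean 0, population sd 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def first_component(z, iterations=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    vec = [1.0] * p  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm, started deterministically from the first k points."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda c: squared_distance(p, centres[c]))
                  for p in points]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

z = standardise(areas)
loadings = first_component(z)
scores = [[sum(v * w for v, w in zip(row, loadings))] for row in z]
labels = kmeans(scores, k=2)
print(labels)
```

With this invented data the urban-like areas (rows 1, 3 and 5) receive one label and the rural-like areas the other. A real classification would retain several components and test many values of k; starting k-means from the first k points is a simplification that keeps the sketch deterministic.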
It is hoped that in creating this classification, policy makers may begin to ask
questions such as: if area A has similar characteristics to area B, why are the crime
rates so different? Are more resources needed to reflect the similar catchment
demographics? Are the opening hours of a station appropriate given the social
demographic makeup of the population? It is further hoped that the answers to these
questions may be found by those with access to more information, such as Garda numbers
and skill breakdowns, and the policing and social services infrastructure in each area. Lastly, it is
anticipated that this thesis can be built upon by carrying out the same classification with
up-to-date data once the results of the 2016 census are made available.
Literature Review
General
Classification is a natural human process that helps us understand and make sense
of the world around us. Parker et al. (2007) assert that the natural process of classification
undertaken by lay people on a daily basis is fundamental to the ‘sociospatial construction
of reality’. Perec (1989), quoted in Boyne (2006), speaks of a dream of a universal
classification, or law, to describe the whole world, one that did not, and could not, work.
So it was imagined that the entire world could be distributed according to a unique
code, that one universal law would reign over the totality of phenomena: two
hemispheres, five continents, masculine and feminine, animal and vegetable,
singular plural, right left, four seasons, five senses, six vowels, seven days, twelve
months, twenty-six letters. (Perec, 1989:155)
Vickers and Rees (2011) state that complex systems can be classified to help the
understanding of those systems. However, there is no one right system of classification;
Dupré (2006) argues that classifications will be driven by the purpose for which they were
created and that different classifications will be ‘good or bad for particular purposes’.
This is a view that is shared by Charlton et al., who produced one of the first open
classifications of Britain from the 1981 British Census. They state that ‘...the
“meaningfulness” of any classification is an arbitrary thing...’ (Charlton, et al., 1985).
Area classification is the act of grouping areas based on selected features within
those areas; the similarity of the characteristics of the selected features drives the
classification (Vickers & Rees, 2007). Geodemographic classifications are a type of area
classification that Vickers and Rees (2007) describe as ‘one of the most commonly used
areas classifications’. One of the things that can make geodemographic classifications
useful is the descriptions that generally accompany them to give a textual summary of the
attributes of each class (Abbas, et al., 2009). Geodemographics is widely used in
marketing, where it is described as a ‘popular segmentation technique’ (Doyle, 2011).
Many (Abbas, et al., 2009; Gale, et al., 2015; Singleton & Longley, 2008; Vickers &
Rees, 2007) describe geodemographic classifications as tools to summarise large sets of
spatially dependent data such as census data. Gale et al. (2015) state that geodemographic
classifications allow the highlighting of similarities between population structures in
different parts of a country. Gale et al. also point out that geodemographic classifications
give summaries based not only on the population but also on the built environment.
It would be remiss to discuss geodemographic classification without mentioning
Charles Booth. Between 1886 and 1903 Booth and several assistants accompanied police
officers on the beat around London to investigate places of work, working conditions,
homes and the urban environments. Through interviews and observations Booth created a
multi-volume work, ‘Inquiry into the Life and Labour of the People in London’
(London School of Economics and Political Science, 2012). Booth’s work (Booth, 1903)
was one of the first attempts to map [and classify] social-spatial structures (Alexiou &
Singleton, 2015). A portion of the digitised maps along with the classification Booth used
is shown below in Figure 1.
Figure 1: Portion of Charles Booth’s classification map of London
Source: London School of Economics and Political Science (2012)
Theory
While geodemographics have a long history of being used in one form or another,
it must be acknowledged that the theory driving geodemographics is less robust than the
results. The underpinning theory behind geodemographics is Tobler’s first law of
geography that ‘everything is related to everything else, but near things are more related
than distant things’ (Tobler, 1970). Singleton and Longley (2009) note that the theoretical
frameworks remain ‘rather weak’; this view is reiterated by Alexiou and Singleton (2015)
who state that classifications based on geodemographics lack solid theory. Another issue
with many geodemographic classifications is that they are not generally geographically
weighted, and due to the methods of their construction are aspatial in design despite
showing spatial correlations in the results (Alexiou & Singleton, 2015). While the issues
with theoretical grounding are acknowledged, Singleton and Longley express their hopes
for best-practice geodemographics: that classifications are focused, recognise the provenance of
the data used, are scientifically reproducible and use the best methods available
(Singleton & Longley, 2009).
Modifiable Areal Unit Problem
Geodemographic analysis is generally agreed to be best carried out at the smallest
areal unit available in order not to lose spatial variation that larger units may obscure
(see, for example, Alexiou & Singleton, 2015; Charlton, et al., 1985; Gale, et al., 2015).
However, the scale also depends on the purpose of the classification (Alexiou
& Singleton, 2015); therefore a balance must be struck. Another factor to consider is the
Modifiable Areal Unit Problem (MAUP). Gehlke and Biehl (1934) noted that choices in
data aggregation over space and the size of areal unit used in analysis have influence over
the correlation coefficient. This was expanded upon by Openshaw and Taylor (1979) who
coined the phrase Modifiable Areal Unit Problem. They conducted experiments on a
spatial data set and found that they could obtain correlations of between -0.99 and 0.99 from
different levels of aggregation. Charlton and Brunsdon (2016) presented a paper at the
GIS Research UK Conference which revisited the work by Gehlke and Biehl using census
data from Ireland at several different official aggregation levels ranging from Small Areas
to Counties. They were able to show that the larger areas lost variance between areas and
increased correlations (as total variance does not change).
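The effect of aggregation on the correlation coefficient can be illustrated with a small synthetic example. The sketch below, written in Python for illustration, uses invented data rather than Irish census data: twenty small units are nested in five larger zones, both variables share a zone-level trend, and the unit-level deviations are constructed so that they cancel exactly when averaged within a zone. The names used are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def zone_means(values, zones):
    """Average the small-unit values within each larger zone."""
    grouped = {}
    for v, z in zip(values, zones):
        grouped.setdefault(z, []).append(v)
    return [sum(vs) / len(vs) for _, vs in sorted(grouped.items())]

# Unit-level deviations: zero mean within each zone and uncorrelated with each other.
NOISE_X = [1, -1, 1, -1]
NOISE_Y = [1, 1, -1, -1]

xs, ys, zones = [], [], []
for zone in range(5):               # five larger zones with a shared trend
    for nx, ny in zip(NOISE_X, NOISE_Y):
        xs.append(zone + nx)        # four small units per zone
        ys.append(zone + ny)
        zones.append(zone)

r_small_units = pearson(xs, ys)
r_large_zones = pearson(zone_means(xs, zones), zone_means(ys, zones))

print(round(r_small_units, 3), round(r_large_zones, 3))
```

At the small-unit level the correlation is 2/3, while after averaging to the five zones it rises to 1, because averaging removes the unit-level variation and leaves only the shared zone trend. Real census data will not cancel so neatly, but the direction of the effect matches that reported by Gehlke and Biehl.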
While the MAUP is a factor, there is not much that can be done about it if, for
example, data is only available at one areal unit scale. If using official areal units one
must also be aware of the MAUP when comparing results over time, as these boundaries
may change. An example of this is the Irish Electoral Constituencies which were changed
by an act of the Irish Parliament (Houses of the Oireachtas, 2013) as required by Article
16.4 of the Irish Constitution every twelve years (Government Publications, 2016).
Likewise, and more relevant for this work, the Garda Sub-district
boundaries were changed in 2013 following the closure of some 100 Garda stations (An
Garda Siochana, 2013; Central Statistics Office, 2014). This is an issue noted in the
United Kingdom in relation to British police Basic Command Units (BCU), where Ashby
and Longley (2005) state ‘Maintaining the BCU families is an arduous task due to the
temporal instability of these administrative units’. Therefore any follow up to this thesis
should be aware of the possibility the MAUP affecting results should the boundaries be
changed by An Garda Síochána.
8
Variables
Selection
Harris et al. (2005) explain that a geodemographic classification is created by
grouping areas that are alike into a number of classes, often based on census data. As
geodemographic classifications are a way of summarising social, demographic and built
characteristics of zoned geography (Gale, et al., 2015), it makes sense to use census data.
Vickers and Rees (2007) also argue that a national census (in their case British, but the
principle holds) stands above other sources due to its amount of data and ‘comprehensive
geographic coverage’. The choice of variables to use in the classification drives the
results; Charlton et al. (1985) describe the choosing of variables as ‘absolutely crucial’.
However, they do state that the choices are very difficult to make. Vickers and Rees
(2007) suggest that variables should be chosen only if there is a good reason; this implies
that including a variable just because one has the data is not the best policy.
Correlation
Variable correlation can be an issue when using census data to inform analysis.
Collinearity in the data can affect the performance of any significance tests that may be
carried out on, for example, linear regressions (Anderson, et al., 2010). In classification
exercises correlation between variables creates redundancy in the input data (Alexiou &
Singleton, 2015). Two main methods exist for dealing with correlation within the variables,
although it is acknowledged that there is no general rule (Vickers & Rees, 2007). One
approach is to remove one variable from each pair of highly correlated variables
(Alexiou & Singleton, 2015; Vickers & Rees, 2007). Another approach is to use Principal
Components Analysis (PCA) to transform a set of N correlated variables into a set of n
uncorrelated principal components. This approach was used by Charlton et al. (1985) and
Brunsdon et al. (2014). With principal components, each component is a linear
combination of the parent variables so the variance is retained but the components are
uncorrelated (Alexiou & Singleton, 2015). Additionally, the first component accounts for
the most variance and each component adds less to the overall variance explained
(Jolliffe, 2002). The user can therefore decide how much variance they are willing to
sacrifice in order to reduce dimensionality in the data by using fewer principal
components than the number of input variables (Charlton, et al., 1985; Harris, et al., 2005;
Jolliffe, 2002).
Units
Variables used in geodemographics can be reported in different units such as
percentages of population, count data, indices etc. (Alexiou & Singleton, 2015). This can
make comparison of variables difficult. The census of Ireland (Central Statistics Office,
2014) reports most available data as a count of people; therefore it is relatively simple to
convert any required variable to percentage of population within the spatial unit. This not
only makes the variables easier to compare, it also stops areas with high population
figures affecting the analysis due to higher absolute numbers. The variables may also be
standardised using z scores to allow for true comparison of the individual variables’
influence on a cluster. The z score is a measure of the relative location in a data set of the
observation, therefore data points in two different data sets with the same z score have the
same relative location, i.e. they are the same number of standard deviations from the
mean (Anderson, et al., 2010).
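The conversion from counts to percentages and then to z scores can be sketched in R as follows. The counts are invented for illustration and are not census values; the column names are hypothetical.

```r
# Hypothetical counts for four areas (invented numbers, not census values)
areas <- data.frame(
  total      = c(500, 2000, 10000, 80000),   # total population
  unemployed = c(40, 150, 900, 8800),        # count of unemployed persons
  students   = c(30, 260, 1500, 5600)        # count of students
)

# Convert counts to percentages of the area population
pct <- 100 * areas[, c("unemployed", "students")] / areas$total

# Standardise each column: z = (value - column mean) / column sd
z <- scale(pct)
```

After the conversion the largest area no longer dominates simply because of its population, and each column of z has a mean of zero and a standard deviation of one.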
Clustering
Methods
Clustering involves finding subsets of interest within a larger set. The subsets are
called clusters and are usually homogeneous within each cluster and separated between
clusters (Hansen & Jaumard, 1997). Gordon (1987) notes that it is ‘difficult to make an
informed choice of relevant clustering strategies’, while Vickers and Rees (2007) maintain
that there is no right or wrong way to classify. Commercial classifications tend to build from
the ground up, clustering at the smallest available level then aggregating into larger
groups (Singleton & Longley, 2008). The open Output Area Classification in the UK,
however, was clustered from the top down by creating several large clusters that were
then subjected to clustering techniques separately (Vickers & Rees, 2007). It is widely
acknowledged among the available literature that k-means clustering is the technique of
choice for geodemographic clustering. This is shown in either the acknowledgement of
k-means in theoretical papers or the use of k-means in applied papers (Abbas, et al., 2009;
Alexiou & Singleton, 2015; Brunsdon, et al., 2014; Charlton, et al., 1985; Vickers &
Rees, 2007).
K-means clustering is seen to have something of an advantage over other methods
such as agglomerative, divisive, constructive or direct optimisation (described well in
Gordon (1987)). This is because they are all hierarchical in nature and will force a
hierarchy on the output even if one does not exist (Gordon, 1987). K-means will not force
a hierarchy on the output; however, it has the disadvantage that the number of clusters
must be ‘known’ beforehand (Singleton & Longley, 2008). Clustering techniques
generally require a measure of dissimilarity between observations (Jolliffe, 2002).
K-means uses the squared Euclidean distance (Alexiou & Singleton, 2015). In essence
k-means sorts n observations into k clusters while minimising the sum of squared errors
(Alexiou & Singleton, 2015; Ding & He, 2004). K-means assigns each observation to the
cluster with the nearest mean, a new set of means is then calculated and the
process begins again. The process only stops when the within cluster sum of squares
(WCSS) is minimised. This occurs when cluster assignments no longer change, as any
changes would not make the sum of squares smaller (Alexiou & Singleton, 2015).
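The assign and recalculate cycle described above can be written out by hand. This is a minimal sketch on invented toy data with hand-picked starting centres, not the implementation used by R's kmeans() function; the names centres, cluster and wcss are hypothetical.

```r
# Toy data: two well separated groups of points (invented for illustration)
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))

k <- 2
centres <- x[c(1, 21), , drop = FALSE]   # one starting centre from each group
cluster <- rep(0L, nrow(x))

repeat {
  # squared Euclidean distance from every point to every centre
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centres[j, ])^2))
  newCluster <- max.col(-d2, ties.method = "first")  # index of nearest centre
  if (all(newCluster == cluster)) break              # assignments stable: stop
  cluster <- newCluster
  # recompute each centre as the mean of the points assigned to it
  for (j in seq_len(k)) centres[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
}

# within cluster sum of squares for the final assignment
wcss <- sum((x - centres[cluster, ])^2)
```

The sketch does not guard against a cluster becoming empty, which is one reason to prefer the built-in kmeans() function in practice.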
Cluster Description
Once the WCSS is minimised and the clusters are assigned, the results need to be
described. The aim of cluster descriptions is to provide a short profile of each cluster for
the end user. Vickers and Rees (2007) explain that profiles use text and visuals to help the
end users’ understanding of the cluster group in a few sentences. The cluster labelling and
description process is acknowledged by Vickers and Rees (2007) to be difficult and
subject to much thought, in order not to mislead the user or offend the people living in the
areas classified.
Cluster descriptions draw on the main identifiable (Debenham, 2002), dominant
(Abbas, et al., 2009) characteristics of a cluster. Often, the process involves using z scores
to identify extreme variables within the cluster compared to the global mean (Debenham,
2002; Vickers & Rees, 2007). The descriptions that are attached to geodemographic
classifications are viewed as useful to other researchers (Abbas, et al., 2009). Parker et al.
(2007) go as far as to suggest that the classification descriptors may be the ‘most
sociologically interesting’ element of the geodemographic classification process.
Therefore the naming and description element of geodemographic classification should
not be overlooked or given less attention than the more statistical elements of the process.
Data
Three data sets are used throughout this work to create the classification:
1. Census 2011, all 764 reported census variables in columns, at Garda Sub-district
level in 563 rows (Central Statistics Office, 2014).
2. Garda Sub-district boundary files for use in mapping the outcomes (Central
Statistics Office, 2014a).
3. Crime data for Ireland at Garda Sub-district level (Central Statistics Office, 2016).
The crime data is agglomerated at the Garda Sub-district level to 12 crime types.
• Attempts/threats to murder, assaults, harassments and related offences.
• Dangerous or negligent acts.
• Kidnapping and related offences.
• Robbery, extortion and hijacking offences.
• Burglary and related offences.
• Theft and related offences.
• Fraud, deception and related offences.
• Controlled drug offences.
• Weapons related offences.
• Damage to property and to the environment.
• Public order and other social code offences.
• Offences against government, justice procedures and organisation of crime.
The data for this study fall into two main areas: socio-demographic data from the
Census of Ireland (Central Statistics Office, 2014), and crime data (Central Statistics
Office, 2016). Both sets of data are reported at the Garda Sub-district level.
There are 563 Garda Sub-districts in Ireland (Central Statistics Office, 2016). The
Sub-districts range in size from ≈1.9km² in Fitzgibbons Street in Dublin, to ≈658km² in
Louisburgh in Mayo. They have populations ranging from 384 in Sraith Salach in Galway
to 98,078 in Blanchardstown in Dublin (Central Statistics Office, 2014).
These Sub-districts are based loosely on the official geography of Irish
Townlands, but were designed by the Examiner of Maps (GIS) at An Garda Síochána to
suit the organisation’s needs (Creaner, 2016). The Sub-districts are a unique data set in
that they are designed for operational rather than statistical reasons. It is acknowledged by
Creaner that using Small Areas would be better statistically. However the use of Small
Areas is unlikely as it doesn’t make operational sense for the national police force to end
a catchment area in the middle of a motorway (for example), as may happen with Small
Areas (Creaner, 2016). The Garda Sub-districts are shown, grouped into administrative
divisions, in Map 1¹.
It is not known exactly how the 2011 census data were attached to the 2013 Garda
Sub-district geography. The assumption in this thesis is that the CSO would be able to
populate the new boundaries at the household level. At the time of writing the CSO have
not replied to my queries, however the designer of the Sub-districts (Creaner, 2016) has
agreed that this assumption is a reasonable one. Another option would be to populate the
Sub-districts by centroid based on a smaller unit such as Small Areas. If this is the case, it
is not felt that there would be too much loss of overall validity due to the number of Small
Areas (18,488) being assigned to one of 563 Sub-districts. Aside from this ambiguity,
there is no known issue with Irish Census Data.
¹ All maps produced in this thesis use boundary files from the CSO website and contain Ordnance Survey
Ireland Data (Ordnance Survey Ireland, 2012).
Map 1: Garda Sub-districts by division
The census data used in this thesis is made up of 764 variables that are derived
from household level data and amalgamated to the various levels reported from 18,488
Small Areas to four Provinces (Central Statistics Office, 2014). The MAUP discussed in
the literature review section is relevant, as carrying out the classification at a different scale
will produce different results. It may be possible to carry out the classification at Small
Area level and then combine these results into larger unit scales. However this is not
appropriate for this study. Firstly, there are no smaller units that fit into Sub-districts due
to the proprietary nature of the Sub-districts. Secondly, all data required are available at
the Sub-district level. Lastly, Sub-districts are the smallest geographical unit at which crime
data are released in Ireland (Central Statistics Office, 2016). Therefore it is acknowledged
that variation within the Garda Sub-districts is lost during this study. However because
this classification is for the purpose of being able to compare Garda Stations on a like for
like basis, it is felt that the loss is acceptable at a national level.
Building the Classification
The classification was built in R, a free, open source statistical computing
environment that can handle large amounts of data (The R Foundation, 2015). Some code
blocks will be included where needed for clarification. However in the interest of
reproducibility the full R code that created the classification is included in Appendix 1.
Variables
This classification aims to give reproducible results that are comparable to similar
studies at different scales. Charlton et al. (1985) chose their variables to give a
comparable classification between their open one and the commercial ACORN
classification in the UK. Brunsdon et al. (2014) chose Irish Census variables at the Small
Area Scale to reflect the OAC classification variables chosen by Vickers and Rees (2007).
Therefore this study will use the same variables as Brunsdon et al. The full list of variable
codes reported by the Census is available for download from the CSO website (Central
Statistics Office, 2014). An adapted list is included in Appendix 2 for reference should the
reader require clarification on any variable codes used.
The variables chosen for the classification exercise are actually derived variables.
Each variable is made up of two or more individual census variables to derive variables
that are percentages of the population of the area in question. For example one variable
used is that of lone parents. This variable is derived by adding the Lone Mothers with
Children (number of families) to Lone Fathers with Children (number of families),
dividing by the total number of families and multiplying the result by 100. The actual
code is shown below.
loneParent <- 100*(T4_3FTLF + T4_3FTLM) / T4_5TF
In all there are 40 derived variables used in the classification grouped into six
areas: demographic, household composition, housing, socioeconomic, employment and
connectivity. The derived variables are shown in Table 1; the actual make up of each
derived variable can be seen in the R code in Appendix 1. As stated, the variables from
the CSO were reported at Garda Sub-district level. This aided in mapping as both the
census file and the boundary file contained unique geography ID numbers (GEOGID) for
each Sub-district. The numbers were slightly different in that one set was prefixed with an
‘N’, and the other set was prefixed with an ‘M’. This was fixed by carrying out a find and
replace command in Excel before the census file was loaded in R, however it could have
just as easily been carried out in R.
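As a hedged illustration of how the same fix could look in R, the sketch below rewrites a leading letter with sub(). The ID values are invented, and which file carried which prefix is an assumption made only for this example.

```r
# Hypothetical ID vectors; the real GEOGID values and which file carried which
# prefix are assumptions made for this illustration
censusID   <- c("N1001", "N1002", "N1003")
boundaryID <- c("M1001", "M1002", "M1003")

# Replace a leading "N" with "M" so the two sets of IDs match
censusID <- sub("^N", "M", censusID)

# Check that every census ID now has a partner in the boundary file
all(censusID %in% boundaryID)
```

Keeping the step in the script rather than in a spreadsheet means the whole preparation can be rerun from the raw files.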
Theme                  Derived Variable    Description
Demographics           Age0-4              Percentage of population aged 0-4
                       Age5-14             Percentage of population aged 5-14
                       Age25-44            Percentage of population aged 25-44
                       Age45-64            Percentage of population aged 45-64
                       Age65+              Percentage of population aged 65 and over
                       EUNat               Percentage of population that is European by nationality (excluding Irish)
                       RestofWorld         Percentage of population where nationality was given as Rest of the World
                       BornOutsideIRE      Percentage of population not born in Ireland
Household Composition  Separated           Percentage of persons separated or divorced
                       SinglePerson        Percentage of persons (non pensioners) living in one person households
                       Pensioner           Percentage of persons who are pensioners
                       LoneParent          Percentage of families that are lone parent families
                       NoChildren          Percentage of families that are 'pre family' (no children born)
                       NonDependChildren   Percentage of families with children where the youngest child is 20+
Housing                RentPublic          Percentage of total households rented from local authority
                       RentPrivate         Percentage of total households privately rented
                       Flats               Percentage of total households defined as flats
                       NoCentralHeat       Percentage of total households with no central heating
                       RoomsHH             Average number of rooms per household
                       PeoplePerRoom       Total persons ÷ total rooms
                       SepticTank          Percentage of total households with an individual septic tank
Socioeconomic          HEQual              Percentage of persons with an Ordinary Bachelors Degree or higher
                       Employed            Percentage of persons at work
                       TwoCars             Percentage of households with two or more cars
                       JTWPublic           Percentage of persons over age 5 who travel to school, college or work by means of bus or rail
                       HomeWork            Percentage of persons self employed (Own account workers)
                       LLTI                Percentage of persons reporting bad or very bad health
                       UnpaidCare          Percentage of persons providing unpaid care
Employment             Students            Percentage of persons who are students
                       Unemployed          Percentage of persons who are unemployed having lost or given up jobs
                       EconinactFam        Percentage of persons looking after home/family - homemakers
                       Agric               Percentage of workers who work in agriculture, forestry or fishing
                       Construction        Percentage of workers who work in construction
                       Manufacturing       Percentage of workers who work in manufacturing
                       Commerce            Percentage of workers who work in commerce and trade
                       Transport           Percentage of workers who work in transport and communication
                       Public              Percentage of workers who work in public administration
                       Professional        Percentage of workers who work in professional services
Connectivity           Broadband           Percentage of internet connected households with broadband
                       Internet            Percentage of total households with some kind of internet access
Table 1: Derived variables used in the classification
It can be seen in Table 1 that there will be issues in the data using the derived
variables as they are. For example Separated can be expected to be correlated with
LoneParent. As mentioned previously, Principal Components Analysis (PCA) is a set of
methods that can take in variables that may be correlated and produce a set of
uncorrelated principal components. PCA is also used to reduce the size of a clustering
computational problem (Jackson, 1991).
Principal Components Analysis
Once the 40 variables were chosen and derived from the census variables they
were subjected to Principal Components Analysis. The reasons were twofold; firstly to
reduce the dimensionality of the data from 40 to a more manageable number. Secondly,
PCA was used to remove any correlation in the data. The cluster algorithm used for the
classification was k means, which treats every input variable equally, so correlated
variables would count the same information more than once. Therefore PCA is essential to
provide k means with a set of uncorrelated variables to carry out the clustering. The
components are linear combinations of the original variables. Each one contains a
proportion of the variance in the original data and they are ordered by the amount of
variance they explain. Therefore it is possible to view the cumulative variance explained
by the components and decide how many to use in the k means clustering.
As mentioned previously, there is a lack of theory in this regard; however the
majority of the variance should be kept, otherwise the analysis loses too much
information to make the PCA worth doing. Jolliffe (2002) describes the choice of a cut off
of variance as an ad hoc rule-of-thumb that works in practice. Jolliffe suggests a range
from 70% to 90% to retain m components where m is the smallest integer for which the
cumulative variance explained is greater than the cut off.
There is a function in R that calculates the Principal Components for a user, called
princomp(). This function takes in the relevant variables and performs a PCA. It is
then possible to view the cumulative variance explained by each component to choose
how many to use in the clustering process. For detailed explanations of Principal
Components, works by Jackson (1991), Jolliffe (2002), or Rencher and Christensen
(2012) are recommended. However the main steps involved are:
1. Get data – In this case 40 derived variables × 563 Garda Sub-districts
2. Subtract mean of each variable from each instance of the variable
3. Calculate correlation matrix
4. Calculate eigenvalues and eigenvectors of the correlation matrix
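The four steps can be followed by hand in R. The sketch below uses the built-in USArrests data as a stand-in for the 40 derived variables (an assumption made purely for illustration) and checks the result against princomp().

```r
# Steps 1 to 4 on a built-in data set (USArrests stands in for the 40 variables)
X  <- as.matrix(USArrests)             # 1. get data
Xc <- sweep(X, 2, colMeans(X))         # 2. subtract the mean of each variable
R  <- cor(Xc)                          # 3. correlation matrix
e  <- eigen(R)                         # 4. eigenvalues and eigenvectors

# Component scores: standardised data projected onto the eigenvectors
Z <- scale(X)
scores <- Z %*% e$vectors

# The eigenvalues match the component variances reported by princomp()
pc <- princomp(USArrests, cor = TRUE)
all.equal(unname(pc$sdev^2), e$values)
```

The eigenvalues sum to the number of variables because a correlation matrix has ones on its diagonal, and the resulting scores are uncorrelated with each other.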
The Principal Components were calculated and their cumulative explanation of the
variance of the original derived variables displayed by entering the following two lines of
code into R.
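# PCA on the correlation matrix, omitting the first column of gardaVars
pca <- princomp(gardaVars[, -1], cor = TRUE, scores = TRUE)
# cumulative proportion of variance explained by components 1 to 40
cumsum(pca$sdev^2 / sum(pca$sdev^2))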
The cumulative variance explained by the components is shown in Figure 2.
Figure 2: Cumulative variance explained by components 1:40
As per general recommendations mentioned earlier, this study will use the first m
components that total to at least 80% of the variance. This means that the first nine
components will be used in the study, as they account for 80.65% of the variance. This
was seen as a good cut off as the other 31 components only accounted for 19.35% of the
variance in the original data set between them. Another reason for not including the tenth
component is that it is the first component that fails a test that suggests each component
should contribute more than $\frac{100}{p}$% of the cumulative variance (Jolliffe, 2002).
As p is 40 and $\frac{100}{40} = 2.5$, the fact that component ten only explains an extra
1.64% of the variance excludes its use in the classification process.
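Both selection rules are simple to express in R. The sketch below uses invented proportions of variance for six components rather than the thesis results; with a fitted object such as pca, the same vector could be obtained as pca$sdev^2 / sum(pca$sdev^2).

```r
# Invented proportions of variance for six components (they sum to 1)
propVar <- c(0.45, 0.25, 0.12, 0.08, 0.06, 0.04)
p <- length(propVar)

# Rule 1: smallest m whose cumulative variance reaches the cut off
cutOff <- 0.80
mCumulative <- which(cumsum(propVar) >= cutOff)[1]

# Rule 2: keep components that each explain more than 100/p per cent
mAverage <- sum(100 * propVar > 100 / p)
```

In this toy example the two rules disagree (three components against two), which shows why it is worth checking both before settling on a number.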
Clustering
The work up to this point has concentrated on getting the data ready for the
clustering process. The nine principal components represent a much smaller data set than
the 40 derived variables and are also uncorrelated. This means that they are ready for
use in a clustering algorithm. The method of clustering used in this thesis is the k means
technique as described in the literature review. The k means method was chosen as it is
the system of choice in most geodemographic classifications (Abbas, et al., 2009; Alexiou
& Singleton, 2015; Brunsdon, et al., 2014; Charlton, et al., 1985; Vickers & Rees, 2007).
K means requires the number of clusters to be known before it sets out to minimise the
within cluster sum of squares. However this is not an issue in R, where k means can be
run in a loop so that the clustering exercise is repeated with different numbers of
clusters and the user can pick the best number for k based on the results. In order to
pick the best number for k, the k means process was run 100 times with k starting at one
and being increased by one each time. The results were then plotted on a scree plot
shown in Figure 3. The code to run through this loop is shown below.
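nPC <- 9
set.seed(290879)  # fixed seed so the random starting centres are reproducible
smallest.clus <- wss <- rep(0, 100)
for (i in 1:100) {
  # k means on the first nPC component scores with i clusters
  clus <- kmeans(pca$scores[, 1:nPC], i)
  wss[i] <- clus$tot.withinss           # total within cluster sum of squares
  smallest.clus[i] <- min(clus$size)    # size of the smallest cluster
}
plot(1:100, wss[1:100], type = "h", main = "Cluster Scree Plot",
     xlab = "Number of Clusters", ylab = "Within Cluster Sum of Squares")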
The scree plot in Figure 3 shows that there is a small step at 14 clusters. By this
stage the city centres of Dublin and Cork had already split off into a cluster, and the
differences between clusters had started to get smaller. A ‘heatmap’ chart with attached dendrogram
was created in R to visualise the clusters using k = 14 and the clustering was deemed
satisfactory.
As can be seen in the Figure 4 heatmap, clusters 1 and 14 are fairly similar, as are
clusters 2 and 8. Therefore it was felt that adding more clusters (k > 14) would not add to
the classification. In addition splitting the clusters into smaller units may have created
clusters that were too nuanced for the purpose of comparing Garda Sub-districts. The final
step in the clustering process was to join the cluster numbers to the GEOGID numbers of
individual Garda stations so that the results could be mapped.
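The join between cluster numbers and GEOGID values can be sketched with merge(). All IDs, station names and cluster numbers below are invented, and the object names lookup, boundary and mapped are hypothetical.

```r
# Hypothetical inputs: IDs in the same row order as the clustered scores, and
# a cluster vector such as the one returned by kmeans()$cluster
geogid  <- c("M1001", "M1002", "M1003", "M1004")
cluster <- c(3, 1, 3, 2)
lookup  <- data.frame(GEOGID = geogid, cluster = cluster)

# Hypothetical attribute table of a boundary file, in a different row order
boundary <- data.frame(GEOGID  = c("M1003", "M1001", "M1004", "M1002"),
                       station = c("C", "A", "D", "B"))

# Join on the shared ID so each area carries its cluster number
mapped <- merge(boundary, lookup, by = "GEOGID")
```

Joining on the ID rather than relying on row position guards against the two files being sorted differently.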
It is possible to cluster a set of clusters in order to create higher order super groups
for description purposes. However with only 14 clusters and 563 areas it was felt that a
second level of clustering was unnecessary for this classification.
Figure 3: Scree plot of WCSS for k = [1:100]
Figure 4: Heatmap of clusters where k = 14
Cluster Description and Naming
With the clusters set, they needed to be named and described. For some (Abbas, et
al., 2009; Parker, et al., 2007) the descriptions attached to clustering exercises such as this
one are the most interesting element of the final geodemographic classification process.
Vickers and Rees (2011) state that cluster names may be the primary source of
information used when judging a cluster in a classification by the end user.
Naming and describing the clusters was a multi stage process. All of the clusters
were mapped in order to check that the classification seemed spatially sensible. Then the z
scores of the original derived variables were calculated for each cluster, the mean z score
across all clusters was also calculated for each variable (µ), as was the standard deviation
(σ). Clusters that had variables where µ-σ > z were deemed to have an extremely low z
score for that variable. Clusters where z > µ+σ were deemed to have an extremely high z
score for that variable. These extreme highs or lows accounted for 25% of the variables.
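The extreme value rule can be sketched in R as follows. The z scores, variable selection and cluster labels are invented for illustration, and the object names clusZ, extremeHigh and extremeLow are hypothetical rather than taken from the thesis code.

```r
# Invented z scores for six areas, two variables, and a cluster label per area
z <- data.frame(TwoCars = c(1.4, 1.0, 0.4, 0.2, -1.3, -1.1),
                Flats   = c(-0.5, -0.7, 0.0, 0.2, 1.4, 1.0))
cluster <- c(1, 1, 2, 2, 3, 3)

# Mean z score of each variable within each cluster (rows are clusters)
clusZ <- as.matrix(aggregate(z, by = list(cluster = cluster), FUN = mean)[, -1])

# Mean and standard deviation of the cluster z scores for each variable
mu    <- colMeans(clusZ)
sigma <- apply(clusZ, 2, sd)

# Flag cluster values more than one standard deviation from the mean
extremeHigh <- sweep(clusZ, 2, mu + sigma, FUN = ">")
extremeLow  <- sweep(clusZ, 2, mu - sigma, FUN = "<")
```

In this toy example cluster 3 is flagged as extremely low on TwoCars and extremely high on Flats, which is the kind of contrast that would feed into a cluster name.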
The extreme values informed the name attached to the clusters as they were
deemed to be dominant and identifiable characteristics of the cluster in question (Abbas,
et al., 2009; Debenham, 2002; Vickers & Rees, 2007). A table of the extremes identified
is reproduced in Table 2. In addition to the extreme z scores, the scores that were above or
below average for that variable informed more detail in the descriptions where necessary.
For the assistance of the end user a radar plot for each cluster is included in the
cluster descriptions. This plot shows the global average z score for each variable in red
and the z score for the cluster in blue. It also shows the extreme value cut offs in green
and purple.
Table 2: Extreme z scores for variables within clusters
Results
The geodemographic classification exercise resulted in the 563 An Garda
Síochána Sub-districts being clustered into 14 groups based on their underlying
geodemographics. To assist the reader, each map has an accompanying cartogram. These
cartograms are all distorted based on the area population using the Newman and Gastner
method contained in an ArcScript add-on for ArcMap, available from Esri (2009). Map 2
shows the overall classification. The accompanying cartogram allows the user to compare
areas based on population size.
Map 2: Garda Sub-districts by cluster
Displaying all the clusters at once has limited value, except to get an overall
picture of the cluster distributions. Therefore each cluster will be discussed in turn. In
each description the cluster is mapped and represented as a cartogram, and a radar plot of
the variables is included. The individual station names and divisions in each cluster will also
be reported. Should the reader wish to see what cluster an individual station is in, a look
up table is attached in Appendix 3. This look up table is sorted alphabetically by Division,
Sub-division, and then Sub-district.
Cluster One: Comfortable home owning young families in semi rural areas.
This cluster consists of 55 Sub-districts serving 167,880 mainly Irish families
living in houses, generally working in manufacturing, construction or agriculture. Car
ownership is high with very high instances of two or more car households. There are low
instances of lone parent families or local authority renters. Houses in this cluster tend to
be larger than average and serviced by septic tanks.
Map 3: Cluster One
Extreme High Values: TwoCars, Manufacturing.
Extreme Low Values: Separated, LoneParent, RentPublic, LLTI.
Figure 5: Radar plot of Cluster One
Table 3: Sub-districts in Cluster One
Cluster Two: Young, mobile, affluent, multicultural singles
This cluster consists of 11 Sub-districts serving 243,625 people. The areas are
characterised by the number of 25-44 year olds and the lack of young children and older
people. The areas are city centre in nature with large numbers of rented flats, and low car
ownership. People who can afford to live here are well educated and work in commerce
or transporting other workers. The population is multicultural with high levels of persons
from around the EU and wider world.
Map 4: Cluster Two
Extreme High Values: Age25-44, EUNat, RestofWorld, BornOutsideIRE, SinglePerson,
Separated, LoneParent, NoChildren, RentPublic, RentPrivate, Flats, PeoplePerRoom,
HEQual, Employed, JTWPublic, Students, Commerce, Transport, Broadband.
Extreme Low Values: Age0-4, Age5-14, Age45-64, Age65+, Pensioner, RoomsHH,
SepticTank, TwoCars, HomeWork, UnpaidCare, EconinactFam, Agric, Construction,
Manufacturing.
Figure 6: Radar plot of Cluster Two
Table 4: Sub-districts in Cluster Two
Cluster Three: Struggling labouring communities
This cluster consists of 68 Sub-districts serving 1,168,789 people. These areas are
generally semi urban towns, or towns that have grown around manufacturing bases. There
are high levels of unemployment and high levels of local authority renting. These areas
may have been hit hard by the recession and abandoned plans for government
decentralisation.
Map 5: Cluster Three
Extreme High Values: Separated, LoneParent, RentPublic, Unemployed, Manufacturing.
Extreme Low Values: NonDependChildren, JTWPublic.
Figure 7: Radar plot of Cluster Three
Table 5: Sub-districts in Cluster Three
Cluster Four: Young labouring families in outer commuter areas
This cluster consists of 56 Sub-districts serving 361,375 mainly Irish people.
These areas may be recently built towns and villages with good connectivity. These areas
contrast very low numbers of older people and those with grown up children with high
numbers of families with children aged 0-14. Above average levels of car ownership and
employment combined with below average agriculture workers indicate that those who do
work may commute.
Map 6: Cluster Four
Extreme High Values: Age0-4, Age5-14.
Extreme Low Values: Age65+, Pensioner, NonDependChildren.
Figure 8: Radar plot of Cluster Four
Table 6: Sub-districts in Cluster Four
Cluster Five: Settled older rural communities
This cluster consists of 15 Sub-districts serving 48,067 people. These areas have
an aging population living in older housing stock with extreme levels of housing with no
central heating. There is also an above average number of septic tanks in use. There are
low levels of unemployment but high levels of those who are unpaid carers, and above
average levels of agricultural working.
Map 7: Cluster Five
Extreme High Values: Age45-64, Age65+, EUNat, Separated, Pensioner,
NoCentralHeat, HomeWork, UnpaidCare.
Extreme Low Values: Unemployed, Manufacturing, Public, Professional.
Figure 9: Radar plot of Cluster Five
Table 7: Sub-districts in Cluster Five
Cluster Six: Urban city peripheral communities
This cluster consists of 27 Sub-districts serving 730,183 people. It is characterised
by low instances of septic tanks and very low car ownership contrasted with high numbers of
lone parents and below average employment. The number of students is high due to the proximity
of university sites within commutable distance. The proportion of pensioners is also high; however the
high numbers of students and local authority renters indicate a semi fluid community.
Connectivity is good in keeping with the proximity to city centres, as is the use of public transport
for work journeys.
Map 8: Cluster Six
Extreme High Values: Pensioner, LoneParent, RentPublic, JTWPublic, Students,
Commerce, Transport, Professional, Broadband.
Extreme Low Values: SepticTank, TwoCars, HomeWork, Agric, Construction.
Figure 10: Radar plot of Cluster Six
Table 8: Sub-districts in Cluster Six
Cluster Seven: Struggling rural aging communities
This cluster consists of 23 Sub-districts serving 50,648 people. The areas are aging
and rural in nature. There are high levels of unemployment and unpaid caring. There are
few children and few persons with a higher education. These communities may have
suffered from out migration as indicated by the very low levels of 25-44 year olds.
Map 9: Cluster Seven
Extreme High Values: Age45-64, Age65+, Pensioner, LoneParent,
NonDependChildren, SepticTank, JTWPublic, LLTI, UnpaidCare, Unemployed,
Construction, Professional.
Extreme Low Values: Age0-4, Age25-44, HEQual, Employed, Manufacturing,
Commerce, Internet.
41
Figure 11: Rader plot of Cluster Seven
Table 9: Sub-districts in Cluster Seven
Cluster Eight: Rural farming communities
This cluster consists of 43 Sub-districts serving 118,173 people. The areas tend to
have high levels of non-dependent children. Perhaps these adult children live on the
family farm, as the level of workers in agriculture is also extremely high. Connectivity is
poor in terms of internet or broadband. Those who do work tend not to be employed in
the public or professional sectors. Instances of private rent and flats are low, indicating a
stable community.
Map 10: Cluster Eight
Extreme High Values: NonDependChildren, Agric.
Extreme Low Values: Professional, Internet.
Figure 12: Radar plot of Cluster Eight
Table 10: Sub-districts in Cluster Eight
Cluster Nine: Small rural Townlands
This cluster consists of 53 Sub-districts serving 229,825 people. The areas tend to
employ people in the public sector; however, unemployment in general is high. These
areas have a below average number of flats and a population that is beginning to age. They
are near larger population centres, so what internet there is tends to be broadband;
however, internet connectivity in general is poor.
Map 11: Cluster Nine
Extreme High Values: Public.
Extreme Low Values: None.
Figure 13: Radar plot of Cluster Nine
Table 11: Sub-districts in Cluster Nine
Cluster Ten: Labouring rural communities in older housing stock
This cluster consists of 21 Sub-districts serving 60,960 people. The number of
people working in agriculture is high, as is the number in construction work. Significant portions of the
population are self-employed (HomeWork). The housing stock is older in these areas, as
indicated by the above average level of septic tank use combined with the extremely high
levels of housing with no central heating. There are few flats and little rented accommodation
in these areas, indicating a settled rural community.
Map 12: Cluster Ten
Extreme High Values: NoCentralHeat, HomeWork, Agric, Construction.
Extreme Low Values: Public.
Figure 14: Radar plot of Cluster Ten
Table 12: Sub-districts in Cluster Ten
Cluster Eleven: Young educated commuter families
This cluster consists of 29 Sub-districts serving 750,014 people. These areas have
very high instances of 0-4 year olds and 25-44 year olds, contrasting with very low
instances of people aged 45 and over. These areas may include significant numbers of
new estates, as indicated by the low levels of homes without central heating, above average
number of flats and high instances of internet and broadband. These are commuter areas
for a highly educated workforce employed primarily in transport and commerce.
Map 13: Cluster Eleven
Extreme High Values: Age0-4, Age25-44, PeoplePerRoom, HEQual, Employed,
Commerce, Transport, Broadband, Internet.
Extreme Low Values: Age45-64, Age65+, Pensioner, NonDependChildren,
NoCentralHeat, SepticTank, LLTI, UnpaidCare, Unemployed, Agric.
Figure 15: Radar plot of Cluster Eleven
Table 13: Sub-districts in Cluster Eleven
Cluster Twelve: Comfortable rural farming communities
This cluster consists of 40 Sub-districts serving 94,434 people. The areas are very
similar to those in Cluster Eight, ‘Rural farming communities’, described earlier. The
main difference is that employment in these areas is split between agriculture and the
public sector. The fact that these areas have very high levels of public sector employment
may account for above average numbers of 5-14 year olds. Internet connectivity is poor
and the community in general is a settled Irish one. Above average instances of two or
more car households may be an indicator of wealth.
Map 14: Cluster Twelve
Extreme High Values: NonDependChildren, SepticTank, Agric, Public.
Extreme Low Values: EUNat, BornOutsideIRE, Separated, Broadband.
Figure 16: Radar plot of Cluster Twelve
Table 14: Sub-districts in Cluster Twelve
Cluster Thirteen: Affluent professional commuters in larger homes
This cluster consists of 38 Sub-districts serving 182,970 people. These areas are
populated by healthy professionals. The housing stock is generally larger than average.
There are very few local authority renters and below average numbers of private renters.
Unemployment is low and there are above average instances of persons with a third level
education. The people of these areas are employed in professional services and commute
by car.
Map 15: Cluster Thirteen
Extreme High Values: RoomsHH, TwoCars, Professional.
Extreme Low Values: Separated, LoneParent, RentPublic, LLTI, Unemployed.
Figure 17: Radar plot of Cluster Thirteen
Table 15: Sub-districts in Cluster Thirteen
Cluster Fourteen: Semi rural periphery manufacturing communities
This cluster consists of 84 Sub-districts serving 381,309 people. This cluster could
be described as consisting of those areas that did not fit into any other cluster. Only one
variable (manufacturing) in these areas is considered extreme, and that only just. These
areas are close enough to major population areas to be influenced by them, but possibly
too far away to be considered commuter areas. There are above average levels of
agricultural and construction work. There are also above average instances of
unemployment and of staying at home to look after children.
Map 16: Cluster Fourteen
Extreme High Values: Manufacturing.
Extreme Low Values: None.
Figure 18: Radar plot of Cluster Fourteen
Table 16: Sub-districts in Cluster Fourteen
The Urban Rural Divide
The clusters as described on the previous pages can be loosely defined as having
either rural or urban tendencies. Clusters 1, 5, 7, 8, 10, 12 and 14 can be said to exhibit
rural tendencies. Clusters 2, 3, 4, 6, 9, 11 and 13 can be said to exhibit urban tendencies.
Map 17: The Urban Rural Divide
In order to give these tendencies a visual test, a final map in this section was
created. This map visualises urban clusters as blue, and rural clusters as yellow.
Settlement data were downloaded (Central Statistics Office, 2014a) and also visualised on
the map. The map indicates that the allocation of urban or rural in the description phase
was successful. It is however a rough and ready indicator and says nothing about the
validity of the allocation of Sub-districts to individual clusters.
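The urban and rural split listed above can be written down as a simple lookup. The following R sketch is illustrative only: the names rural_clusters, urban_clusters, tendency_of and tendency_colours are hypothetical and do not come from the thesis code, and the example input is made up.

```r
# Cluster numbers grouped as listed in the text above.
rural_clusters <- c(1, 5, 7, 8, 10, 12, 14)
urban_clusters <- c(2, 3, 4, 6, 9, 11, 13)

# Hypothetical helper: label each cluster number as "urban" or "rural".
tendency_of <- function(cluster) {
  ifelse(cluster %in% rural_clusters, "rural",
         ifelse(cluster %in% urban_clusters, "urban", NA_character_))
}

# Map colours described in the text: urban = blue, rural = yellow.
tendency_colours <- c(urban = "blue", rural = "yellow")

# Example: tendencies and colours for a few cluster numbers (made-up input).
example_clusters <- c(2, 7, 11, 14)
example_colours <- unname(tendency_colours[tendency_of(example_clusters)])
print(data.frame(cluster  = example_clusters,
                 tendency = tendency_of(example_clusters),
                 colour   = example_colours))
```

A lookup of this kind could be joined to the Sub-district boundary data before mapping, so that each polygon is filled according to the tendency of its cluster.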
Crime Atlas
Set Up
The classification exercise is complete and the resulting clusters have been
mapped and described. These clusters will now be used in a basic crime atlas of Ireland.
Each of the crime types reported at Sub-district level will be mapped. These maps will be
choropleth maps of Ireland showing the number of crimes per 1000 of population along
with a map of the change in recorded crime between 2014 and 2015 in absolute numbers.
Additionally, box plots of the crime rates within each cluster will be produced;
these will show the number of crimes per 1000 of population. It is anticipated that the
intra-cluster variance will be small due to the clustering exercise. It is further anticipated
that there will be different patterns of inter-cluster variance for each crime type due to the
propensity for different crime types to occur in different areas. For each crime type the
clusters will be subjected to a simple linear regression to test whether the crime rates in
each cluster differ significantly from the national rate for that crime.
The box plots and regressions were created in R. For each crime type a new
variable was created that returned the number of crimes per 1000 people in each Sub-
district. The national rate was also calculated and a box plot created for each cluster. For
the regression another variable was required representing the number of crimes per 1000
people minus the national figure. This was then subjected to a linear regression with the
cluster numbers as dummy variables. As an example, the procedure for kidnapping related
crimes is shown below.
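# Axis label shared by the box plots
yLabel <- "Recorded Crimes per 1000 Population"
# Kidnapping related crimes per 1000 people in each Sub-district
Crimes$Kid <- (Crimes$KID_2015 / Crimes$Total2011) * 1000
# National rate per 1000 people
KidTot <- (sum(Crimes$KID_2015) / sum(Crimes$Total2011)) * 1000
# Box plot of Sub-district rates by cluster, with the national rate marked
boxplot(Crimes$Kid ~ Crimes$cluster,
        main = "Kidnapping Related Crime 2015", xlab = "Cluster", ylab = yLabel)
abline(h = KidTot, lty = 2, col = "red")
legend("top", legend = "National Rate", lty = 2, col = "red", bty = "n")
# Deviation of each Sub-district from the national rate
Crimes$KidDev <- Crimes$Kid - KidTot
# Regression on cluster dummies with no intercept
KidMod <- lm(KidDev ~ as.factor(cluster) + 0, data = Crimes)
summary(KidMod)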
The results for kidnapping related offences show that there is very little variance
within clusters. Kidnapping is rarely recorded: in 2015, only 155 kidnapping
related crimes were recorded in Ireland. This gives a crime rate of 0.03 crimes per 1000
people nationally. Two of the clusters were significantly different from this; these
differences were driven by outliers in the data. These outliers should not be
ignored, as they say something about the crimes, but they are not discussed here
because each crime type is discussed in turn in the sections that follow. The clusters represented in
the box plots will also be coloured based on their tendencies, identified in the clustering
process, to be urban (blue) or rural (yellow) in nature. The simple linear model also allows
the amount of variance in crime rates between Sub-districts that is explained by the
cluster to which they belong to be expressed as a percentage. This is done using the
adjusted R squared in the model summary.
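The same steps can be wrapped in a function so that they are repeated for each crime type. The sketch below is a minimal illustration under stated assumptions: the function name cluster_rate_model and the toy data are hypothetical and are not the code or figures used in this thesis. One point worth bearing in mind is that, for a model fitted without an intercept, summary() in R measures R squared against a baseline of zero rather than against the mean of the response. Here zero corresponds to the national rate, since the response is the deviation from it.

```r
# Hypothetical helper wrapping the steps described above for one crime type.
# 'count', 'population' and 'cluster' hold one entry per Sub-district.
cluster_rate_model <- function(count, population, cluster) {
  rate      <- count / population * 1000            # crimes per 1000 people
  national  <- sum(count) / sum(population) * 1000  # pooled national rate
  deviation <- rate - national
  # No intercept (+ 0): each coefficient is a cluster's mean deviation from
  # the national rate, and its t-test compares that mean with zero.
  fit <- lm(deviation ~ as.factor(cluster) + 0)
  s <- summary(fit)
  list(national      = national,
       p_values      = s$coefficients[, "Pr(>|t|)"],
       adj_r_squared = s$adj.r.squared)
}

# Made-up toy data: 9 Sub-districts in 3 clusters (not thesis figures).
toy <- data.frame(
  count      = c(2, 3, 1, 20, 25, 30, 5, 6, 4),
  population = c(1000, 1200, 900, 2000, 2100, 1900, 1500, 1600, 1400),
  cluster    = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)
result <- cluster_rate_model(toy$count, toy$population, toy$cluster)
print(result)
```

The p_values element holds one value per cluster, and adj_r_squared is the adjusted R squared taken from the model summary.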
Theft
Map 18: Theft
Figure 19: Box plots by Cluster of theft related crime
Nationally the number of theft related crimes per 1000 people in Ireland for 2015
was 16.56. Most of the clusters perform well, with low intra cluster variability. Only
clusters three (with a p value of 0.32) and six (with a p value of 0.44) do not differ
significantly from the national figure, and most clusters are below the national figure. The
largest variability can be found in cluster two; this cluster is driving up the national figure,
with rates in most of its Sub-districts ranging from 24.46 to 48.43 per 1000 of population. There
is then a jump to 136.54 for Anglesea Street in Cork, and the two outliers are the Store Street and
Pearse Street Sub-districts in central Dublin, with 280.21 and 296.34 respectively. The
other notable outlier is the Dublin Airport Sub-district in cluster eleven, with a theft
related crime rate of 422.6 per 1000 of population. The classification accounts for 27.5%
of the national variance in theft related crime at the Sub-district level.
Assault
Map 19: Assault
Figure 20: Box plots by Cluster of assault related crime
Nationally the number of assault related crimes per 1000 of population in Ireland
in 2015 was 3.7. The variation within the clusters may seem to be higher for assault
than for theft; however, the scale is smaller, with a maximum value of 33.27. Again cluster
two has the most variability, ranging from 2 to 33.27. All clusters can be said to be
significantly different from the national figure, except clusters five (p=0.26), six (p=0.94),
nine (p=0.52) and eleven (p=0.14). Notable outliers with regard to assault related crime are
Pearse Street, Dublin in cluster two (33.27), Bridewell, Cork in cluster six (12),
Castlerea, Longford in cluster nine (21), and Dublin Airport in cluster eleven (22). The
classification accounts for 35.6% of the national variance in assault related crime.
Burglary
Map 20: Burglary
Figure 21: Box plots by Cluster of burglary related crime
Nationally the number of burglary related crimes per 1000 of population in Ireland
in 2015 was 5.7. The inter and intra cluster variances are quite large, perhaps reflecting the
sporadic nature of burglaries, where areas may be hit with burglaries for a time before
criminals move on to the next area. All clusters can be said to be significantly different
from the national figure, except clusters three (p=0.46) and eleven (p=0.32). There are
notable outliers with regard to burglary related crime. Pearse Street, Dublin in cluster two
has a rate of 26.5, whereas the cluster otherwise has a range of 7.09 to 12.56. Clondalkin
and Courtown Harbour are the outliers for cluster three. Belturbet in the
Cavan/Monaghan division is the large outlier in cluster nine, with a rate of 16.08 crimes
per 1000 of population. Dundrum and Rathcoole in cluster eleven and Mountrath in
cluster fourteen pull up their clusters' results. The classification accounts for 54.7% of the
national variance in burglary related crime.
Damage to Property or the Environment
Map 21: Damage to property or the environment
Figure 22: Box plots by Cluster of damage related crime
Nationally the number of damage related crimes per 1000 of population in Ireland in 2015
was 5.68. The intra cluster variation appears to be relatively stable across the
classification, with the exceptions of clusters two, three and six. Again cluster two has the
most variability ranging from less than 5 in Irishtown and Donnybrook to more than 24 in
Pearse Street and Store Street. All clusters can be said to be significantly different from
the national figure, except cluster eleven (p=0.27). The large outlier in cluster eleven is
Dublin Airport. In 2015 the Sub-district covering the airport recorded 13 damage related
crimes, however due to the low population this equates to 31.94 crimes per 1000 of
population. The classification accounts for 58.4% of the national variance in damage
related crime.
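The Dublin Airport figure shows how a small resident population inflates a rate per 1000. The R sketch below works through the arithmetic. The population is not quoted in the text; it is back-calculated here from the reported count and rate and should be read as an implied value only. The comparison Sub-district of 20,000 residents is hypothetical.

```r
# Figures reported in the text for the Dublin Airport Sub-district (damage, 2015).
airport_crimes <- 13
airport_rate   <- 31.94   # crimes per 1000 of population

# Implied census population: back-calculated, not a figure quoted in the thesis.
implied_population <- airport_crimes / airport_rate * 1000
print(round(implied_population))

# The same 13 crimes in a hypothetical Sub-district of 20,000 residents
# would give a rate well below the national figure of 5.68.
hypothetical_population <- 20000
hypothetical_rate <- airport_crimes / hypothetical_population * 1000
print(hypothetical_rate)
```

The implied resident population is only a few hundred people, which is why even a handful of recorded crimes places this Sub-district far above the rest of its cluster.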
Dangerous Acts
Map 22: Dangerous Acts
Figure 23: Box plots by Cluster of crimes relating to dangerous acts
Nationally the number of recorded crimes relating to dangerous acts per 1000 of
population in Ireland in 2015 was 1.57. The variation between clusters and within clusters
is small, ranging between 0 and 6. Dublin Airport is again the exception with a figure of
15 crimes per 1000 of population. All clusters can be said to be significantly different
from the national figure, except clusters five (p=0.46), nine (p=0.32) and eleven (p=0.94).
The classification accounts for 15.2% of the national variance in recorded dangerous acts
related crime.
Drugs
Map 23: Drugs
Figure 24: Box plots by Cluster of drug related crime
Nationally the number of drug related crimes per 1000 of population in Ireland in
2015 was 3.29. All clusters can be said to be significantly different from the national
figure, except clusters three (p=0.65) and eleven (p=0.63). Drug crime is generally low
with some larger outliers dragging up the figures. The city centre Sub-districts and city
periphery Sub-districts of clusters two and six show the largest number of crimes and the
largest intra cluster variation. In cluster two the largest rate was recorded in Store Street in
Dublin with 43.22; the nearest rate to this in the cluster was Pearse Street at 23.65. The
large outlier in cluster eleven was again Dublin Airport. An interesting result can be seen
in cluster twelve. Stradbally in County Laois, which hosts concerts and summer festivals, has
a rate of 47.95, the largest in the country. The rate in Stradbally dwarfs the nearest rate of
1.61. The classification accounts for 29.3% of the national variance in drug related crime.
Fraud
Map 24: Fraud
Figure 25: Box plots by Cluster of fraud related crime
Nationally the number of fraud related crimes per 1000 of population in Ireland in
2015 was 1.22. The crime rates across the clusters are small and the range within the
clusters is also small. Again cluster two has the most variability. Cluster two is dragged
up by two Garda Sub-districts: Pearse Street (16.5), which covers many white collar areas
including the Central Bank, and Store Street (13.1), which covers areas such as the IFSC.
All clusters can be said to be significantly different from the national figure, except
clusters three (p=0.81), five (p=0.22) and six (p=0.77). The large outlier in cluster eleven
is again Dublin Airport with 49.14 crimes per 1000 population. The classification
accounts for 14% of the national variance in fraud related crime.
Kidnapping
Map 25: Kidnapping
Figure 26: Box plots by Cluster of kidnapping related crime
Nationally the number of kidnapping related crimes per 1000 of population in
Ireland in 2015 was 0.03. This figure represents a total of 155 recorded crimes across the
country. Only two of the clusters differ significantly from the national figure. These are
cluster seven and cluster eleven. These in turn were driven by one outlier each. In cluster
seven Ballycroy in Mayo recorded 6 kidnapping related crimes, up 6 from 2014, giving a
rate of 10.4. Dublin Airport Sub-district recorded 4 crimes, down 2 on 2014, giving a rate
of 9.8. Because so few kidnapping related crimes were recorded
nationwide, the classification accounts for only 1% of the national variance.
Public Order
Map 26: Public Order
Figure 27: Box plots by Cluster of public order and social code crime
Nationally the number of public order related crimes per 1000 of population in
Ireland in 2015 was 7.25. Most clusters fall below this figure. Again cluster two has the
most variability ranging from 4.94 in Irishtown to 132.83 in Pearse Street and 152.41 in
Store Street. Five clusters are not significantly different from the national figure; clusters
three (p=0.14), five (p=0.46), six (p=0.86), nine (p=.015), and eleven (p=0.60). The large
outlier in cluster eleven is again Dublin Airport with 169.53 crimes per 1000 population.
The classification accounts for 25.2% of the national variance in public order related
crime.
Robbery
Map 27: Robbery
Figure 28: Box plots by Cluster of robbery related crime
Nationally the number of robbery related crimes per 1000 of population in Ireland
in 2015 was 0.56. The crime rates across the clusters are small and the range within the
clusters is generally small. Nationally the figure rarely gets above 2. In cluster two Pearse
Street and Store Street again drag up the rate. In cluster three Tallaght crosses the 2
crimes per 1000 of population level at 2.2. In cluster six Ballymun drives the rate at 3.46.
For robbery related crimes recorded all the clusters are significantly different from the
national figure with p values ranging from <2e-16 to 0.035859. The classification
accounts for 58.7% of the national variance in robbery related crime.
Weapons
Map 28: Weapons
Figure 29: Box plots by Cluster of weapons related crime
Nationally the number of weapons related crimes per 1000 of population in
Ireland in 2015 was 0.52. The crime rates vary very little across and within clusters.
Again cluster two has the most variability, and again the two Sub-districts dragging up
cluster two are Store Street (5.04) and Pearse Street (6.29). Only five clusters can be said
to be significantly different from the national figure. These are clusters one (p=0.038),
two (p=0.0002), eleven (p=0.0002), twelve (p=0.055) and thirteen (p=0.063). The large
outlier in cluster eleven is again Dublin Airport with 34.4 crimes per 1000 population.
The classification accounts for only 6% of the national variance in weapons related crime.
Offences against the State, Justice or Organised Crime
Map 29: State, Justice or Organised Crime
Figure 30: Box plots by Cluster of offences against the State, justice procedures or organised crime
Nationally the number of offences against the State, justice procedures or organised crime
per 1000 of population in Ireland in
2015 was 2.41. The crime rates across the clusters are small and the range within the
clusters is also small. The outliers are unsurprisingly driven by proximity to major Irish
Courts Service premises. Examples include Bridewell Sub-district in Dublin (153.13)
which covers the Four Courts and Anglesea Street in Cork (33.01) which covers Cork
District Court. Limerick also features among the outliers. In cluster three, Newcastlewest and
Roxboro Road of Limerick are the two highest outliers at 6.87 and 8.74 respectively. In
cluster six, Henry Street (15.79) and Mayorstone Park (11.36) are also in Limerick. The
classification accounts for 14% of the national variance in recorded crimes against the
State, justice procedures or organised crime.
Crime Atlas Comments
The classification has allowed for another aspect of crime figures to be compared.
It has assisted in identifying those Garda Sub-districts which have high crime figures
compared to other Sub-districts that have similar underlying geodemographics. It has also
helped identify Sub-districts that may benefit from being considered separately. Of the
twelve crime types shown Store Street was identified as an outlier eight times. Pearse
Street was an outlier nine times, and Dublin Airport was an outlier ten times.
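Counts of this kind can be produced by flagging outliers within each cluster for each crime type and then summing the flags per Sub-district. The R sketch below assumes that an outlier is a point beyond 1.5 times the hinge spread, which is the default rule used by boxplot() in R; whether the counts above were obtained in exactly this way is an assumption. The helper flag_outliers and the toy data are hypothetical.

```r
# Hypothetical helper: flag Sub-districts whose rate is a box plot outlier
# within their own cluster, using the default rule of boxplot() (coef = 1.5).
flag_outliers <- function(rate, cluster) {
  flagged <- logical(length(rate))
  for (k in unique(cluster)) {
    in_k <- cluster == k
    out_values <- boxplot.stats(rate[in_k])$out
    flagged[in_k] <- rate[in_k] %in% out_values
  }
  flagged
}

# Made-up toy rates for two crime types in two clusters (not thesis figures).
toy <- data.frame(
  station = paste0("S", 1:12),
  cluster = rep(c(1, 2), each = 6),
  theft   = c(10, 11, 12, 13, 14, 90, 3, 4, 5, 6, 7, 8),
  assault = c(2, 3, 3, 4, 4, 30, 1, 1, 2, 2, 3, 25)
)

# Count, per station, how many crime types flag it as an outlier.
crime_types <- c("theft", "assault")
flags <- sapply(crime_types, function(v) flag_outliers(toy[[v]], toy$cluster))
toy$times_outlier <- rowSums(flags)
print(toy[toy$times_outlier > 0, c("station", "cluster", "times_outlier")])
```

Because the flag is computed within each cluster, a Sub-district is compared only with others that share its geodemographic profile, which is the comparison the classification is intended to support.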
The classification accounted for variation in crime rates with varying degrees of
success as can be seen in the summary below in figure 31. However this variety should
not be seen as a failure of the classification. All crimes are not created equal, just as all
Sub-districts are not created equal. The classification is a general one and can be used for
many other purposes. For example the classification can be used to explore if station
opening times are appropriate for the population make up of Sub-districts. It can also be
used to show the balance of Sub-districts that each area commander has under their
control, for assistance in funding requests etc.
Figure 31: Variance in crime rates explained by the clustering classification
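The percentages quoted in the individual crime sections can be collected into a single ranked summary. The R sketch below uses only the figures reported in the text above; the chart styling is illustrative and is not a reproduction of the original figure.

```r
# Percentage of variance explained, as quoted in each crime section above.
variance_explained <- c(
  Theft = 27.5, Assault = 35.6, Burglary = 54.7, Damage = 58.4,
  "Dangerous Acts" = 15.2, Drugs = 29.3, Fraud = 14, Kidnapping = 1,
  "Public Order" = 25.2, Robbery = 58.7, Weapons = 6,
  "State, Justice or Organised Crime" = 14
)

# Order from best to worst explained and list the result.
ranked <- sort(variance_explained, decreasing = TRUE)
print(ranked)

# A simple horizontal bar chart; the layout of the original figure is not
# known, so this styling is illustrative only.
barplot(rev(ranked), horiz = TRUE, las = 1, cex.names = 0.6,
        xlab = "Variance explained (%)")
```

Ranked this way, robbery, damage and burglary are the crime types best explained by the classification, while kidnapping and weapons offences are the least.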
Issues with Data and Analysis
Issues with the data used have generally been described and explored as they have
arisen during the literature review, data and building the classification sections of this
thesis. However a number of other issues have been identified, these are discussed below.
The clustering exercise described throughout this thesis takes into account a lot of
the information available and recommended by the FBI (2012). It should be noted that not
all Sub-districts are created equal. Not included in the classification are several sources of
information that may help compare Garda Sub-districts: opening hours, number of staff
and their roles, proximity to major roads, prison populations, etc. Incorporating more of
these variables may have made the classification more accurate for crime, but perhaps too
specialised.
However this is a general classification that could have many uses, rather than a
specific crime comparison tool. The classification was also designed to be comparable to
other such general classifications created previously, such as the ones by Brunsdon et al.
(2014) or Charlton et al. (1985). As such, not all the available data were included.
The dominance of Dublin city centre is evident throughout the classification;
cluster two has by far the largest variance for any type of crime. Out of the twelve crime
types explored, Pearse Street and Store Street appear as outliers nine and eight times
respectively. In cluster eleven, Dublin Airport is a large outlier in ten of the crime types.
This is unsurprising given the small settled population (those there on census night), large
transient daily population and increased scrutiny as befits an international airport.
Removing Dublin Airport Sub-district as an anomaly may have been an option, as its absence
would not adversely affect the classification. However I chose to keep it in
for just that reason. The fact that it is an anomaly (and a vital national resource), together with the
classification results and crime figures, may be another argument for more policing
resources than may be justified by population figures alone.
The reader and further researcher will have to draw their own conclusions, but it
may make more sense to carry out the classification again, leaving out the Dublin
Metropolitan Region (DMR) Sub-districts. Classifying the DMR separately would
effectively give a two tier classification, one for Dublin and one for the rest of the
country. This may allow a more nuanced view of Dublin Sub-districts and a more
accurate view of the rest of the country than a single combined classification can provide.
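A two tier classification of this kind could be sketched as follows. This is a minimal illustration under several assumptions: the data are randomly generated, the DMR flag is made up, k-means is used as the clustering step, and the numbers of components and clusters are arbitrary. None of these choices is taken from the classification built in this thesis, and the helper name classify is hypothetical.

```r
set.seed(1)  # for reproducible toy data and k-means starts

# Made-up demographic table: 60 Sub-districts, 5 variables, with a flag
# marking which ones belong to the DMR (all values are illustrative).
n <- 60
demo <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(demo) <- paste0("var", 1:5)
is_dmr <- rep(c(TRUE, FALSE), times = c(20, 40))

# Hypothetical helper: PCA on standardised variables, then k-means on the
# leading components. Component and cluster counts are arbitrary here.
classify <- function(x, n_components = 2, n_clusters = 3) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  scores <- pca$x[, seq_len(n_components), drop = FALSE]
  kmeans(scores, centers = n_clusters, nstart = 10)$cluster
}

# Tier one: the DMR alone. Tier two: the rest of the country.
tier <- character(n)
tier[is_dmr]  <- paste0("DMR-", classify(demo[is_dmr, ]))
tier[!is_dmr] <- paste0("Rest-", classify(demo[!is_dmr, ]))
print(table(tier))
```

Fitting the principal components separately for each tier means that the Dublin Sub-districts no longer influence the components used for the rest of the country, at the cost of cluster labels that are not directly comparable between the two tiers.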
It should also be noted again that the cluster descriptions can only be a snapshot,
an overall picture of an area or areas. Not everyone in cluster two is going to be young,
affluent and single for example. The cluster descriptions are helpful to the reader but no
classification can fully express the massive variability between people or the uniqueness
of people and families.
Lastly the age of the data should be considered by the reader. The Garda Sub-
districts were assigned in 2013 and the census data were collected in 2011. Census
data are currently collected every five years, which will allow for an update to this work in
approximately 2017 (when the data are released). The opening and closing of Garda
stations and the subsequent changing of Sub-district boundaries can be subject to political
and administrative whim, so there is no guarantee that the same Sub-districts used in the
analysis will be in operation at the time of reading.
Conclusion
The Federal Bureau of Investigation (2012) warns against comparing areas on
their crime figures without understanding the underlying reasons for the differences
across jurisdictions.
Geographic and demographic factors specific to each jurisdiction must be
considered and applied if one is going to make an accurate and complete
assessment of crime in that jurisdiction.... The transience of the population, its
racial and ethnic makeup, its composition by age and gender, education levels, and
prevalent family structures are all key factors in assessing and comprehending the
crime issue. (Federal Bureau of Investigation, 2012).
The aim of this thesis was to create a geodemographic classification of Ireland at
the Garda Sub-district level. The classification was to be one that could be used by An
Garda Síochána management, policy makers and other interested parties. The
classification would allow comparison of Garda Sub-districts based on their underlying
socio demographic characteristics rather than simply their physical locations in the
hierarchy of An Garda Síochána. In this regard, the classification has been a success.
The classification has the potential to be used or updated for many purposes. As
an example of this the crime atlas section was created. It gives a general overview of
crime and allows for the identification of outliers within clusters of Sub-districts with
similar geodemographic characteristics for further investigation.
The classification is far from perfect and the results must come with a caveat.
Opening hours, staffing and equipment, station budgets and public attitudes are not taken
into account, all of which can affect crime levels (Federal Bureau of Investigation, 2012)
and influence results in crime comparisons. However the classification can assist here too.
It allows those with data on staffing and assets, or with control over opening hours to
investigate whether Sub-districts may require additional resources as befitting their
geodemographics, based on their classification.
The classification that was created and described in this thesis is not an end
product, but rather a tool in the analytical toolbox to assist An Garda Síochána and
interested parties. The classification can inform decision making, or assist in explaining
differences between Sub-districts in a simpler way than going into all the detail that the
clusters represent. When data are available from the 2016 census it is hoped that this work
can inform a reclassification based on more up to date data. As well as being an updated
version of this research, it would allow for a temporal comparison of Garda Sub-district
geodemographic make up, should the clustering produce different results as anticipated.
Bibliography
Abbas, J., Ojo, A. & Orange, S., 2009. Geodemographics - a tool for health intelligence?. Public
Health, 123(1), pp. 35-39.
Alexiou, A. & Singleton, A., 2015. Geodemographic Analysis. In: C. Brunsdon & A. Singleton, eds.
Geocomputation a Practical Primer. London: Sage Publications Ltd, p. 137.
An Garda Siochana, 2013. Annual Policing Plan 2013. [Online]
Available at:
http://garda.ie/Documents/User/Annual%20Policing%20plan%202013%20%20English.pdf
[Accessed 1 April 2016].
Anderson, D. et al., 2010. Statistics for Business and Economics. 2nd ed. Andover, Hampshire:
Cengage Learning EMEA.
Ashby, D. & Longley, P., 2005. Geocomputation, Geodemographics and Resource Allocation for
Local Policing. Transactions in GIS, 9(1), pp. 53-72.
Booth, C., 1903. Life and Labour of the People in London. New York: MacMillan and Co.
Boyne, R., 2006. Classification. Theory, Culture & Society, 23(2-3), pp. 21-30.
Brunsdon, C., Rigby, J. & Charlton, M., 2014. Ireland Census of Population 2011: A Classification
of Small Areas. [Online]
Available at: https://rpubs.com/chrisbrunsdon/14998
[Accessed 4 February 2016].
Central Statistics Office, 2014a. Census 2011 Boundary Files. [Online]
Available at: http://www.cso.ie/en/census/census2011boundaryfiles/
[Accessed 3 February 2016].
Central Statistics Office, 2014. Census 2011 Small Area Population Statistics (SAPS). [Online]
Available at: http://www.cso.ie/en/census/census2011smallareapopulationstatisticssaps/
[Accessed 4 February 2016].
Central Statistics Office, 2016. StatBank Ireland: Recorded Crime. [Online]
Available at:
http://www.cso.ie/px/pxeirestat/DATABASE/Eirestat/Recorded%20Crime/Recorded%20Crime_s
tatbank.asp?sp=Recorded%20Crime&Planguage=0
[Accessed 1 April 2016].
Charlton, M. & Brunsdon, C., 2016. Gehlke and Biehl Revisited. Greenwich: GISRUK 2016.
Charlton, M., Openshaw, S. & Wymer, C., 1985. Some New Classifications of Census Enumeration
Districts in Britain: A Poor Man's ACORN. Journal of Economic and Social Measurement, Volume
13, pp. 69-96.
Creaner, P., 2016. Examiner of Maps (GIS) at An Garda Siochana [Interview] (8 April 2016).
Debenham, J., 2002. Understanding Geodemographic Classification: Creating The Building Blocks
For An Extension. [Online]
Available at: http://eprints.whiterose.ac.uk/5014/
[Accessed 1 April 2016].
Ding, C. & He, X., 2004. K-means Clustering via Principal Component Analysis. Proceedings of the
twenty-first international conference on Machine learning, p. 29.
Doyle, C., 2011. A Dictionary of Marketing. 3rd ed. Oxford: Oxford University Press.
Dupre, J., 2006. Scientific Classification. Theory, Culture & Society, 23(2-3), pp. 30-32.
Esri, 2009. Cartogram Geoprocessing Tool version 2. [Online]
Available at: http://arcscripts.esri.com/details.asp?dbid=15638
[Accessed 3 March 2016].
Federal Bureau of Investigation, 2012. Caution against Ranking. [Online]
Available at: https://www.fbi.gov/about-us/cjis.ucr/cautionagainstranking.pdf
[Accessed 22 April 2016].
Gale, C., Singleton, A. & Longley, P., 2015. Profiling Burglary in London using Geodemographics.
Leeds, GISRUK2015.
Gehlke, C. & Biehl, K., 1934. Certain Effects of Grouping Upon the Size of the Correlation
Coefficient in Census Tract Material. Journal of the American Statistical Association, 29(185), pp.
169-170.
Gordon, A., 1987. A Review of Hierarchical Classification. Journal of the Royal Statistical Society.
Series A, 150(2), pp. 119-137.
Government Publications, 2016. Bunreacht Na hEireann : Constitution of Ireland. 2016 ed.
Dublin: Government Publications Office.
Hansen, P. & Jaumard, B., 1997. Cluster analysis and mathematical programming. Mathematical
Programming, Volume 79, pp. 191-215.
Harris, R., Sleight, P. & Webber, R., 2005. Geodemographics, GIS and Neighbourhood Targeting.
Chichester: John Wiley & Sons Ltd.
Houses of the Oireachtas, 2013. Electoral (amendment) (Dail Constituencies) Act 2013. [Online]
Available at: http://www.irishstatutebook.ie/eli/2013/act/7/enacted/en/print#sec6
[Accessed 6 April 2016].
Jackson, J., 1991. A User's Guide to Principal Components. New York: John Wiley & Sons, Inc.
Jolliffe, I., 2002. Principal Component Analysis. 2nd ed. London: Springer.
Jordan, A., 2015. Ireland's violent crime capitals: We reveal which counties have the highest
numbers of rape and murder suspects. [Online]
Available at: http://www.irishmirror.ie/news/irish-news/crime/irelands-violent-crime-capitals-reveal-5245071
[Accessed 4 April 2016].
London School of Economics and Political Science, 2012. Charles Booth Online Archive: The
Survey into life and labour in London (1886-1903). [Online]
Available at: http://booth.lse.ac.uk/static/b/index.html
[Accessed 5 April 2016].
MacCarthaigh, S. & Phelan, S., 2014. Crime Nation: How safe is your area?. [Online]
Available at: http://www.independent.ie/irish-news/crime/crime-nation-how-safe-is-your-area-30727076.html
[Accessed 4 April 2016].
Openshaw, S. & Taylor, P., 1979. A Million or so correlation coefficients, three experiments on
the modifiable areal unit problem. Statistical Applications in the Spatial Sciences, Volume 127.
Parker, S., Uprichard, E. & Burrows, R., 2007. Class places and place classes:
Geodemographics and the spatialization of class. Information, Communication and Society, 10(6),
pp. 902-921.
Perec, G., 1989. Penser/Classer. Paris: Hachette.
Rencher, A. & Christensen, W., 2012. Methods of Multivariate Analysis. 3rd ed. New Jersey: John
Wiley & Sons Inc.
Singleton, A. & Longley, P., 2008. Creating open source geodemographics: Refining a national
classification of census output areas for applications in higher education. Papers in Regional
Science, 88(3), pp. 643-666.
Singleton, A. & Longley, P., 2009. Geodemographics, visualisation, and social networks in applied
geography. Applied Geography, Volume 29, pp. 289-298.
The R Foundation, 2015. The R Project for Statistical Computing. [Online]
Available at: https://www.r-project.org/
[Accessed September 2015].
Tobler, W., 1970. A Computer Movie Simulating Urban Growth in the Detroit Region. Economic
Geography, Volume 46, p. 236.
Vickers, D. & Rees, P., 2007. Creating the UK National Statistics 2001 Output Area Classification.
Journal of the Royal Statistical Society. Series A (Statistics in Society), 170(2), pp. 379-403.
Vickers, D. & Rees, P., 2011. Ground-truthing Geodemographics. Applied Spatial Analysis,
Volume 4, pp. 3-21.
  • 3. Table of Contents
      Abstract (p. i)
      Acknowledgements (p. i)
      Table of Contents (p. ii)
      List of Figures (p. iv)
      List of Maps (p. v)
      List of Tables (p. vi)
      Introduction (p. 1)
      Literature Review (p. 4)
        General (p. 4)
        Theory (p. 6)
        Modifiable Areal Unit Problem (p. 6)
        Variables (p. 8)
          Selection (p. 8)
          Correlation (p. 8)
          Units (p. 9)
        Clustering (p. 10)
          Methods (p. 10)
          Cluster Description (p. 11)
      Data (p. 12)
      Building the Classification (p. 16)
        Variables (p. 16)
        Principal Components Analysis (p. 19)
        Clustering (p. 21)
        Cluster Description and Naming (p. 24)
      Results (p. 26)
        Cluster One: Comfortable home owning young families in semi rural areas (p. 28)
        Cluster Two: Young, mobile, affluent, multicultural singles (p. 30)
        Cluster Three: Struggling labouring communities (p. 32)
        Cluster Four: Young labouring families in outer commuter areas (p. 34)
        Cluster Five: Settled older rural communities (p. 36)
        Cluster Six: Urban city peripheral communities (p. 38)
  • 4.
        Cluster Seven: Struggling rural aging communities (p. 40)
        Cluster Eight: Rural farming communities (p. 42)
        Cluster Nine: Small rural Townlands (p. 44)
        Cluster Ten: Labouring rural communities in older housing stock (p. 46)
        Cluster Eleven: Young educated commuter families (p. 48)
        Cluster Twelve: Comfortable rural farming communities (p. 50)
        Cluster Thirteen: Affluent professional commuters in larger homes (p. 52)
        Cluster Fourteen: Semi rural periphery manufacturing communities (p. 54)
        The Urban Rural Divide (p. 56)
      Crime Atlas (p. 58)
        Set Up (p. 58)
        Theft (p. 60)
        Assault (p. 62)
        Burglary (p. 64)
        Damage to Property or the Environment (p. 66)
        Dangerous Acts (p. 68)
        Drugs (p. 70)
        Fraud (p. 72)
        Kidnapping (p. 74)
        Public Order (p. 76)
        Robbery (p. 78)
        Weapons (p. 80)
        Offences against the State, Justice or Organised Crime (p. 82)
        Crime Atlas Comments (p. 84)
      Issues with Data and Analysis (p. 85)
      Conclusion (p. 87)
      Bibliography (p. 89)
      Appendix 1: R Code Used Throughout the Thesis (p. 92)
      Appendix 2: Table of Census Themes (p. 99)
      Appendix 3: Garda Sub-district Look-Up Tables (p. 105)
  • 5. List of Figures
      Figure 1: Portion of Charles Booth's classification map of London (p. 5)
      Figure 2: Cumulative variance explained by components 1:40 (p. 20)
      Figure 3: Scree plot of (WCSS) for k = [1:100] (p. 23)
      Figure 4: Heatmap of clusters where k=14 (p. 23)
      Figure 5: Radar plot of Cluster One (p. 29)
      Figure 6: Radar plot of Cluster Two (p. 31)
      Figure 7: Radar plot of Cluster Three (p. 33)
      Figure 8: Radar plot of Cluster Four (p. 35)
      Figure 9: Radar plot of Cluster Five (p. 37)
      Figure 10: Radar plot of Cluster Six (p. 39)
      Figure 11: Radar plot of Cluster Seven (p. 41)
      Figure 12: Radar plot of Cluster Eight (p. 43)
      Figure 13: Radar plot of Cluster Nine (p. 45)
      Figure 14: Radar plot of Cluster Ten (p. 47)
      Figure 15: Radar plot of Cluster Eleven (p. 49)
      Figure 16: Radar plot of Cluster Twelve (p. 51)
      Figure 17: Radar plot of Cluster Thirteen (p. 53)
      Figure 18: Radar plot of Cluster Fourteen (p. 55)
      Figure 19: Box plots by Cluster of theft related crime (p. 61)
      Figure 20: Box plots by Cluster of assault related crime (p. 63)
      Figure 21: Box plots by Cluster of burglary related crime (p. 65)
      Figure 22: Box plots by Cluster of damage related crime (p. 67)
      Figure 23: Box plots by Cluster of crimes relating to dangerous acts (p. 69)
      Figure 24: Box plots by Cluster of drug related crime (p. 71)
      Figure 25: Box plots by Cluster of fraud related crime (p. 73)
      Figure 26: Box plots by Cluster of kidnapping related crime (p. 75)
      Figure 27: Box plots by Cluster of public order and social code crime (p. 77)
      Figure 28: Box plots by Cluster of robbery related crime (p. 79)
      Figure 29: Box plots by Cluster of weapons related crime (p. 81)
      Figure 30: Box plots by Cluster of offences against the State, justice or organised crime (p. 83)
      Figure 31: Variance in crime rates explained by the clustering classification (p. 84)
  • 6. List of Maps
      Map 1: Garda Sub-districts by division (p. 14)
      Map 2: Garda Sub-districts by cluster (p. 26)
      Map 3: Cluster One (p. 28)
      Map 4: Cluster Two (p. 30)
      Map 5: Cluster Three (p. 32)
      Map 6: Cluster Four (p. 34)
      Map 7: Cluster Five (p. 36)
      Map 8: Cluster Six (p. 38)
      Map 9: Cluster Seven (p. 40)
      Map 10: Cluster Eight (p. 42)
      Map 11: Cluster Nine (p. 44)
      Map 12: Cluster Ten (p. 46)
      Map 13: Cluster Eleven (p. 48)
      Map 14: Cluster Twelve (p. 50)
      Map 15: Cluster Thirteen (p. 52)
      Map 16: Cluster Fourteen (p. 54)
      Map 17: The Urban Rural Divide (p. 56)
      Map 18: Theft (p. 60)
      Map 19: Assault (p. 62)
      Map 20: Burglary (p. 64)
      Map 21: Damage to property or the environment (p. 66)
      Map 22: Dangerous Acts (p. 68)
      Map 23: Drugs (p. 70)
      Map 24: Fraud (p. 72)
      Map 25: Kidnapping (p. 74)
      Map 26: Public Order (p. 76)
      Map 27: Robbery (p. 78)
      Map 28: Weapons (p. 80)
      Map 29: State, Justice or Organised Crime (p. 82)
  • 7. List of Tables
      Table 1: Derived variables used in the classification (p. 18)
      Table 2: Extreme z scores for variables within clusters (p. 25)
      Table 3: Sub-districts in Cluster One (p. 29)
      Table 4: Sub-districts in Cluster Two (p. 31)
      Table 5: Sub-districts in Cluster Three (p. 33)
      Table 6: Sub-districts in Cluster Four (p. 35)
      Table 7: Sub-districts in Cluster Five (p. 37)
      Table 8: Sub-districts in Cluster Six (p. 39)
      Table 9: Sub-districts in Cluster Seven (p. 41)
      Table 10: Sub-districts in Cluster Eight (p. 43)
      Table 11: Sub-districts in Cluster Nine (p. 45)
      Table 12: Sub-districts in Cluster Ten (p. 47)
      Table 13: Sub-districts in Cluster Eleven (p. 49)
      Table 14: Sub-districts in Cluster Twelve (p. 51)
      Table 15: Sub-districts in Cluster Thirteen (p. 53)
      Table 16: Sub-districts in Cluster Fourteen (p. 55)
  • 8. Introduction
      This thesis has two main parts. Part one focuses on the statistical methods used in geodemographics to classify Garda Sub-districts, using principal components analysis and clustering techniques to create the classification. Part two consists of a crime atlas of Ireland at Garda Sub-district level; this atlas also contains information on how the clustering exercise performs when used with real crime data.
      The census of Ireland provides a vast amount of data at many geographic levels, disseminated by the Central Statistics Office (CSO). For example, the smallest geography at which the census is reported is the Small Area (Central Statistics Office, 2014). There are 18,488 Small Areas in the Republic of Ireland, and the census reports 764 variables for each of them. This gives 14,124,832 individual data points.
      This thesis concentrates on the 563 catchment areas of Garda stations as defined by An Garda Síochána and reported in the census as Garda Sub Divisions or Garda Sub-districts (Central Statistics Office, 2014). Even though this gives only 563 areas to work with, that still equates to 430,132 data points, not including crime statistics for the 13 years available. Making sense of that amount of data requires it to be summarised so that the end user is not overwhelmed.
      Traditionally, crime statistics are reported by county or region of the country, as can be seen in newspapers and websites when reporting on crime; examples include The Independent (MacCarthaigh & Phelan, 2014) and the Irish Mirror (Jordan, 2015). While it makes intuitive sense to amalgamate Garda stations by county for the purposes of crime statistics reporting, nuances and spatial variation within counties are lost.
  • 9. Another approach could be to report crimes by station; however, this may arguably be problematic. While reporting by station would show differences within counties and nationally, without knowing something of the characteristics of the individual stations, comparing them becomes moot.
      The first law of geography, that 'everything is related to everything else, but near things are more related than distant things' (Tobler, 1970), is generally accepted to hold. However, just because two Garda station areas are next to each other does not automatically mean that the characteristics of the areas are the same. There is little point comparing Ballina in County Mayo with Killala, for example. The catchments are neighbours, but Ballina is a generally urban environment with 14,329 people at the last census in 2011, whereas Killala is a rural, coastal area that is physically larger than the Ballina catchment but has only 3,766 people (Central Statistics Office, 2014).
      This thesis advocates comparing stations not only on their location in the An Garda Síochána hierarchy but also on their similarities, specifically the underlying geodemographics. Perhaps Killala has more in common with Kilrush in Clare or Duncannon in Wexford, and comparing it to its peers is a more sensible approach than comparing it to its neighbours. That being said, it is expected that the classification carried out in this work will show clusters of clusters throughout Ireland, in support of Tobler's Law.
      There is much literature on geodemographic classification, particularly for marketing in the United Kingdom and America. Brunsdon et al. (2014) have created a geodemographic classification of Ireland at the Small Area scale. Gale et al. (2015) have used geodemographics in London relating to crime; however, there does not appear to be a classification at the national level for Ireland relating to Garda station catchments. It is felt that this thesis may address a gap in the literature and provide a new tool for comparing policing in Ireland.
  • 10. The first aim is to create the classification. The steps are described in detail throughout this thesis; in short, census data will be used to create the classification. Relevant variables will be chosen and transformed for use. Methods of reducing the data to manageable proportions will be applied, and the data will then be subjected to a clustering algorithm. The resulting clusters of similar Garda Sub-districts will then be named and described. The second part of the thesis will use these clusters to map various crime statistics at a national level and on a cluster-by-cluster basis, so that comparisons between similar Garda stations can be made.
      It is hoped that in creating this classification, policy makers may begin to ask questions such as: if area A has similar characteristics to area B, why are the crime rates so different? Are more resources needed to reflect the similar catchment demographics? Are the opening hours of a station appropriate given the social demographic makeup of the population? It is further hoped that the answers to these questions may be found by those with access to more information, such as Garda numbers and skill breakdowns, and the policing and social services infrastructure in each area.
      Lastly, it is anticipated that this thesis can be built upon by carrying out the same classification with up-to-date data once the 2016 census is completed and made available.
  • 11. 4 Literature Review General Classification is a natural human process that helps us understand and make sense of the world around us. Parker et al. (2007) assert that the natural process of classification undertaken by lay people on a daily basis is fundamental to the ‘sociospatial construction of reality’. Perec (1989) quoted in Boyne (2006) speaks of a dream of universal classification, or law, to describe the whole world that did not, and could not work. So it was imagined that the entire world could be distributed according to a unique code, that one universal law would reign over the totality of phenomena: two hemispheres, five continents, masculine and feminine, animal and vegetable, singular plural, right left, four seasons, five senses, six vowels, seven days, twelve months, twenty-six letters. (Perec, 1989:155) Vickers and Rees (2011) state that complex systems can be classified to help the understanding of those systems. However there is no one right system of classification, Dupré (2006) argues that classifications will be driven by the purpose for which they were created and that different classifications will be ‘good or bad for particular purposes’. This is a view that is shared by Charlton et al. who produced one of the first open classifications of Britain from the 1981 British Census. They state that ‘...the “meaningfulness” of any classification is an arbitrary thing...’ (Charlton, et al., 1985). Area classification is the act of grouping areas based on selected features with those areas, the similarity of the characteristics of the selected features drives the classification (Vickers & Rees, 2007). Geodemographic classifications are a type of area classification that Vickers and Rees (2007) describe as ‘one of the most commonly used areas classifications’. 
One of the things that can make geodemographic classifications useful is the descriptions that generally accompany them to give a textual summary of the attributes of each class (Abbas, et al., 2009). Geodemographics are widely used in
  • 12. 5 marketing where it is described as a ‘popular segmentation technique’ (Doyle, 2011). Many (Abbas, et al., 2009; Gale, et al., 2015; Singleton & Longley, 2008; Vickers & Rees, 2007) describe geodemographic classifications as tools to summarise large sets of spatially dependent data such as census data. Gale et al. (2015) state that geodemographic classifications allow the highlighting of similarities between population structures in different parts of a country. Gale et al. also point out that geodemographic classifications give summaries based not only on the population but also on the built environment. It would be remiss to discuss geodemographic classification without mentioning Charles Booth. Between 1886 and 1903 Booth and several assistants accompanied police officers on the beat around London to investigate places of work, working conditions, homes and the urban environments. Through interviews and observations Booth created a several volume piece of work, ‘Inquiry into the Life and Labour of the People in London’ (London School of Economics and Political Science , 2012). Booth’s work (Booth, 1903) was one of the first attempts to map [and classify] social-spatial structures (Alexiou & Singleton, 2015). A portion of the digitised maps along with the classification Booth used is shown below in Figure 1. Figure 1: Portion of Charles Booth’s classification map of London Source: London School of Economics and Political Science (2012)
  • 13. 6 Theory While geodemographics have a long history of being used in one form or another, it must be acknowledged that the theory driving geodemographics is less robust than the results. The underpinning theory behind geodemographics is Tobler’s first law of geography that ‘everything is related to everything else, but near things are more related than distant things’ (Tobler, 1970). Singleton and Longley (2009) note that the theoretical frameworks remain ‘rather weak’; this view is reiterated by Alexiou and Singleton (2015) who state that classifications based on geodemographics lack solid theory. Another issue with many geodemographic classifications is that they are not generally geographically weighted, and due to the methods of their construction are aspatial in design despite showing spatial correlations in the results (Alexiou & Singleton, 2015). While the issues with theoretical grounding are acknowledged, Singleton and Longley express their hopes for best practice geodemographics in that they are: focused, recognise the providence of the data used, are scientifically reproducible and use the best methods available (Singleton & Longley, 2009). Modifiable Areal Unit Problem Geodemographic analysis is generally agreed to be best carried out at the smallest areal unit available in order not to lose spatial variation that larger units may obscure (Alexiou & Singleton (2015), Charlton, et al. (1985) and Gale, et al. (2015) are examples). However the scale also depends on the purpose of the classification (Alexiou & Singleton, 2015), therefore a balance must be struck. Another factor to consider is the Modifiable Areal Unit Problem (MAUP). Gehlke and Biehl (1934) noted that choices in data aggregation over space and the size of areal unit used in analysis have influence over the correlation coefficient. This was expanded upon by Openshaw and Taylor (1979) who
  • 14. 7 coined the phrase Modifiable Areal Unit Problem. They conducted experiments on a spatial data set and found that they could obtain correlations of between -.99 and .99 from different levels of aggregation. Charlton and Brunsdon (2016) presented a paper at the GIS Research UK Conference which revisited the work by Gehlke and Biehl using census data from Ireland at several different official aggregation levels ranging from Small Areas to Counties. They were able to show that the larger areas lost variance between areas and increased correlations (as total variance doesn’t change). While the MAUP is a factor, there is not much that can be done about it if, for example, data is only available at one areal unit scale. If using official areal units one must also be aware of the MAUP when comparing results over time, as these boundaries may change. An example of this is the Irish Electoral Constituencies which were changed by an act of the Irish Parliament (Houses of the Oireachtas, 2013) as required by Article 16.4 of the Irish Constitution every twelve years (Government Publications, 2016). Likewise and more relevant for this work are changes to Garda Sub-districts, the boundaries were changed in 2013 following the closure of some 100 Garda stations (An Garda Siochana, 2013; Central Statistics Office, 2014). This is an issue noted in the United Kingdom in relation to British police Basic Command Units (BCU), where Ashby and Longley (2005) state ‘Maintaining the BCU families is an arduous task due to the temporal instability of these administrative units’. Therefore any follow up to this thesis should be aware of the possibility the MAUP affecting results should the boundaries be changed by An Garda Síochána.
  • 15. 8 Variables Selection Harris et al. (2005) explain that a geodemographic classification is created by grouping areas that are alike in to a number of classes, often based on census data. As geodemographic classifications are a way of summarising social, demographic and built characteristics of zoned geography (Gale, et al., 2015), it makes sense to use census data. Vickers and Rees (2007) also argue that a national census (in their case British, but the principal holds) stands above other sources due to its amount of data and ‘comprehensive geographic coverage’. The choices of variables to use in the classification drive the results, Charlton et al. (1985) describe the choosing of variables as ‘absolutely crucial’. However, they do state that the choices are very difficult to make. Vickers and Rees (2007) suggest that variables should be chosen only if there is a good reason; this implies that including a variable just because one has the data is not the best policy. Correlation Variable correlation can be an issue when using census data to inform analysis. Collinearity in the data can affect the performance of any significance tests that may be carried out on, for example, linear regressions (Anderson, et al., 2010). In classification exercises correlation between variables creates redundancy in the input data (Alexiou & Singleton, 2015). Two main methods exist in dealing with correlation within the variables and it is acknowledged that there is no general rule (Vickers & Rees, 2007). One approach adopted is to remove one of the pairs of highly correlated variables from use (Alexiou & Singleton, 2015; Vickers & Rees, 2007). Another approach is to use Principal Components Analysis (PCA) to transform a set of N correlated variables into a set of n uncorrelated principal components. This approach was used by Charlton et al. (1985) and
  • 16. 9 Brunsdon et al. (2014). With principal components, each component is a linear combination of the parent variables so the variance is retained but the components are uncorrelated (Alexiou & Singleton, 2015). Additionally, the first component accounts for the most variance and each component adds less to the overall variance explained (Jolliffe, 2002). The user can therefore decide how much variance they are willing to sacrifice in order to reduce dimensionality in the data by using fewer principal components than the number of input variables (Charlton, et al., 1985; Harris, et al., 2005; Jolliffe, 2002). Units Variables used in geodemographics can be reported at different units such as percentages of population, count data, indices etc. (Alexiou & Singleton, 2015). This can make comparison of variables difficult. The census of Ireland (Central Statistics Office, 2014) reports most available data as a count of people; therefore it is relatively simple to convert any required variable to percentage of population within the spatial unit. This not only makes the variables easier to compare, it also stops areas with high population figures affecting the analysis due to higher absolute numbers. The variables may also be standardised using z scores to allow for true comparison of the individual variables influence on a cluster. The z score is a measure of the relative location in a data set of the observation, therefore data points in two different data sets with the same z score have the same relative location, i.e. they are the same number of standard deviations from the mean (Anderson, et al., 2010).
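The conversion from counts to percentages and then to z scores described above can be sketched in a few lines of R. This is a minimal illustration only: the data frame `areas`, its column names and its values are hypothetical and are not taken from the census file.

```r
# Hypothetical counts for three areas; names and values are illustrative only
areas <- data.frame(
  area       = c("A", "B", "C"),
  population = c(14000, 3800, 98000),
  aged65plus = c(2100, 760, 7840)
)

# Convert the count to a percentage of each area's population
areas$pctAged65 <- 100 * areas$aged65plus / areas$population

# Standardise: (value - mean) / standard deviation
areas$zAged65 <- as.numeric(scale(areas$pctAged65))

print(areas)
```

After standardising, the z scores have a mean of zero and a standard deviation of one, so variables measured on different scales can be compared directly.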
  • 17. 10 Clustering Methods Clustering involves finding subsets of interest within a larger set, the subsets are called clusters and are usually homogeneous within each cluster and separated between clusters (Hansen & Jaumard, 1997). Gordon (1987) notes that it is ‘difficult to make an informed choice of relevant clustering strategies’, Vickers and Rees (2007) maintain that there is no right or wrong way to classify. Commercial classifications tend to build from the ground up, clustering at the smallest available level then aggregating in to larger groups (Singleton & Longley, 2008). The open Output Area Classification in the UK, however, was clustered from the top down by creating several large clusters that were then subjected to clustering techniques separately (Vickers & Rees, 2007). It is widely acknowledged among the available literature that k-means clustering is the technique of choice for geodemographic clustering. This is shown in either the acknowledgement of k- means in theoretical papers or the used of k-means in applied papers (Abbas, et al., 2009; Alexiou & Singleton, 2015; Brunsdon, et al., 2014; Charlton, et al., 1985; Vickers & Rees, 2007). K-means clustering is seen to have something of an advantage over other methods such as agglomerative, divisive, constructive or direct optimisation (described well in Gordon (1987)). This is because they are all hierarchical in nature and will force a hierarchy on the output even if one does not exist (Gordon, 1987). K-means will not force a hierarchy on the output, however it has the disadvantage that it requires to ‘know’ the number of clusters beforehand (Singleton & Longley, 2008). Clustering techniques generally require a measure of dissimilarity between observations (Jolliffe, 2002). K- means uses the squared Euclidean distance (Alexiou & Singleton, 2015). In essence k- means uses k clusters to sort n observations while minimising the sum of squared errors
  • 18. 11 (Alexiou & Singleton, 2015; Ding & He, 2004). K-means assigns each observation to a cluster while minimising sum of squares, a new set of means is then calculated and the process begins again. The process only stops when the within cluster sum of squares (WCSS) is minimised. This occurs when cluster assignments no longer change as any changes would not make the sum of squares smaller (Alexiou & Singleton, 2015). Cluster Description Once the WCSS is minimised and the clusters are assigned, the results need to be described. The aim of cluster descriptions is to provide a short profile of each cluster for the end user. Vickers and Rees (2007) explain that profiles use text and visuals to help the end users’ understanding of the cluster group in a few sentences. The cluster labelling and description process is acknowledged by Vickers and Rees (2007) to be difficult and subject to much thought, in order not to mislead the user or offend the people living in the areas classified. Cluster descriptions draw on the main identifiable (Debenham, 2002), dominant (Abbas, et al., 2009) characteristics of a cluster. Often, the process involves using z scores to identify extreme variables within the cluster compared to the global mean (Debenham, 2002; Vickers & Rees, 2007). The descriptions that are attached to geodemographic classifications are viewed as useful to other researchers (Abbas, et al., 2009). Parker et al. (2007) go as far as to suggest that the classification descriptors may be the ‘most sociologically interesting’ element of the geodemographic classification process. Therefore the naming and description element of geodemographic classification should not be overlooked or given less attention than the more statistical elements of the process.
Data

Three data sets are used throughout this work to create the classification:

1. Census 2011: all 764 reported census variables in columns, at Garda Sub-district level in 563 rows (Central Statistics Office, 2014).
2. Garda Sub-district boundary files for use in mapping the outcomes (Central Statistics Office, 2014a).
3. Crime data for Ireland at Garda Sub-district level (Central Statistics Office, 2016).

The crime data are aggregated at the Garda Sub-district level into 12 crime types:

• Attempts/threats to murder, assaults, harassments and related offences.
• Dangerous or negligent acts.
• Kidnapping and related offences.
• Robbery, extortion and hijacking offences.
• Burglary and related offences.
• Theft and related offences.
• Fraud, deception and related offences.
• Controlled drug offences.
• Weapons related offences.
• Damage to property and to the environment.
• Public order and other social code offences.
• Offences against government, justice procedures and organisation of crime.
The data for this study fall into two main areas: socio-demographic data from the Census of Ireland (Central Statistics Office, 2014), and crime data (Central Statistics Office, 2016). Both sets of data are reported at the Garda Sub-district level. There are 563 Garda Sub-districts in Ireland (Central Statistics Office, 2016). The Sub-districts range in size from ≈1.9km² in Fitzgibbons Street in Dublin, to ≈658km² in Louisburgh in Mayo. They have populations ranging from 384 in Sraith Salach in Galway to 98,078 in Blanchardstown in Dublin (Central Statistics Office, 2014). These Sub-districts are based loosely on the official geography of Irish Townlands, but were designed by the Examiner of Maps (GIS) at An Garda Síochána to suit the organisation’s needs (Creaner, 2016). The Sub-districts are a unique data set in that they are designed for operational rather than statistical reasons. It is acknowledged by Creaner that using Small Areas would be better statistically. However, the use of Small Areas is unlikely, as it doesn’t make operational sense for the national police force to end a catchment area in the middle of a motorway (for example), as may happen with Small Areas (Creaner, 2016). The Garda Sub-districts are shown, grouped into administrative divisions, in Map 1¹.

It is not known exactly how the 2011 census data were attached to the 2013 Garda Sub-district geography. The assumption in this thesis is that the CSO would be able to populate the new boundaries at the household level. At the time of writing the CSO have not replied to my queries; however, the designer of the Sub-districts (Creaner, 2016) has agreed that this assumption is a reasonable one. Another option would be to populate the Sub-districts by centroid based on a smaller unit such as Small Areas. If this is the case, it is not felt that there would be too much loss of overall validity, due to the number of Small Areas (18,488) being assigned to one of 563 Sub-districts. Aside from this ambiguity, there is no known issue with Irish Census Data.

¹ All maps produced in this thesis use boundary files from the CSO website and contain Ordnance Survey Ireland Data (Ordnance Survey Ireland, 2012).

Map 1: Garda Sub-districts by division

The census data used in this thesis is made up of 764 variables that are derived from household level data and amalgamated to the various levels reported from 18,488
  • 22. 15 Small Areas to four Provinces (Central Statistics Office, 2014). The MAUP discussed in the literature review section is relevant, as carrying out the classification at different scale will produce different results. It may be possible to carry out the classification at Small Area level and then combine these results in to larger unit scales. However this is not appropriate for this study. Firstly, there are no smaller units that fit in to Sub-districts due to the proprietary nature of the Sub-districts. Secondly all data required are available at the Sub-district level. Lastly, Sub-districts are the smallest geographical unit that crime data are released in Ireland (Central Statistics Office, 2016). Therefore it is acknowledged that variation within the Garda Sub-districts is lost during this study. However because this classification is for the purpose of being able to compare Garda Stations on a like for like basis, it is felt that the loss is acceptable at a national level.
Building the Classification

The classification was built in R, a free, open source statistical computing environment that can handle large amounts of data (The R Foundation, 2015). Some code blocks will be included where needed for clarification. However, in the interest of reproducibility, the full R code that created the classification is included in Appendix 1.

Variables

This classification aims to give reproducible results that are comparable to similar studies at different scales. Charlton et al. (1985) chose their variables to give a comparable classification between their open one and the commercial ACORN classification in the UK. Brunsdon et al. (2014) chose Irish Census variables at the Small Area scale to reflect the OAC classification variables chosen by Vickers and Rees (2007). Therefore this study will use the same variables as Brunsdon et al. The full list of variable codes reported by the Census is available for download from the CSO website (Central Statistics Office, 2014). An adapted list is included in Appendix 2 for reference should the reader require clarification on any variable codes used.

The variables chosen for the classification exercise are actually derived variables. Each variable is made up of two or more individual census variables to derive variables that are percentages of the population of the area in question. For example, one variable used is that of lone parents. This variable is derived by adding Lone Mothers with Children (number of families) to Lone Fathers with Children (number of families), dividing by the total number of families and multiplying the result by 100. The actual code is shown below.

    loneParent <- 100 * (T4_3FTLF + T4_3FTLM) / T4_5TF
In all there are 40 derived variables used in the classification, grouped into six areas: demographic, household composition, housing, socioeconomic, employment and connectivity. The derived variables are shown in Table 1; the actual make-up of each derived variable can be seen in the R code in Appendix 1.

As stated, the variables from the CSO were reported at Garda Sub-district level. This aided in mapping, as both the census file and the boundary file contained unique geography ID numbers (GEOGID) for each Sub-district. The numbers were slightly different in that one set was prefixed with an ‘N’ and the other set was prefixed with an ‘M’. This was fixed by carrying out a find and replace command in Excel before the census file was loaded into R; however, it could just as easily have been carried out in R.
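As a sketch of how the prefix mismatch could be handled in R rather than Excel, the snippet below rewrites a leading "N" as "M" and then joins the two tables on GEOGID. The ID values, the other column names and the choice of which file carried which prefix are all assumptions made for illustration; the text does not specify them.

```r
# Hypothetical IDs: the real GEOGID values, and which file carried which
# prefix, are not given in the text
census   <- data.frame(GEOGID = c("N1001", "N1002", "N1003"),
                       pop    = c(500, 1200, 900))
boundary <- data.frame(GEOGID = c("M1001", "M1002", "M1003"),
                       name   = c("X", "Y", "Z"))

# Replace a leading "N" with "M" so that the two keys match
census$GEOGID <- sub("^N", "M", census$GEOGID)

# Join the census attributes to the boundary attributes
joined <- merge(boundary, census, by = "GEOGID")
print(joined)
```

Anchoring the pattern with `^` restricts the replacement to the prefix, so an "N" elsewhere in an ID would be left untouched.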
Theme | Derived variable | Description
Demographics | Age0-4 | Percentage of population aged 0-4
Demographics | Age5-14 | Percentage of population aged 5-14
Demographics | Age25-44 | Percentage of population aged 25-44
Demographics | Age45-64 | Percentage of population aged 45-64
Demographics | Age65+ | Percentage of population aged 65 and over
Demographics | EUNat | Percentage of population that is European by nationality (excluding Irish)
Demographics | RestofWorld | Percentage of population where nationality was given as Rest of the World
Demographics | BornOutsideIRE | Percentage of population not born in Ireland
Household Composition | Separated | Percentage of persons separated or divorced
Household Composition | SinglePerson | Percentage of persons (non pensioners) living in one person households
Household Composition | Pensioner | Percentage of persons who are pensioners
Household Composition | LoneParent | Percentage of families that are lone parent families
Household Composition | NoChildren | Percentage of families that are 'pre family' (no children born)
Household Composition | NonDependChildren | Percentage of families with children where the youngest child is 20+
Housing | RentPublic | Percentage of total households rented from local authority
Housing | RentPrivate | Percentage of total households privately rented
Housing | Flats | Percentage of total households defined as flats
Housing | NoCentralHeat | Percentage of total households with no central heating
Housing | RoomsHH | Average number of rooms per household
Housing | PeoplePerRoom | Total persons ÷ total rooms
Housing | SepticTank | Percentage of total households with an individual septic tank
Socioeconomic | HEQual | Percentage of persons with an Ordinary Bachelors Degree or higher
Socioeconomic | Employed | Percentage of persons at work
Socioeconomic | TwoCars | Percentage of households with two or more cars
Socioeconomic | JTWPublic | Percentage of persons over age 5 who travel to school, college or work by means of bus or rail
Socioeconomic | HomeWork | Percentage of persons self employed (Own account workers)
Socioeconomic | LLTI | Percentage of persons reporting bad or very bad health
Socioeconomic | UnpaidCare | Percentage of persons providing unpaid care
Employment | Students | Percentage of persons who are students
Employment | Unemployed | Percentage of persons who are unemployed having lost or given up jobs
Employment | EconinactFam | Percentage of persons looking after home/family (homemakers)
Employment | Agric | Percentage of workers who work in agriculture, forestry or fishing
Employment | Construction | Percentage of workers who work in construction
Employment | Manufacturing | Percentage of workers who work in manufacturing
Employment | Commerce | Percentage of workers who work in commerce and trade
Employment | Transport | Percentage of workers who work in transport and communication
Employment | Public | Percentage of workers who work in public administration
Employment | Professional | Percentage of workers who work in professional services
Connectivity | Broadband | Percentage of internet connected households with broadband
Connectivity | Internet | Percentage of total households with some kind of internet access

Table 1: Derived variables used in the classification
It can be seen in Table 1 that there will be issues in the data if the derived variables are used as they are. For example, Separated can be expected to be correlated with LoneParent. As mentioned previously, Principal Components Analysis (PCA) is a set of methods that can take in variables that may be correlated and produce a set of uncorrelated principal components. PCA is also used to reduce the size of a clustering computational problem (Jackson, 1991).

Principal Components Analysis

Once the 40 variables were chosen and derived from the census variables, they were subjected to Principal Components Analysis. The reasons were twofold: firstly, to reduce the dimensionality of the data from 40 to a more manageable number; secondly, to remove any correlation in the data. The clustering algorithm used for the classification was k means, which assumes no correlation between the input variables. Therefore PCA is essential to provide k means with a set of uncorrelated variables on which to carry out the clustering. The components are linear combinations of the original variables. Each one contains a proportion of the variance in the original data, and they are ordered by the amount of variance they explain. Therefore it is possible to view the cumulative variance explained by the components and decide how many to use in the k means clustering. As mentioned previously, there is a lack of theory in this regard; however, the majority of the variance should be kept, otherwise the analysis loses too much information to make the PCA worth doing. Jolliffe (2002) describes the choice of a cut off of variance as an ad hoc rule-of-thumb that works in practice. Jolliffe suggests a range from 70% to 90% to retain m components, where m is the smallest integer for which the cumulative variance explained is greater than the cut off.
There is a function in R that calculates the Principal Components for a user, called princomp(). This function takes in the relevant variables and performs a PCA. It is then possible to view the cumulative variance explained by each component to choose how many to use in the clustering process. For detailed explanations of Principal Components, works by Jackson (1991), Jolliffe (2002), or Rencher and Christensen (2012) are recommended. However, the main steps involved are:

1. Get data: in this case 40 derived variables × 563 Garda Sub-districts
2. Subtract the mean of each variable from each instance of the variable
3. Calculate the correlation matrix
4. Calculate the eigenvalues and eigenvectors of the correlation matrix

The Principal Components were calculated, and their cumulative explanation of the variance of the original derived variables displayed, by entering the following two lines of code into R.

    pca <- princomp(gardaVars[, -1], cor = TRUE, scores = TRUE)
    cumsum(pca$sdev^2 / sum(pca$sdev^2))

The cumulative variance explained by the components is shown in Figure 2.

Figure 2: Cumulative variance explained by components 1:40
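The steps listed above can be checked by hand on a small example. The sketch below uses R's built-in `mtcars` data set as a stand-in for the census variables (an assumption made purely for illustration) and an 80% cut off, which is one value within the 70% to 90% range attributed to Jolliffe. It shows that the eigenvalues of the correlation matrix are the component variances reported by `princomp()`, and picks the smallest number of components whose cumulative variance reaches the cut off.

```r
# Stand-in data: mtcars is used only for illustration, not the census variables
X <- mtcars

# Steps 2 to 4 by hand: eigen-decomposition of the correlation matrix
eig <- eigen(cor(X))

# The same analysis using princomp() on the correlation matrix
pca <- princomp(X, cor = TRUE, scores = TRUE)

# The component variances (sdev^2) match the eigenvalues
print(all.equal(unname(pca$sdev^2), eig$values))

# Cumulative proportion of variance explained by the components
cumvar <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest m whose cumulative variance reaches the (assumed) 80% cut off
cutoff <- 0.80
m <- which(cumvar >= cutoff)[1]
print(m)
```

Because the correlation matrix is used, each variable is effectively standardised before the decomposition, so no single variable dominates through its units alone.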
As per general recommendations mentioned earlier, this study will use the first m components that total to at least 80% of the variance. This means that the first nine components will be used in the study, as they account for 80.65% of the variance. This was seen as a good cut off, as the other 31 components only accounted for 19.35% of the variance in the original data set between them. Another reason for not including the tenth component is that it is the first component that fails a test that suggests each component should contribute more than 100/p % of the cumulative variance (Jolliffe, 2002). As p is 40 and 100/40 = 2.5, the fact that component ten only explains an extra 1.64% of the variance excludes its use in the classification process.

Clustering

The work up to this point has concentrated on getting the data ready for the clustering process. The nine principal components represent a much smaller data set than the 40 derived variables; they are also not correlated. This means that they are ready for use in a clustering algorithm. The method of clustering used in this thesis is the k means technique as described in the literature review. The k means method was chosen as it is the system of choice in most geodemographic classifications (Abbas, et al., 2009; Alexiou & Singleton, 2015; Brunsdon, et al., 2014; Charlton, et al., 1985; Vickers & Rees, 2007). K means requires the number of clusters to be known before it sets out to minimise the within cluster sum of squares. However, this is not an issue in R. It is possible to run k means through a loop in R; in this way the clustering exercise can be run many times with different possible numbers of clusters, and the user can pick the best number for k based on the results. In order to pick the best number for k, the k means process was run 100 times, with k starting at one and being increased by one each time. The results were then plotted on a scree plot, shown in Figure 3. The code to run through this loop is shown below.

    nPC <- 9
    set.seed(290879)
    smallest.clus <- wss <- rep(0, 100)
    for (i in 1:100) {
      # cluster the first nPC component scores into i clusters
      clus <- kmeans(pca$scores[, 1:nPC], i)
      wss[i] <- clus$tot.withinss
      smallest.clus[i] <- min(clus$size)
    }
    plot(1:100, wss[1:100], type = "h", main = "Cluster Scree Plot",
         xlab = "Number of Clusters", ylab = "Within Cluster Sum of Squares")

The scree plot in Figure 3 shows that there is a small step at 14 clusters; by this stage the central city areas of Dublin and Cork had already split into a cluster. The differences between clusters also started to get smaller. A ‘heatmap’ chart with attached dendrogram was created in R to visualise the clusters using k = 14, and the clustering was deemed satisfactory. As can be seen in the Figure 4 heatmap, clusters 1 and 14 are fairly similar, as are clusters 2 and 8. Therefore it was felt that adding more clusters (k > 14) would not add to the classification. In addition, splitting the clusters into smaller units may have created clusters that were too nuanced for the purpose of comparing Garda Sub-districts. The final step in the clustering process was to join the cluster numbers to the GEOGID numbers of individual Garda stations so that the results could be mapped.

It is possible to cluster a set of clusters in order to create higher order super groups for description purposes. However, with only 14 clusters and 563 areas it was felt that a second level of clustering was unnecessary for this classification.
  • 30. 23 Figure 3: Scree plot of within cluster sum of squares (WCSS) for k = 1 to 100 Figure 4: Heatmap of clusters where k = 14
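The clustering summarised in Figures 3 and 4 was run on the first nine principal components. As a hedged illustration of how the two cut-off rules behind that choice (80% cumulative variance, and a minimum contribution of 100/p per cent per component) could be checked in R, the sketch below uses synthetic data with p = 10 rather than the 40 thesis variables. The object names pct, cum_pct, m_80 and first_fail are assumptions made for this example.

```r
# Synthetic data: 200 rows, p = 10 correlated variables (not the thesis data).
set.seed(42)
p <- 10
latent <- matrix(rnorm(200 * 3), ncol = 3)
X <- latent %*% matrix(rnorm(3 * p), nrow = 3) + matrix(rnorm(200 * p), ncol = p)

# princomp() is used because it returns a $scores element, as in the thesis code.
pca <- princomp(X, cor = TRUE)

# Percentage of total variance explained by each component.
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)
cum_pct <- cumsum(pct)

# Rule 1: smallest m whose cumulative share reaches 80%.
m_80 <- which(cum_pct >= 80)[1]

# Rule 2: first component contributing less than 100 / p per cent.
first_fail <- which(pct < 100 / p)[1]

print(round(cum_pct, 2))
print(c(m_80 = unname(m_80), first_fail = unname(first_fail)))
```

In the thesis the two rules agree: nine components reach 80.65%, and the tenth is the first to fall below 2.5%. On other data they need not coincide, in which case a judgement between them is required.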
  • 31. 24 Cluster Description and Naming
    With the clusters set, they needed to be named and described. For some (Abbas, et al., 2009; Parker, et al., 2007) the descriptions attached to clustering exercises such as this one are the most interesting element of the final geodemographic classification process. Vickers and Rees (2011) state that cluster names may be the primary source of information used by the end user when judging a cluster in a classification. Naming and describing the clusters was a multi stage process. All of the clusters were mapped in order to check that the classification seemed spatially sensible. Then the z scores of the original derived variables were calculated for each cluster. For each variable, the mean z score across all clusters (µ) and the standard deviation (σ) were also calculated. Clusters where z < µ-σ were deemed to have an extremely low z score for that variable. Clusters where z > µ+σ were deemed to have an extremely high z score for that variable. These extreme highs or lows accounted for 25% of the variables. The extreme values informed the name attached to the clusters, as they were deemed to be dominant and identifiable characteristics of the cluster in question (Abbas, et al., 2009; Debenham, 2002; Vickers & Rees, 2007). A table of the extremes identified is reproduced in Table 2. In addition to the extreme z scores, the scores that were above or below average for that variable informed more detail in the descriptions where necessary. For the assistance of the end user a radar plot for each cluster is included in the cluster descriptions. This plot shows the global average z score for each variable in red and the z score for the cluster in blue. It also shows the extreme value cut-offs in green and purple.
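The extreme value rule described above can be sketched in R as follows. The z scores are invented toy values for four clusters and two variables, and the use of the sample standard deviation (R's sd(), with an n - 1 denominator) is an assumption, as the thesis does not say which form was used.

```r
# Toy z scores: rows are clusters, columns are variables (values invented).
z <- matrix(c( 1.8, -0.4,
              -0.1,  0.1,
              -1.5,  0.3,
               0.2,  0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("cluster", 1:4), c("TwoCars", "Flats")))

mu    <- colMeans(z)        # mean z score of each variable across clusters
sigma <- apply(z, 2, sd)    # standard deviation of each variable across clusters

# sweep() applies the per-variable cut-offs down each column.
high <- sweep(z, 2, mu + sigma, ">")   # z > mu + sigma
low  <- sweep(z, 2, mu - sigma, "<")   # z < mu - sigma

print(high)
print(low)
```

Each TRUE cell marks a cluster and variable pair that would feed into the cluster name, in the same way as the entries in Table 2.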
  • 32. 25 Table 2: Extreme z scores for variables within clusters
  • 33. 26 Results
    The geodemographic classification exercise resulted in the 563 An Garda Síochána Sub-districts being clustered into 14 groups based on their underlying geodemographics. To assist the reader, each map has an accompanying cartogram. These cartograms are all distorted based on the area population using the Newman and Gastner method contained in an ArcScript add-on for ArcMap, available from Esri (2009). Map 2 shows the overall classification. The accompanying cartogram allows the user to compare areas based on population size.
    Map 2: Garda Sub-districts by cluster
    Displaying all the clusters at once has limited value, except to give an overall picture of the cluster distributions. Therefore each cluster will be discussed in turn. In each description the cluster will be mapped and represented as a cartogram, and a radar plot of the variables will be included. The individual station names and divisions in each cluster will also
  • 34. 27 be reported. Should the reader wish to see what cluster an individual station is in, a look up table is attached in Appendix 3. This look up table is sorted alphabetically by Division, Sub-division, and then Sub-district.
  • 35. 28 Cluster One: Comfortable home owning young families in semi rural areas
    This cluster consists of 55 Sub-districts serving 167,880 people, mainly Irish families living in houses and generally working in manufacturing, construction or agriculture. Car ownership is high, with very high instances of two or more car households. There are low instances of lone parent families or local authority renters. Houses in this cluster tend to be larger than average and serviced by septic tanks.
    Map 3: Cluster One
    Extreme High Values: TwoCars, Manufacturing.
    Extreme Low Values: Separated, LoneParent, RentPublic, LLTI.
  • 36. 29 Figure 5: Radar plot of Cluster One Table 3: Sub-districts in Cluster One
  • 37. 30 Cluster Two: Young, mobile, affluent, multicultural singles
    This cluster consists of 11 Sub-districts serving 243,625 people. The areas are characterised by a high number of 25-44 year olds and a lack of young children and older people. The areas are city centre in nature, with large numbers of rented flats and low car ownership. People who can afford to live here are well educated and work in commerce or transport. The population is multicultural, with high levels of persons from around the EU and the wider world.
    Map 4: Cluster Two
    Extreme High Values: Age25-44, EUNat, RestofWorld, BornOutsideIRE, SinglePerson, Separated, LoneParent, NoChildren, RentPublic, RentPrivate, Flats, PeoplePerRoom, HEQual, Employed, JTWPublic, Students, Commerce, Transport, Broadband.
    Extreme Low Values: Age0-4, Age5-14, Age45-64, Age65+, Pensioner, RoomsHH, SepticTank, TwoCars, HomeWork, UnpaidCare, EconinactFam, Agric, Construction, Manufacturing.
  • 38. 31 Figure 6: Radar plot of Cluster Two Table 4: Sub-districts in Cluster Two
  • 39. 32 Cluster Three: Struggling labouring communities
    This cluster consists of 68 Sub-districts serving 1,168,789 people. These areas are generally semi urban towns, or towns that have grown around manufacturing bases. There are high levels of unemployment and high levels of local authority renting. These areas may have been hit hard by the recession and by the abandonment of plans for government decentralisation.
    Map 5: Cluster Three
    Extreme High Values: Separated, LoneParent, RentPublic, Unemployed, Manufacturing.
    Extreme Low Values: NonDependChildren, JTWPublic.
  • 40. 33 Figure 7: Radar plot of Cluster Three Table 5: Sub-districts in Cluster Three
  • 41. 34 Cluster Four: Young labouring families in outer commuter areas
    This cluster consists of 56 Sub-districts serving 361,375 mainly Irish people. These areas may be recently built towns and villages with good connectivity. They combine very low numbers of older people and of those with grown up children with high numbers of families with children aged 0-14. Above average levels of car ownership and employment, combined with below average numbers of agricultural workers, indicate that those who do work may commute.
    Map 6: Cluster Four
    Extreme High Values: Age0-4, Age5-14.
    Extreme Low Values: Age65+, Pensioner, NonDependChildren.
  • 42. 35 Figure 8: Radar plot of Cluster Four Table 6: Sub-districts in Cluster Four
  • 43. 36 Cluster Five: Settled older rural communities
    This cluster consists of 15 Sub-districts serving 48,067 people. These areas have an ageing population living in older housing stock, with extreme levels of housing with no central heating. There is also an above average number of septic tanks in use. There are low levels of unemployment but high levels of unpaid carers, and above average levels of agricultural working.
    Map 7: Cluster Five
    Extreme High Values: Age45-64, Age65+, EUNat, Separated, Pensioner, NoCentralHeat, HomeWork, UnpaidCare.
    Extreme Low Values: Unemployed, Manufacturing, Public, Professional.
  • 44. 37 Figure 9: Radar plot of Cluster Five Table 7: Sub-districts in Cluster Five
  • 45. 38 Cluster Six: Urban city peripheral communities
    This cluster consists of 27 Sub-districts serving 730,183 people. It is characterised by low instances of septic tanks and very low car ownership, contrasted with high numbers of lone parents and below average employment. The number of students is high owing to the proximity of university sites within commuting distance. The number of pensioners is also high; however, the high numbers of students and local authority renters indicate a semi fluid community. Connectivity is good, in keeping with the proximity to city centres, as is the use of public transport for work journeys.
    Map 8: Cluster Six
    Extreme High Values: Pensioner, LoneParent, RentPublic, JTWPublic, Students, Commerce, Transport, Professional, Broadband.
    Extreme Low Values: SepticTank, TwoCars, HomeWork, Agric, Construction.
  • 46. 39 Figure 10: Radar plot of Cluster Six Table 8: Sub-districts in Cluster Six
  • 47. 40 Cluster Seven: Struggling rural ageing communities
    This cluster consists of 23 Sub-districts serving 50,648 people. The areas are ageing and rural in nature. There are high levels of unemployment and unpaid caring. There are few children and few persons with a higher education. These communities may have suffered from out migration, as indicated by the very low levels of 25-44 year olds.
    Map 9: Cluster Seven
    Extreme High Values: Age45-64, Age65+, Pensioner, LoneParent, NonDependChildren, SepticTank, JTWPublic, LLTI, UnpaidCare, Unemployed, Construction, Professional.
    Extreme Low Values: Age0-4, Age25-44, HEQual, Employed, Manufacturing, Commerce, Internet.
  • 48. 41 Figure 11: Radar plot of Cluster Seven Table 9: Sub-districts in Cluster Seven
  • 49. 42 Cluster Eight: Rural farming communities
    This cluster consists of 43 Sub-districts serving 118,173 people. The areas tend to have high levels of non dependent children. Perhaps these adult children live on the family farm, as the level of workers in agriculture is also extremely high. Connectivity is poor in terms of internet or broadband. Those who do work tend not to be employed in the public or professional sectors. Instances of private rent and flats are low, indicating a stable community.
    Map 10: Cluster Eight
    Extreme High Values: NonDependChildren, Agric.
    Extreme Low Values: Professional, Internet.
  • 50. 43 Figure 12: Radar plot of Cluster Eight Table 10: Sub-districts in Cluster Eight
  • 51. 44 Cluster Nine: Small rural townlands
    This cluster consists of 53 Sub-districts serving 229,825 people. The areas tend to employ people in the public sector; however, unemployment in general is high. These areas have a below average number of flats and a population that is beginning to age. They are near larger population centres, so what internet there is tends to be broadband; however, internet connectivity in general is poor.
    Map 11: Cluster Nine
    Extreme High Values: Public.
    Extreme Low Values: None.
  • 52. 45 Figure 13: Radar plot of Cluster Nine Table 11: Sub-districts in Cluster Nine
  • 53. 46 Cluster Ten: Labouring rural communities in older housing stock
    This cluster consists of 21 Sub-districts serving 60,960 people. The number of people working in agriculture is high, as is the number working in construction. Significant portions of the population are self employed (HomeWork). The housing stock is older in these areas, as indicated by the above average level of septic tank use combined with the extremely high levels of housing with no central heating. There are few flats and little rented accommodation in these areas, indicating a settled rural community.
    Map 12: Cluster Ten
    Extreme High Values: NoCentralHeat, HomeWork, Agric, Construction.
    Extreme Low Values: Public.
  • 54. 47 Figure 14: Radar plot of Cluster Ten Table 12: Sub-districts in Cluster Ten
  • 55. 48 Cluster Eleven: Young educated commuter families
    This cluster consists of 29 Sub-districts serving 750,014 people. These areas have very high instances of 0-4 year olds and 25-44 year olds, contrasting with very low instances of people aged 45 and over. These areas may include significant numbers of new estates, as indicated by the low levels of non central heated homes, the above average number of flats and the high instances of internet and broadband. These are commuter areas for a highly educated workforce employed primarily in transport and commerce.
    Map 13: Cluster Eleven
    Extreme High Values: Age0-4, Age25-44, PeoplePerRoom, HEQual, Employed, Commerce, Transport, Broadband, Internet.
    Extreme Low Values: Age45-64, Age65+, Pensioner, NonDependChildren, NoCentralHeat, SepticTank, LLTI, UnpaidCare, Unemployed, Agric.
  • 56. 49 Figure 15: Radar plot of Cluster Eleven Table 13: Sub-districts in Cluster Eleven
  • 57. 50 Cluster Twelve: Comfortable rural farming communities
    This cluster consists of 40 Sub-districts serving 94,434 people. The areas are very similar to those in Cluster Eight ‘Rural farming communities’ described on page 42, the main difference being that employment in these areas is split between agriculture and the public sector. The fact that these areas have very high levels of public sector employment may account for above average numbers of 5-14 year olds. Internet connectivity is poor and the community in general is a settled Irish one. Above average instances of two or more car households may be an indicator of wealth.
    Map 14: Cluster Twelve
    Extreme High Values: NonDependChildren, SepticTank, Agric, Public.
    Extreme Low Values: EUNat, BornOutsideIRE, Separated, Broadband.
  • 58. 51 Figure 16: Radar plot of Cluster Twelve Table 14: Sub-districts in Cluster Twelve
  • 59. 52 Cluster Thirteen: Affluent professional commuters in larger homes
    This cluster consists of 38 Sub-districts serving 182,970 people. These areas are populated by healthy professionals. The housing stock is generally larger than average. There are very few local authority renters and below average numbers of private renters. Unemployment is low and there are above average instances of persons with a third level education. The people of these areas are employed in professional services and commute by car.
    Map 15: Cluster Thirteen
    Extreme High Values: RoomsHH, TwoCars, Professional.
    Extreme Low Values: Separated, LoneParent, RentPublic, LLTI, Unemployed.
  • 60. 53 Figure 17: Radar plot of Cluster Thirteen Table 15: Sub-districts in Cluster Thirteen
  • 61. 54 Cluster Fourteen: Semi rural periphery manufacturing communities
    This cluster consists of 84 Sub-districts serving 381,309 people. This cluster could be described as consisting of those areas that did not fit into any other cluster. Only one variable (manufacturing) in these areas is considered extreme, and that only just. These areas are close enough to major population areas to be influenced by them, but possibly too far away to be considered commuter areas. There are above average levels of agricultural and construction work. There are also above average instances of unemployment and of staying at home to look after children.
    Map 16: Cluster Fourteen
    Extreme High Values: Manufacturing.
    Extreme Low Values: None.
  • 62. 55 Figure 18: Radar plot of Cluster Fourteen Table 16: Sub-districts in Cluster Fourteen
  • 63. 56 The Urban Rural Divide
    The clusters as described on the previous pages can be loosely defined as having either rural or urban tendencies. Clusters 1, 5, 7, 8, 10, 12 and 14 can be said to exhibit rural tendencies. Clusters 2, 3, 4, 6, 9, 11 and 13 can be said to exhibit urban tendencies.
    Map 17: The Urban Rural Divide
  • 64. 57 In order to give these tendencies a visual test, a final map in this section was created. This map visualises urban clusters as blue, and rural clusters as yellow. Settlement data were downloaded (Central Statistics Office, 2014a) and also visualised on the map. The map indicates that the allocation of urban or rural in the description phase was successful. It is however a rough and ready indicator and says nothing about the validity of the allocation of Sub-districts to individual clusters.
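The urban and rural allocation used for this map amounts to a simple recode of the cluster number. A minimal sketch in R is given below; the cluster lists are those stated in this section, while the example vector cluster and the names tendency and map_colour are assumptions made for illustration.

```r
# Cluster lists as stated in the text.
rural_clusters <- c(1, 5, 7, 8, 10, 12, 14)
urban_clusters <- c(2, 3, 4, 6, 9, 11, 13)

# Hypothetical cluster labels for a handful of Sub-districts.
cluster <- c(2, 8, 14, 11, 5)

# Recode each cluster number to its tendency, then to the map colour.
tendency   <- ifelse(cluster %in% urban_clusters, "urban", "rural")
map_colour <- c(urban = "blue", rural = "yellow")[tendency]

print(data.frame(cluster, tendency, map_colour, row.names = NULL))
```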
  • 65. 58 Crime Atlas Set Up
    The classification exercise is complete and the resulting clusters have been mapped and described. These clusters will now be used in a basic crime atlas of Ireland. Each of the crime types reported at Sub-district level will be mapped. These maps will be choropleth maps of Ireland showing the number of crimes per 1000 of population, along with a map of the change in recorded crime between 2014 and 2015 in absolute numbers. Additionally, box plots of the crime rates within each cluster will be produced; these will show the number of crimes per 1000 of population. It is anticipated that the intra-cluster variance will be small due to the clustering exercise. It is further anticipated that there will be different patterns of inter-cluster variance for each crime type due to the propensity for different crime types to occur in different areas. For each crime type the clusters will be subjected to a simple linear regression, to test whether the crime rates in each cluster are statistically significantly different from the national rate for that crime.
    The box plots and regressions were created in R. For each crime type a new variable was created that returned the number of crimes per 1000 people in each Sub-district. The national rate was also calculated and a box plot created for each cluster. For the regression another variable was required, representing the number of crimes per 1000 people minus the national figure. This was then subjected to a linear regression with the cluster numbers as dummy variables. As an example, the procedure for kidnapping related crimes is shown on the next page.
  • 66. 59
        yLabel <- "Recorded Crimes per 1000 Population"
        # rate per 1000 people in each Sub-district, and the national rate
        Crimes$Kid <- (Crimes$KID_2015 / Crimes$Total2011) * 1000
        KidTot <- (sum(Crimes$KID_2015) / sum(Crimes$Total2011)) * 1000
        boxplot(Crimes$Kid ~ Crimes$cluster, main = "Kidnapping Related Crime 2015",
                xlab = "Cluster", ylab = yLabel)
        abline(h = KidTot, lty = 2, col = "red")
        legend("top", legend = "National Rate", lty = 2, col = "red", bty = "n")
        # deviation from the national rate, regressed on cluster dummies with no intercept
        Crimes$KidDev <- Crimes$Kid - KidTot
        KidMod <- lm(KidDev ~ as.factor(cluster) + 0, data = Crimes)
        summary(KidMod)
    The results for kidnapping related offences show that there is very little variance within clusters. Kidnapping is a rarely recorded crime. In 2015, only 155 kidnapping related crimes were recorded in Ireland. This gives a crime rate of 0.03 crimes per 1000 people nationally. Two of the clusters were significantly different from this; these differences were driven by outliers in the data. These outliers should not be ignored, as they say something about the crimes. They will not be discussed here, however, as each crime type is discussed from the following page onwards. The clusters represented in the box plots will also be coloured based on the tendencies identified in the clustering process, as urban (blue) or rural (yellow) in nature. The simple linear model also allows the amount of variance in crime rates between Sub-districts that is explained by the cluster to which they belong to be expressed as a percentage. This is done using the adjusted R squared in the model summary.
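To make the role of the model summary more concrete, the sketch below fits the same kind of no-intercept model to invented data and pulls out the per-cluster p values and the adjusted R squared. The data frame, the national rate of 3 and the names coefs, cluster_means and pct_explained are all assumptions for illustration, not thesis results.

```r
set.seed(7)
Crimes <- data.frame(cluster = rep(1:3, each = 30))
# Invented rates per 1000: cluster 3 is given a higher typical rate.
Crimes$Rate <- rnorm(90, mean = c(2, 2.2, 5)[Crimes$cluster], sd = 1)
national <- 3   # hypothetical national rate
Crimes$Dev <- Crimes$Rate - national

# Without an intercept, each coefficient is that cluster's mean deviation
# from the national rate, and its p value tests that deviation against zero.
mod <- lm(Dev ~ as.factor(cluster) + 0, data = Crimes)
s <- summary(mod)

coefs <- s$coefficients                      # estimate, SE, t value, p value per cluster
cluster_means <- tapply(Crimes$Dev, Crimes$cluster, mean)
pct_explained <- 100 * s$adj.r.squared       # adjusted R squared as a percentage

print(round(coefs, 4))
print(round(pct_explained, 1))
```

One point worth keeping in mind when reading such percentages: for a model fitted without an intercept, summary.lm() measures R squared against a baseline of zero (here, no deviation from the national rate) rather than against the mean of the response.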
  • 68. 61 Figure 19: Box plots by Cluster of theft related crime
    Nationally the number of theft related crimes per 1000 people in Ireland for 2015 was 16.56. Most of the clusters perform well, with low intra cluster variability. Only clusters three (p=0.32) and six (p=0.44) do not differ significantly from the national figure, with most clusters being below it. The largest variability can be found in cluster two; this cluster is driving up the national figure, with rates in its Sub-districts ranging from 24.46 to 48.43 per 1000 of population. There is then a jump to 136.54 for Anglesea Street in Cork, and the two outliers are the Store Street and Pearse Street Sub-districts in central Dublin, with 280.21 and 296.34 respectively. The other notable outlier is the Dublin Airport Sub-district in cluster eleven, with a theft related crime rate of 422.6 per 1000 of population. The classification accounts for 27.5% of the national variance in theft related crime at the Sub-district level.
  • 70. 63 Figure 20: Box plots by Cluster of assault related crime
    Nationally the number of assault related crimes per 1000 of population in Ireland in 2015 was 3.7. The variation within the clusters may seem to be higher for assault than for theft; however, the scale is smaller, with a maximum value of 33.27. Again cluster two has the most variability, ranging from 2 to 33.27. All clusters can be said to be significantly different from the national figure, except clusters five (p=0.26), six (p=0.94), nine (p=0.52) and eleven (p=0.14). Notable outliers with regard to assault related crime are Pearse Street, Dublin in cluster two (33.27), Bridewell, Cork in cluster six (12), Castlerea, Longford in cluster nine (21), and Dublin Airport in cluster eleven (22). The classification accounts for 35.6% of the national variance in assault related crime.
  • 72. 65 Figure 21: Box plots by Cluster of burglary related crime
    Nationally the number of burglary related crimes per 1000 of population in Ireland in 2015 was 5.7. The inter and intra cluster variances are quite large, perhaps reflecting the sporadic nature of burglaries, where areas may be hit with burglaries for a time before criminals move on to the next area. All clusters can be said to be significantly different from the national figure, except clusters three (p=0.46) and eleven (p=0.32). There are notable outliers with regard to burglary related crime. Pearse Street, Dublin in cluster two has a rate of 26.5, whereas the rest of the cluster ranges from 7.09 to 12.56. Clondalkin and Courtown Harbour are the outliers for cluster three. Belturbet in the Cavan/Monaghan division is the large outlier in cluster nine, with a rate of 16.08 crimes per 1000 of population. Dundrum and Rathcoole in cluster eleven and Mountrath in cluster fourteen pull up their clusters' results. The classification accounts for 54.7% of the national variance in burglary related crime.
  • 73. 66 Damage to Property or the Environment Map 21: Damage to property or the environment
  • 74. 67 Figure 22: Box plots by Cluster of damage related crime
    Nationally the number of damage related crimes per 1000 of population in Ireland in 2015 was 5.68. The intra cluster variation appears to be relatively stable across the classification, with the exceptions of clusters two, three and six. Again cluster two has the most variability, ranging from less than 5 in Irishtown and Donnybrook to more than 24 in Pearse Street and Store Street. All clusters can be said to be significantly different from the national figure, except cluster eleven (p=0.27). The large outlier in cluster eleven is Dublin Airport. In 2015 the Sub-district covering the airport recorded 13 damage related crimes; however, due to the low population this equates to 31.94 crimes per 1000 of population. The classification accounts for 58.4% of the national variance in damage related crime.
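The effect of a small resident population on a rate per 1000 can be illustrated with some simple arithmetic in R. The 13 crimes and the rate of 31.94 are taken from the paragraph above; the implied population is only back-calculated from those two figures (it is not stated in this section), and the comparison population of 10,000 is hypothetical.

```r
rate_per_1000 <- function(crimes, population) 1000 * crimes / population

# Back-calculated from the figures quoted above (13 crimes, 31.94 per 1000);
# the census population itself is not stated in this section.
implied_pop <- 1000 * 13 / 31.94

# The same 13 crimes in a hypothetical Sub-district of 10,000 residents.
rate_small <- rate_per_1000(13, implied_pop)
rate_large <- rate_per_1000(13, 10000)

print(round(c(implied_pop = implied_pop, rate_small = rate_small,
              rate_large = rate_large), 2))
```

The same count of crimes therefore produces a rate well above the national figure in one case and well below it in the other, which is worth bearing in mind whenever the airport appears as an outlier.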
  • 75. 68 Dangerous Acts Map 22: Dangerous Acts
  • 76. 69 Figure 23: Box plots by Cluster of crimes relating to dangerous acts
    Nationally the number of crimes recorded relating to dangerous acts per 1000 of population in Ireland in 2015 was 1.57. The variation between clusters and within clusters is small, ranging between 0 and 6. Dublin Airport is again the exception, with a figure of 15 crimes per 1000 of population. All clusters can be said to be significantly different from the national figure, except clusters five (p=0.46), nine (p=0.32) and eleven (p=0.94). The classification accounts for 15.2% of the national variance in recorded dangerous acts related crime.
  • 78. 71 Figure 24: Box plots by Cluster of drug related crime
    Nationally the number of drug related crimes per 1000 of population in Ireland in 2015 was 3.29. All clusters can be said to be significantly different from the national figure, except clusters three (p=0.65) and eleven (p=0.63). Drug crime is generally low, with some larger outliers dragging up the figures. The city centre and city periphery Sub-districts of clusters two and six show the largest number of crimes and the largest intra cluster variation. In cluster two the largest rate was recorded in Store Street in Dublin with 43.22; the nearest rate to this in the cluster was Pearse Street at 23.65. The large outlier in cluster eleven was again Dublin Airport. An interesting result can be seen in cluster twelve. Stradbally in County Laois, which hosts concerts and summer festivals, has a rate of 47.95, the largest in the country. The rate in Stradbally dwarfs the next highest rate in the cluster, 1.61. The classification accounts for 29.3% of the national variance in drug related crime.
  • 80. 73 Figure 25: Box plots by Cluster of fraud related crime
    Nationally the number of fraud related crimes per 1000 of population in Ireland in 2015 was 1.22. The crime rates across the clusters are small and the range within the clusters is also small. Again cluster two has the most variability. Cluster two is dragged up by two Garda Sub-districts: Pearse Street (16.5), which covers many white collar areas including the Central Bank, and Store Street (13.1), which covers areas such as the IFSC. All clusters can be said to be significantly different from the national figure, except clusters three (p=0.81), five (p=0.22) and six (p=0.77). The large outlier in cluster eleven is again Dublin Airport, with 49.14 crimes per 1000 population. The classification accounts for 14% of the national variance in fraud related crime.
  • 82. 75 Figure 26: Box plots by Cluster of kidnapping related crime
    Nationally the number of kidnapping related crimes per 1000 of population in Ireland in 2015 was 0.03. This figure represents a total of 155 recorded crimes across the country. Only two of the clusters differ significantly from the national figure: cluster seven and cluster eleven. These in turn were driven by one outlier each. In cluster seven, Ballycroy in Mayo recorded 6 kidnapping related crimes, up 6 from 2014, giving a rate of 10.4. Dublin Airport Sub-district recorded 4 crimes, down 2 on 2014, giving a rate of 9.8. Because so few kidnapping related crimes were recorded nationwide, the classification accounts for only 1% of the national variance.
  • 83. 76 Public Order Map 26: Public Order
  • 84. 77 Figure 27: Box plots by Cluster of public order and social code crime
    Nationally the number of public order related crimes per 1000 of population in Ireland in 2015 was 7.25. Most clusters fall below this figure. Again cluster two has the most variability, ranging from 4.94 in Irishtown to 132.83 in Pearse Street and 152.41 in Store Street. Five clusters are not significantly different from the national figure: clusters three (p=0.14), five (p=0.46), six (p=0.86), nine (p=.015), and eleven (p=0.60). The large outlier in cluster eleven is again Dublin Airport, with 169.53 crimes per 1000 population. The classification accounts for 25.2% of the national variance in public order related crime.
  • 86. 79 Figure 28: Box plots by Cluster of robbery related crime
    Nationally the number of robbery related crimes per 1000 of population in Ireland in 2015 was 0.56. The crime rates across the clusters are small and the range within the clusters is generally small. Nationally the figure rarely gets above 2. In cluster two Pearse Street and Store Street again drag up the rate. In cluster three Tallaght crosses the 2 crimes per 1000 of population level at 2.2. In cluster six Ballymun drives the rate at 3.46. For robbery related crimes recorded, all the clusters are significantly different from the national figure, with p values ranging from <2e-16 to 0.035859. The classification accounts for 58.7% of the national variance in robbery related crime.
  • 88. 81 Figure 29: Box plots by Cluster of weapons related crime
    Nationally the number of weapons related crimes per 1000 of population in Ireland in 2015 was 0.52. The crime rates vary very little across and within clusters. Again cluster two has the most variability, and again the two Sub-districts dragging up cluster two are Store Street (5.04) and Pearse Street (6.29). Only five clusters can be said to be significantly different from the national figure. These are clusters one (p=0.038), two (p=0.0002), eleven (p=0.0002), twelve (p=0.055) and thirteen (p=0.063). The large outlier in cluster eleven is again Dublin Airport, with 34.4 crimes per 1000 population. The classification accounts for only 6% of the national variance in weapons related crime.
  • 89. 82 Offences against the State, Justice or Organised Crime Map 29: State, Justice or Organised Crime
  • 90. 83 Figure 30: Box plots by Cluster of offences against the State, justice procedures or organised crime
    Nationally the number of offences against the State, justice procedures or organised crime per 1000 of population in Ireland in 2015 was 2.41. The crime rates across the clusters are small and the range within the clusters is also small. The outliers are unsurprisingly driven by proximity to major Irish Courts Service premises. Examples include Bridewell Sub-district in Dublin (153.13), which covers the Four Courts, and Anglesea Street in Cork (33.01), which covers Cork District Court. Limerick also features among the outliers. In cluster three, Newcastlewest and Roxboro Road in Limerick are the two highest outliers at 6.87 and 8.74 respectively. In cluster six, Henry Street (15.79) and Mayorstone Park (11.36) are also in Limerick. The classification accounts for 14% of the national variance in recorded crimes against the State, justice procedures or organised crime.
  • 91. 84 Crime Atlas Comments
    The classification has allowed another aspect of crime figures to be compared. It has assisted in identifying those Garda Sub-districts which have high crime figures compared to other Sub-districts with similar underlying geodemographics. It has also helped identify Sub-districts that may benefit from being considered separately. Of the twelve crime types shown, Store Street was identified as an outlier eight times, Pearse Street nine times, and Dublin Airport ten times. The classification accounted for variation in crime rates with varying degrees of success, as can be seen in the summary in Figure 31. However, this variety should not be seen as a failure of the classification. All crimes are not created equal, just as all Sub-districts are not created equal. The classification is a general one and can be used for many other purposes. For example, it can be used to explore whether station opening times are appropriate for the population make up of Sub-districts. It can also be used to show the balance of Sub-districts that each area commander has under their control, for assistance in funding requests etc.
    Figure 31: Variance in crime rates explained by the clustering classification
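As a rough companion to Figure 31, the percentages reported in the preceding crime type sections can be collected and ranked in R. The values are those quoted in the text (the 54.7% figure is the one given in the burglary section); the short labels and the object names explained and ranked are assumptions made for this sketch, and the plot is not a reproduction of the original figure.

```r
# Percentage of variance explained, as quoted in each crime type section.
explained <- c(Theft = 27.5, Assault = 35.6, Burglary = 54.7, Damage = 58.4,
               DangerousActs = 15.2, Drugs = 29.3, Fraud = 14, Kidnapping = 1,
               PublicOrder = 25.2, Robbery = 58.7, Weapons = 6, StateJustice = 14)

# Rank from best to worst explained.
ranked <- sort(explained, decreasing = TRUE)
print(ranked)

# Simple bar chart of the ranked values.
barplot(ranked, las = 2, ylab = "Variance explained (%)",
        main = "Variance in crime rates explained by cluster")
```

Ranked this way, robbery, damage and burglary are explained best by the classification, and kidnapping and weapons offences least.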
  • 92. 85 Issues with Data and Analysis
    Issues with the data used have generally been described and explored as they have arisen in the literature review, data and classification building sections of this thesis. However, a number of other issues have been identified; these are discussed below.
    The clustering exercise described throughout this thesis takes into account much of the information available and recommended by the FBI (2012). It should be noted that not all Sub-districts are created equal. Not included in the classification are several sources of information that may help compare Garda Sub-districts: opening hours, number of staff and their roles, proximity to major roads, prison populations, etc. Incorporating more of these variables may have made the classification more accurate for crime, but perhaps too specialised. This is, however, a general classification that could have many uses, rather than a specific crime comparison tool. The classification was also designed to be comparable to other such general classifications created previously, such as those by Brunsdon et al. (2014) or Charlton et al. (1985). As such, not all the available data were included.
    The dominance of Dublin city centre is evident throughout the classification; cluster two has by far the largest variance for any type of crime. Out of the twelve crime types explored, Pearse Street and Store Street appear as outliers nine and eight times respectively. In cluster eleven, Dublin Airport is a large outlier in ten of the crime types. This is unsurprising given the small settled population (those there on census night), the large transient daily population and the increased scrutiny that befits an international airport. Removing Dublin Airport Sub-district as an anomaly may have been an option that would not have adversely affected the classification. However, I chose to keep it in for just that reason. The fact that it is an anomaly (and a vital national resource), the
  • 93. 86 classification results and crime figures may be another argument for more policing resources than may be justified by population figures alone. The reader and further researcher will have to draw their own conclusion, but it may make more sense to carry out the classification again, leaving out the Dublin Metropolitan Region (DMR) Sub-districts. Classifying the DMR separately would effectively give a two tier classification, one for Dublin and one for the rest of the country. This may allow a more nuanced view of Dublin Sub-districts and a more accurate view of the rest of the country than would be possible to get together. It should also be noted again that the cluster descriptions can only be a snapshot, an overall picture of an area or areas. Not everyone in cluster two is going to be young, affluent and single for example. The cluster descriptions are helpful to the reader but no classification can fully express the massive variability between people or the uniqueness of people and families. Lastly the age of the data should be considered by the reader. The Garda Sub- districts were assigned in 2013 and the census data were collected in 2011. The census data is currently collected every five years, which will allow for an update to this work in approximately 2017 (when the data are released). The opening and closing of Garda stations and the subsequent changing of Sub-district boundaries can be subject to political and administrative whim. So there is no guarantee that the same Sub-districts used in the analysis will be in operation at the time of reading.
  • 94. Conclusion

The Federal Bureau of Investigation (2012) warns against comparing areas on their crime figures without understanding the underlying reasons for the differences across jurisdictions:

"Geographic and demographic factors specific to each jurisdiction must be considered and applied if one is going to make an accurate and complete assessment of crime in that jurisdiction.... The transience of the population, its racial and ethnic makeup, its composition by age and gender, education levels, and prevalent family structures are all key factors in assessing and comprehending the crime issue." (Federal Bureau of Investigation, 2012)

The aim of this thesis was to create a geodemographic classification of Ireland at the Garda Sub-district level, one that could be used by An Garda Síochána management, policy makers and other interested parties. The classification would allow Garda Sub-districts to be compared on the basis of their underlying socio-demographic characteristics rather than simply their physical locations in the hierarchy of An Garda Síochána. In this regard, the classification has been a success.

The classification has the potential to be used or updated for many purposes. As an example of this, the crime atlas section was created. It gives a general overview of crime and allows outliers to be identified, for further investigation, within clusters of Sub-districts with similar geodemographic characteristics.

The classification is far from perfect and the results must come with a caveat. Opening hours, staffing and equipment, station budgets and public attitudes are not taken into account, all of which can affect crime levels (Federal Bureau of Investigation, 2012) and influence the results of crime comparisons. However, the classification can assist here too. It allows those with data on staffing and assets, or with control over opening hours, to investigate whether Sub-districts may require additional resources as befits their geodemographics, based on their classification.

  • 95. The classification created and described in this thesis is not an end product, but rather a tool in the analytical toolbox to assist An Garda Síochána and interested parties. The classification can inform decision making, or help to explain differences between Sub-districts more simply than going into all the detail that the clusters represent.

When data are available from the 2016 census, it is hoped that this work can inform a reclassification based on more up-to-date data. As well as being an updated version of this research, it would allow for a temporal comparison of Garda Sub-district geodemographic make-up, should the clustering produce different results, as anticipated.
  • 96. Bibliography

Abbas, J., Ojo, A. & Orange, S., 2009. Geodemographics - a tool for health intelligence? Public Health, 123(1), pp. 35-39.

Alexiou, A. & Singleton, A., 2015. Geodemographic Analysis. In: C. Brunsdon & A. Singleton, eds. Geocomputation: A Practical Primer. London: Sage Publications Ltd, p. 137.

An Garda Siochana, 2013. Annual Policing Plan 2013. [Online] Available at: http://garda.ie/Documents/User/Annual%20Policing%20plan%202013%20%20English.pdf [Accessed 1 April 2016].

Anderson, D. et al., 2010. Statistics for Business and Economics. 2nd ed. Andover, Hampshire: Cengage Learning EMEA.

Ashby, D. & Longley, P., 2005. Geocomputation, Geodemographics and Resource Allocation for Local Policing. Transactions in GIS, 9(1), pp. 53-72.

Booth, C., 1903. Life and Labour of the People in London. New York: MacMillan and Co.

Boyne, R., 2006. Classification. Theory, Culture & Society, 23(2-3), pp. 21-30.

Brunsdon, C., Rigby, J. & Charlton, M., 2014. Ireland Census of Population 2011: A Classification of Small Areas. [Online] Available at: https://rpubs.com/chrisbrunsdon/14998 [Accessed 4 February 2016].

Central Statistics Office, 2014a. Census 2011 Boundary Files. [Online] Available at: http://www.cso.ie/en/census/census2011boundaryfiles/ [Accessed 3 February 2016].

Central Statistics Office, 2014. Census 2011 Small Area Population Statistics (SAPS). [Online] Available at: http://www.cso.ie/en/census/census2011smallareapopulationstatisticssaps/ [Accessed 4 February 2016].

Central Statistics Office, 2016. StatBank Ireland: Recorded Crime. [Online] Available at: http://www.cso.ie/px/pxeirestat/DATABASE/Eirestat/Recorded%20Crime/Recorded%20Crime_statbank.asp?sp=Recorded%20Crime&Planguage=0 [Accessed 1 April 2016].

Charlton, M. & Brunsdon, C., 2016. Gehlke and Biehl Revisited. Greenwich: GISRUK 2016.

Charlton, M., Openshaw, S. & Wymer, C., 1985. Some New Classifications of Census Enumeration Districts in Britain: A Poor Man's ACORN. Journal of Economic and Social Measurement, Volume 13, pp. 69-96.

Creaner, P., 2016. Examiner of Maps (GIS) at An Garda Siochana [Interview] (8 April 2016).
  • 97. Debenham, J., 2002. Understanding Geodemographic Classification: Creating The Building Blocks For An Extension. [Online] Available at: http://eprints.whiterose.ac.uk/5014/ [Accessed 1 April 2016].

Ding, C. & He, X., 2004. K-means Clustering via Principal Component Analysis. Proceedings of the twenty-first international conference on Machine learning, p. 29.

Doyle, C., 2011. A Dictionary of Marketing. 3rd ed. Oxford: Oxford University Press.

Dupre, J., 2006. Scientific Classification. Theory, Culture & Society, 23(2-3), pp. 30-32.

Esri, 2009. Cartogram Geoprocessing Tool version 2. [Online] Available at: http://arcscripts.esri.com/details.asp?dbid=15638 [Accessed 3 March 2016].

Federal Bureau of Investigation, 2012. Caution against Ranking. [Online] Available at: https://www.fbi.gov/about-us/cjis.ucr/cautionagainstranking.pdf [Accessed 22 April 2016].

Gale, C., Singleton, A. & Longley, P., 2015. Profiling Burglary in London using Geodemographics. Leeds, GISRUK2015.

Gehlke, C. & Biehl, K., 1934. Certain Effects of Grouping Upon the Size of the Correlation Coefficient in Census Tract Material. Journal of the American Statistical Association, 29(185), pp. 169-170.

Gordon, A., 1987. A Review of Hierarchical Classification. Journal of the Royal Statistical Society. Series A, 150(2), pp. 119-137.

Government Publications, 2016. Bunreacht Na hEireann: Constitution of Ireland. 2016 ed. Dublin: Government Publications Office.

Hansen, P. & Jaumard, B., 1997. Cluster analysis and mathematical programming. Mathematical Programming, Volume 79, pp. 191-215.

Harris, R., Sleight, P. & Webber, R., 2005. Geodemographics, GIS and Neighbourhood Targeting. Chichester: John Wiley & Sons Ltd.

Houses of the Oireachtas, 2013. Electoral (Amendment) (Dáil Constituencies) Act 2013. [Online] Available at: http://www.irishstatutebook.ie/eli/2013/act/7/enacted/en/print#sec6 [Accessed 6 April 2016].

Jackson, J., 1991. A User's Guide to Principal Components. New York: John Wiley & Sons, Inc.

Jolliffe, I., 2002. Principal Component Analysis. 2nd ed. London: Springer.
  • 98. Jordan, A., 2015. Ireland's violent crime capitals: We reveal which counties have the highest numbers of rape and murder suspects. [Online] Available at: http://www.irishmirror.ie/news/irish-news/crime/irelands-violent-crime-capitals-reveal-5245071 [Accessed 4 April 2016].

London School of Economics and Political Science, 2012. Charles Booth Online Archive: The Survey into life and labour in London (1886-1903). [Online] Available at: http://booth.lse.ac.uk/static/b/index.html [Accessed 5 April 2016].

MacCarthaigh, S. & Phelan, S., 2014. Crime Nation: How safe is your area? [Online] Available at: http://www.independent.ie/irish-news/crime/crime-nation-how-safe-is-your-area-30727076.html [Accessed 4 April 2016].

Openshaw, S. & Taylor, P., 1979. A Million or so correlation coefficients, three experiments on the modifiable areal unit problem. Statistical Applications in the Spatial Sciences, Volume 127.

Parker, S., Uprichard, E. & Burrows, R., 2007. CLASS PLACES AND PLACE CLASSES Geodemographics and the spatialization of class. Information, Communication and Society, 10(6), pp. 902-921.

Perec, G., 1989. Penser/Classer. Paris: Hachette.

Rencher, A. & Christensen, W., 2012. Methods of Multivariate Analysis. 3rd ed. New Jersey: John Wiley & Sons Inc.

Singleton, A. & Longley, P., 2008. Creating open source geodemographics: Refining a national classification of census output areas for applications in higher education. Papers in Regional Science, 88(3), pp. 643-666.

Singleton, A. & Longley, P., 2009. Geodemographics, visualisation, and social networks in applied geography. Applied Geography, Volume 29, pp. 289-298.

The R Foundation, 2015. The R Project for Statistical Computing. [Online] Available at: https://www.r-project.org/ [Accessed September 2015].

Tobler, W., 1970. A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, Volume 46, p. 236.

Vickers, D. & Rees, P., 2007. Creating the UK National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society. Series A (Statistics in Society), 170(2), pp. 379-403.

Vickers, D. & Rees, P., 2011. Ground-truthing Geodemographics. Applied Spatial Analysis, Volume 4, pp. 3-21.