Early Mortality Report
Maynooth University
2016
The Relationship Between Early Mortality and Socioeconomic Status
A Statistical Analysis Using ArcGIS
Padraig Quinn 11125900
Executive Summary
Previous studies have shown that higher premature mortality rates are associated with
lower socioeconomic status. This report examined the correlation and relationships
between factors that are considered to contribute to lower social status, together with
the Standardised Mortality Rate for premature death, relating to the county of Dublin.
The study applied various statistical analysis techniques through the use of GIS
software. The techniques included spatial clustering processes, geographically
weighted regression and Python script analysis. The investigation determined that
there is a relatively strong positive relationship between lower socioeconomic status
and premature mortality. The areas that had high spatial clusters of lower social
classes also had higher rates of early mortality. When the social class variables were
tested independently against the premature death rate, they yielded results with varied
degrees of strength. The results highlight the need for social reform and significant
improvements from political governing bodies.
1. Introduction and Context
1.1 Introduction
Early mortality is often attributed to health conditions arising from unhealthy
lifestyle choices such as smoking, alcohol intake, drug use or poor diet. Although
these factors play a part, socioeconomic status is also a recognised influence on
premature mortality (Erikson & Torssander, 2008). People from lower social classes
tend to have less disposable income and therefore a lower standard of living, which
is associated with earlier mortality. The opposite is associated
with higher social classes (Pensola & Martikainen, 2003). Furthermore, a lower
social class generally coincides with poorer education standards and higher levels of
unemployment, which can also have a detrimental effect on a person’s health.
Research has suggested that people with lower levels of education are more likely
to die at an earlier age and are also likely to suffer from poorer health
throughout the course of their lives (Higgins, et al., 2008).
This report begins with an introduction and context section outlining the topic,
followed by a detailed methodology section divided into subsections
that explain each fundamental step, supported by screenshots, graphs and tables for
illustration. This is
followed by a results section detailing the outcome of each stage of the analysis. The
final section comprises a discussion and conclusion that further
explain and summarise the findings in the context of this report. All of the
final maps are attached as separate appendices at the end of this report in PDF format.
1.2 Context
A detailed understanding of the relationship between social class and health is of
increasing importance (Higgins, et al., 2008). Many governmental policies claim to
aim to eradicate social inequalities, yet these inequalities persist,
particularly within the health system. Poverty has been described as a ‘ruthless killer’ and a
variable that harmfully influences health (Murali & Oyebode, 2004). Researching
and understanding such relationships and inequalities can be time-consuming and
often expensive because multiple factors need to be taken into consideration.
However, advances in research techniques, and in technology in particular, allow
research to be completed more efficiently, significantly reducing the cost and time taken
to collate the necessary data. By combining readily available
statistical datasets with the ArcGIS software package, the information can be
interrogated and manipulated to produce visual representations of statistical data for
informative and explanatory purposes. Any relationships or correlations that are
present can then be identified and understood more easily.
1.3 Datasets
The datasets utilised for this report were a combination of census data obtained from
the Central Statistics Office (CSO) and mortality data received from the Geography
Department at Maynooth University. The census data contains Ireland LA (local
authority) data and Electoral District (ED) data. The LA dataset was primarily for
county extraction together with visual representation. The ED dataset was used for
extracting socioeconomic data within the county of Dublin. The mortality data
contains premature death rates at intermediate area (IA) level and was used to compare
death rates with selected social classes for the County Dublin area. Rigby, et al. (n.d.)
state that counties are too large and EDs are too small to accurately portray
spatial variations in health data, so the IA scale was utilised.
1.4 Aims and Objectives
This report aimed to identify some of the socioeconomic factors associated with
premature mortality in the county of Dublin and to discuss the correlations and
relationships, if any, between these factors and early mortality rates. The social
factors were unemployment, unskilled
workers, unskilled households and lower level education. People with an education
level of early secondary school or less were judged to be within the last category. The
mortality data concerned the standardised mortality rate (SMR). The report is
primarily based on statistical analysis of the datasets within ArcGIS using various tools
such as clustering algorithms and regression analysis. Additionally, two
Python scripts were incorporated into the analysis of the data. These
processes are explained in detail in the methodology section. The objective
was to display these analyses geographically, in tabular format and by means of
graphical representation for illustrative purposes, in order to readily identify any
correlations and/or relationships between premature mortality and social class in
County Dublin.
2. Methodology
2.1 Extract Datasets
Firstly, all the required datasets mentioned previously were extracted from
zip files. ArcCatalog was then opened
and all the necessary datasets were assigned a common coordinate system, in this
case the Irish National Grid (TM65) projected coordinate
system. This was an important step as it gave the map real geographic
coordinates that were accurate in scale and location, which was essential
when applying measured spatial queries. ArcCatalog was used because it is better
suited to managing and organising multiple datasets and files. From within
ArcCatalog, a blank map was opened by selecting the globe icon (ArcMap). The data
frame also needed to be
assigned the Irish National Grid coordinate system. This was achieved by right
clicking Layers at the top of the table of contents (TOC) and selecting Properties,
Coordinate Systems, National Grids, Europe and finally Irish National Grid. The
extracted datasets were then added to the TOC by selecting them in the ArcCatalog
window and dragging them across.
2.2 Extracting the Layers
The Ireland LA, ED and Mortality layers were all at national scale. However, for the
purpose of this report, only information regarding the county of Dublin was required.
In order to accomplish this, certain procedures were applied within the software. The
‘Select by Attributes’ function was employed to extract the information from each
layer concerning exclusively the county of Dublin. This process was vital as the data
solely contained within County Dublin could be further interrogated and manipulated
independently from the rest of the country. Within the ‘Select by Attributes’ option, a
‘Structured Query Language’ (SQL) expression needed to be built. The mortality data
serves as an example here and is
represented in Figure 1. The layer chosen was the Mortality layer, then
County from the field list, followed by the SQL expression highlighted in red in Figure
1. The ‘OR’ operator was chosen, as the areas of interest were a
combination of four separate subdivisions within the county. Care needed to be taken,
as the ‘AND’ operator would have produced a different outcome: it would have required
a single area to belong to all four subdivisions at once, so no records would have
been selected. The same steps were carried out
for the remaining layers, with the layer name changed each time. Finally, each
of the new temporary extracted layers was exported as a shapefile, so that it could
be queried independently from the original datasets.
Figure 1: Select by Attributes SQL
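The difference between the ‘OR’ and ‘AND’ operators can be illustrated outside ArcGIS with a minimal Python sketch. The field name County follows the text, but the rows, the IA_ID codes, the subdivision spellings and the helper function names are all invented for illustration; this is not the query shown in Figure 1.

```python
# Hypothetical attribute rows; only the field name "County" comes from the text.
records = [
    {"IA_ID": "IA001", "County": "Dublin City"},
    {"IA_ID": "IA002", "County": "South Dublin"},
    {"IA_ID": "IA003", "County": "Fingal"},
    {"IA_ID": "IA004", "County": "Dun Laoghaire-Rathdown"},
    {"IA_ID": "IA005", "County": "Kildare"},
]

# Assumed values for the four subdivisions of County Dublin.
DUBLIN_AREAS = ["Dublin City", "South Dublin", "Fingal", "Dun Laoghaire-Rathdown"]


def build_where_clause(field, values, operator):
    """Build an SQL-style expression such as "County" = 'A' OR "County" = 'B'."""
    terms = ['"{}" = \'{}\''.format(field, v) for v in values]
    return " {} ".format(operator).join(terms)


def select_or(rows, field, values):
    """Keep rows whose field equals ANY of the values (OR logic)."""
    return [r for r in rows if any(r[field] == v for v in values)]


def select_and(rows, field, values):
    """Keep rows whose field equals ALL of the values at once (AND logic)."""
    return [r for r in rows if all(r[field] == v for v in values)]


where_clause = build_where_clause("County", DUBLIN_AREAS, "OR")
or_selection = select_or(records, "County", DUBLIN_AREAS)
and_selection = select_and(records, "County", DUBLIN_AREAS)

print(where_clause)
print(len(or_selection), "rows selected with OR")
print(len(and_selection), "rows selected with AND")
```

With OR, the four Dublin rows are kept and the Kildare row is dropped. With AND, nothing is selected, because a single row cannot hold four different County values at the same time.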
2.3 Encoding Attribute Data
One of the bigger problems this report encountered was that the ED data and the
Mortality data were at different geographical spatial scales. The ED areas were
smaller than the Mortality (IA) areas. The ED areas had too low a population in some
cases for accurate mortality data, producing too few deaths, so the IA was developed.
The Mortality areas were equally populated areas with populations of close to 10,000
per region (Rigby, et al., n.d.). Without some form of editing, it would not have been
possible to combine and interrogate the separate datasets as one. In order to rectify
this problem, a new field was added to the ED layer called Region. The ED layer had
322 individual areas and the Mortality layer had just over 111. The Mortality layer
was made transparent and its outline thickness was set to two. See Figure 2. It was
then superimposed on top of the ED layer. This allowed the operator to visualise the
EDs that were contained within the intermediate areas (IAs). The Identify tool
was then chosen from the toolbar at the top of the screen. Within the Identify window, the
Mortality layer was selected in the ‘Identify From’ option, as shown in Figure 3. This
allowed the user to click on an ED and view the information for the IA containing it
in the window. From this list, the IA_ID was used as the area code in the newly
added Region field in the ED layer. The IA_ID was chosen because
every ED situated within an IA could share that IA’s code, giving a common key for
both the ED and the IA once amalgamated. This code was therefore used to match the
EDs to the IAs in the modified output table.
Figure 2: IA Superimposed on ED
Figure 3: Identify Option
Once the new field was added, it needed to be encoded with the relevant IA code for
each ED. To accomplish this, the Editor toolbar was activated, followed by the edit
attributes option. Clicking on a row in the ED attribute table opened a
window with all the attributes for that layer, including the newly created Region field.
By clicking on the ED using the Identify tool, the operator could see the IA_ID for
the ED. The IA_ID code was then keyed into the Region field from within the
attribute editor window. See Figure 4 below. When finished, the Stop Editing option
was chosen and the edits were saved.
Figure 4: Populating ED Attribute Table with IA Codes in Edit Mode
2.4 Dissolve EDs into IAs
After successful completion of the encoding (matching) procedure, the EDs could
then be amalgamated into the relevant IAs using a Geoprocessing tool known as
Dissolve. The dissolve feature can be referred to as a reclassification of vector data so
that polygons can be arranged into higher forms (Heywood, et al., 2006). Once the
tool was selected, it had to be modified in order to successfully attach and merge the
smaller scale ED data into the larger scale IA data. In the input feature, the ED layer
was chosen and the dissolved field was the newly created Region field. In the
statistics field, the required variables discussed in the previous section were selected.
The statistic type selected was SUM, as the totals across the
combined EDs within each IA were required. This gave a total count for each variable
calculated from the combined EDs within the IA. Figure 5 depicts the
Dissolve tool. The output file was saved as Diss_Ed_IA.
Figure 5: Dissolve Tool
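The attribute side of the Dissolve step (grouping EDs by the Region code and summing each variable) can be sketched in plain Python. The sketch does not merge polygon geometry, which the ArcGIS tool also does. Apart from the Region field, every field name and count below is hypothetical.

```python
from collections import defaultdict

# Hypothetical ED rows: "Region" holds the IA_ID keyed in during editing.
# All other field names and counts are invented for illustration.
ed_rows = [
    {"ED": "ED-A", "Region": "IA001", "Unemployed": 120, "UnskilledWorkers": 80},
    {"ED": "ED-B", "Region": "IA001", "Unemployed": 95, "UnskilledWorkers": 60},
    {"ED": "ED-C", "Region": "IA002", "Unemployed": 40, "UnskilledWorkers": 25},
]


def dissolve_sum(rows, dissolve_field, stat_fields):
    """Group rows by dissolve_field and SUM each statistics field per group."""
    totals = defaultdict(lambda: dict.fromkeys(stat_fields, 0))
    for row in rows:
        group = totals[row[dissolve_field]]
        for field in stat_fields:
            group[field] += row[field]
    return dict(totals)


ia_totals = dissolve_sum(ed_rows, "Region", ["Unemployed", "UnskilledWorkers"])

for region, sums in sorted(ia_totals.items()):
    print(region, sums)
```

Here the two EDs coded IA001 collapse into one record whose counts are the sums of the ED counts, which is the behaviour described for the SUM statistic type.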
2.5 Spatial Join
The next step was to join the newly created dissolved layer to
the mortality layer. At this point the data were at the same geographic scale, but
they were still two separate datasets. Although referred to here as a spatial join,
the two datasets were joined on the basis of
their attribute table values, as the Region field in the dissolved layer was a match for
the IA_ID field in the mortality layer. The ‘Join Data’ window was opened by right
clicking the dissolved layer (Diss_Ed_IA) in the TOC and selecting the ‘Join Data’
option from within the ‘Joins and Relates’ selection. The join was
configured exactly as depicted in Figure 6. The important aspect to note was that the join
was based on the Region field in the dissolved layer and the IA_ID field in the
Mortality layer. The result was a single attribute table containing both datasets at the
same geographic scale, which could then be queried and manipulated as a single
complete dataset.
Figure 6: Spatial Join
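Because the join matches the Region field to the IA_ID field, its logic is that of a key-based table join. The following is a minimal sketch under that assumption; the rows, the SMR_U75 field name and the keep_all switch are hypothetical and are not taken from the ‘Join Data’ window in Figure 6.

```python
# Hypothetical tables; only the key names Region and IA_ID come from the text.
dissolved = [
    {"Region": "IA001", "Unemployed": 215},
    {"Region": "IA002", "Unemployed": 40},
    {"Region": "IA003", "Unemployed": 77},
]
mortality = [
    {"IA_ID": "IA001", "SMR_U75": 131.0},
    {"IA_ID": "IA002", "SMR_U75": 88.5},
]


def attribute_join(left, right, left_key, right_key, keep_all=True):
    """Append the fields of right to left where the two key values match.

    With keep_all=True, unmatched left rows are kept and their joined
    fields are set to None; with keep_all=False they are dropped.
    """
    lookup = {row[right_key]: row for row in right}
    right_fields = [f for f in right[0] if f != right_key] if right else []
    joined = []
    for row in left:
        match = lookup.get(row[left_key])
        if match is None and not keep_all:
            continue
        merged = dict(row)
        for field in right_fields:
            merged[field] = match[field] if match else None
        joined.append(merged)
    return joined


joined_rows = attribute_join(dissolved, mortality, "Region", "IA_ID")
unmatched = [r["Region"] for r in joined_rows if r["SMR_U75"] is None]
print(joined_rows)
print("Regions without a mortality record:", unmatched)
```

Listing the unmatched regions is a quick way to catch Region codes that were mistyped during the manual encoding step.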
2.6 Cluster Analysis
Cluster and Outlier analyses, referred to as Anselin Local Moran’s I, were carried out
on all of the mentioned variables (SMR, unemployment, unskilled workers, unskilled
households and poor education). These analyses were carried out in order to
visualise any areas of high or low clustering of the chosen variables and also to
identify any outliers. This technique also assisted in determining any patterns or
similarities of clusters between the selected variables. The cluster analysis tool was
run from within the Spatial Statistics toolbox. As with other tools, it
needed to be configured in the set-up window. The input feature was the
dissolved layer and, for the purpose of this description, the SMR
variable was chosen as the input field for the analysis, as shown in Figure 7. The
same steps were repeated for the remaining variables.
Figure 7: Cluster Analysis
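The idea behind the Local Moran’s I statistic can be shown with a small pure-Python sketch. It assumes row-standardised weights and the textbook form I_i = (z_i / m2) * (mean deviation of the neighbours), where m2 is the variance of the deviations. It omits the significance testing (z-scores and p-values) that the ArcGIS tool reports, so it only assigns the HH, LL, HL and LH quadrants. The six values and the neighbour lists are invented.

```python
def local_morans_i(values, neighbours):
    """Return a list of (I_i, quadrant) tuples, one per area.

    values      -- one number per area (for example an SMR)
    neighbours  -- dict mapping an area index to the indices of its neighbours
    Weights are row-standardised, so the spatial lag is the neighbours' mean.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]       # deviations from the global mean
    m2 = sum(d * d for d in z) / n       # variance of the deviations
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")

    results = []
    for i in range(n):
        nbrs = neighbours[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        i_local = (z[i] / m2) * lag
        if z[i] >= 0 and lag >= 0:
            quadrant = "HH"              # high value among high neighbours
        elif z[i] < 0 and lag < 0:
            quadrant = "LL"              # low value among low neighbours
        elif z[i] >= 0:
            quadrant = "HL"              # high value among low neighbours
        else:
            quadrant = "LH"              # low value among high neighbours
        results.append((i_local, quadrant))
    return results


# Six hypothetical areas in a row; area 3 is a low value between high values.
smr = [110, 110, 110, 50, 110, 110]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

moran = local_morans_i(smr, chain)
for area, (i_local, quadrant) in enumerate(moran):
    print(area, round(i_local, 3), quadrant)
```

A positive I_i indicates that an area resembles its neighbours (HH or LL), while a negative I_i marks an outlier (HL or LH). In the toy data the low area in the middle is classed as LH and its two high neighbours as HL.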
2.7 Geographically Weighted Regression
Regression analysis is used to examine, explore and predict spatial relationships.
Geographically Weighted Regression (GWR) is a powerful method used for
examining and estimating linear relationships. It provides a local model of the
variable that is being examined or predicted by fitting a regression equation to each
feature in the dataset. It is a sophisticated basis to quantify and dissect spatial patterns
across a study area (Legg & Bowe, 2009). For the purpose of this report GWR was
applied to test the hypothesis of higher premature death rates in lower social
classes and the strength of these relationships. Again, this tool was available in the
Spatial Statistics toolbox, and it provided detailed and flexible analysis of the
data. The input feature was the dissolved layer and the
dependent variable was the SMR under 75 variable. Explanatory
variables were added, firstly in pairs, then as a combination of all the variables. The
regression model used in this example had the SMR as the dependent
variable and unemployment and unskilled workers as the explanatory variables. See
Figure 8 for visual representation.
Figure 8: Geographically Weighted Regression Tool
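The principle of fitting a separate regression at each feature can be sketched as follows. This is a simplified illustration, not the ArcGIS tool: it uses a single explanatory variable, a fixed Gaussian kernel and a hand-picked bandwidth, whereas the models in this report used two or more explanatory variables. All coordinates and values are invented.

```python
import math


def gwr_simple(coords, x, y, bandwidth):
    """Fit one weighted simple regression y = a + b*x per location.

    coords     -- list of (easting, northing) tuples, one per area
    x, y       -- explanatory and dependent values, in the same order
    bandwidth  -- fixed Gaussian kernel bandwidth in map units (assumed)
    Returns a list of (intercept, slope) tuples, one per location.
    """
    estimates = []
    for cx, cy in coords:
        # Gaussian distance-decay weights centred on the current location.
        w = [
            math.exp(-0.5 * (math.hypot(px - cx, py - cy) / bandwidth) ** 2)
            for px, py in coords
        ]
        sw = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        slope = sxy / sxx
        estimates.append((y_bar - slope * x_bar, slope))
    return estimates


# Eight hypothetical areas along a line; the relationship is steeper in the east.
coords = [(float(i), 0.0) for i in range(8)]
unemployment = [1, 2, 3, 4, 1, 2, 3, 4]
smr = [2, 4, 6, 8, 4, 8, 12, 16]

local_fits = gwr_simple(coords, unemployment, smr, bandwidth=1.5)
for (cx, _), (intercept, slope) in zip(coords, local_fits):
    print(cx, round(intercept, 2), round(slope, 2))
```

Because nearby areas receive more weight than distant ones, the estimated slope differs from west to east in the toy data, which is the kind of local variation a single global regression would average away.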
2.8 Python Script for Identifying Correlations
A Python script was employed to further investigate correlations between the
variables and death rates. Similar to the regression analysis, it was applied to examine
the relationship between different social factors and premature death rates. However,
in this case the Python script examined the direct relationship between a single
variable and the SMR, and also the strength of that relationship or correlation. It was
also used to create scatter plots with lines of best fit, which better identify any
correlation or variance within the model. Two scripts were implemented. The first
script created a table from the statistics and the second script created graphs from the
table. Both are attached in the appendices section of this report.
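The scripts used in the study are not reproduced in this section. As a stand-in, the following minimal sketch shows how a correlation coefficient and a least-squares line of best fit between one variable and the SMR could be calculated with the standard library alone. The function names and the five data points are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(x, y):
    """Least-squares (intercept, slope) of the line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Invented values for five areas: unemployment (%) and premature SMR.
unemployment = [4.0, 6.0, 8.0, 10.0, 12.0]
smr_u75 = [80.0, 100.0, 110.0, 100.0, 110.0]

r = pearson_r(unemployment, smr_u75)
intercept, slope = best_fit_line(unemployment, smr_u75)
print("r =", round(r, 3), "r squared =", round(r * r, 3))
print("SMR = {:.1f} + {:.1f} * unemployment".format(intercept, slope))
```

A positive r means the two variables rise together, and r squared gives the share of the variation in the SMR that the single variable accounts for.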
3. Results
3.1 Cluster Analyses
The Clustering Analysis (Local Moran’s I statistic) was used to calculate z
scores (degree of clustering) and p values (statistical significance). This function
allowed the user to identify areas of high and low clustering of a variable, and also
areas that were surrounded by contrasting values, either high or low. The grey areas
represented areas that did not contain statistically significant clustering. The black
areas (High-High or HH) indicated areas of high values clustered closely together.
The blue areas (Low-Low or LL) were the opposite of the black areas, as they contained
low values clustered closely together. The orange areas (High-Low or HL) were
outliers rather than clusters: high values
surrounded by low values. However, they were not relevant to the outcome of this
study. The white areas (Low-High or LH) were also outliers, a result of low values
surrounded by high values.
The SMR under 75 (Premature Mortality) clustering analysis produced the greatest levels
of high value (HH) clustering in the Dublin City and South Dublin regions of
the county. It also generated statistically significant clusters of low SMR
under 75 (LL) in the Dún Laoghaire-Rathdown region of the county. The LH outlier
areas were predominantly situated in the Fingal jurisdiction, to
the west of the county, illustrating pockets of low mortality rates surrounded by
higher rates. The results suggested that more affluent LH areas such as Castleknock
had lower premature mortality than some of the more deprived surrounding areas, in
Blanchardstown for example. The results are shown in Figure 9.
Figure 9: Premature SMR Clustering
The unemployment analysis generated the greatest levels of high value clustering
(HH) of unemployment in the South Dublin and Fingal districts of the county (with
the exception of one Dublin City IA) and clustering of low levels of unemployment in
the wealthier Dún Laoghaire-Rathdown area. The remainder of the county was
deemed to have no significant clustering. The results are displayed in Figure 10.
Figure 10: Unemployment Clustering
The unskilled worker analysis produced the greatest amount of high value
(HH) clustering of unskilled workers within the South Dublin, Dublin City and Fingal
areas. There were more intense clusters of low values (people who were not considered
unskilled) in the Dún Laoghaire-Rathdown locality and also a pocket of low value
clusters in the Fingal region, namely Howth. Furthermore, there was an outlier
(LH) area indicating skilled workers close to, or surrounded by, unskilled
workers in the Fingal area, situated on the western fringe of the county boundary.
See Figure 11.
Figure 11: Unskilled Worker Clustering
The unskilled households analysis produced a similar outcome. However, this time
there was more HH clustering in the Dublin City jurisdiction, with LH outliers in
pockets around that area also. For this analysis, Howth was not
significant. These results can be identified in Figure 12.
Figure 12: Unskilled Households Clustering
The Poor Education clustering produced a similar result, with the HH clusters in the
western South Dublin and Fingal zones, and LL clustering in the
Dún Laoghaire-Rathdown area.
All the results appeared to show more negative outcomes for known disadvantaged areas
and more positive results for known prosperous regions.
Figure 13: Poor Education Clustering
3.2 Geographically Weighted Regression
The geographically weighted regression was calculated by generating regression
models using the premature SMR statistics as the dependent variable and
pairs of the selected lower social class variables as the explanatory variables. A
combined model was also created including all of the low social class variables versus the
premature SMR variable. For the scope of this report, the adjusted R² value was of
interest, as this was the measure that indicated how strong or weak the
relationship was between the selected variables and the premature death rate. It is
referred to as the goodness of fit and it varies between 0 and 1, with
higher values being favoured (Legg & Bowe, 2009). The results are illustrated in the
tables below.
Table 1 had an adjusted R² value of just over 0.51. This meant that the model could
explain just over 51 percent of the variation in the premature SMR; in other words,
the two lower social class
variables (Unemployment and Unskilled Workers) accounted for 51 percent of the
variation. This meant that there was evidence of a relatively strong relationship or
correlation between the two variables and premature mortality. Table 3 had the
weakest result and Table 6 the strongest of the pairings. Also, when all of the
variables were combined, the results showed that there was a relatively strong
relationship between all of the lower social class variables and premature mortality.
The result had an adjusted R² value of slightly over 0.55. See Table 7.
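For readers unfamiliar with the adjustment, the sketch below applies the standard ordinary least squares form of adjusted R², which penalises R² for the number of parameters estimated. GWR software typically substitutes an effective (non-integer) number of parameters, so the function accepts a float. The input values in the example are hypothetical and are not taken from Tables 1 to 7.

```python
def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R-squared.

    r2        -- unadjusted R-squared of the model
    n_obs     -- number of observations (areas)
    n_params  -- number of estimated parameters, including the intercept;
                 may be a non-integer effective number for local models
    """
    if n_obs <= n_params:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params)


# Hypothetical model: R-squared 0.55, 100 areas, intercept plus two variables.
example = adjusted_r2(0.55, 100, 3)
print(round(example, 4))
```

The adjusted value is always at or below the unadjusted R² once there is at least one explanatory variable, and the gap widens as more variables are added, which makes it the fairer figure for comparing the paired models with the combined model.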
Table 1: GWR Unemployment and Unskilled Workers
Table 2: Unemployment and Unskilled Households
Table 3: Unemployment and Poor Education
Table 4: Unskilled Workers and Unskilled Households
Table 5: Unskilled Workers and Poor Education
Table 6: Unskilled Households and Poor Education
Table 7: GWR All Variables
3.3 Python Script Analysis
The Python script analysis was also used to examine the correlation between
premature mortality rates and the selected lower social class variables.
However, this time only one variable was selected per model with the premature
mortality rate. It produced detailed, easy to interpret graphs with a visible line of best
fit. The results depicted a relatively strong relationship between the variables and the
premature mortality rate. See Figures 14 to 17 below. The points were scattered close
to the line of best fit, suggesting that there was indeed a relationship. The closer and
denser the scattering around the line, the stronger the correlation. Additionally, as the
premature mortality rate increased, so too did the quantity of the variable being
examined, illustrating that there was a positive relationship between the variables.
For Figure 14, for example, a higher death rate corresponded with
higher levels of unemployment. The
statistics of the separate correlations are also recorded below in Table 8. As in
the previous sub-section, the decimal score can be read as a percentage.
Figure 14: Premature Death Rate Versus Unemployment
4. Discussion and Conclusion
4.1 Discussion
The results produced from the several analyses suggested that there is a relationship
between higher rates of premature mortality and lower social class factors. The
findings correspond with previous research (Pensola & Martikainen, 2003;
Erikson & Torssander, 2008) showing that premature mortality tends to increase with
lower social class. The relationship or correlation was relatively strong, in some cases
more so than others, particularly between unskilled households and early mortality
and between unskilled workers and early mortality (see Table 8).
The clustering analysis provided valuable insight into where the high and low
clusters of each variable were situated. The majority of clustering relating to low
social class variables occurred in areas known to be less affluent, such as Jobstown in
the South Dublin district and Ballymun in the Dublin City division of the county. This
also corresponded with the high clusters of premature mortality. In the more affluent
Dún Laoghaire-Rathdown area, the opposite occurred. These analyses further
bolster previous research relating to mortality and social class (Rigby, et al., n.d.).
Findings are shown in Figure 9 to Figure 13. Furthermore, the
high clustering can be attributed to high values of the variable
being analysed rather than to large populations. Population size was
not a factor for IAs, as they all had roughly the same populations (close to 10,000).
This suggests that the IA was an appropriate unit for examining the individual
regions of the county, limiting other factors that may have affected what was being
modelled.
The evidence generated from the GWR analyses and the Python script analyses also
highlights the positive correlations between the socioeconomic variables and early
mortality. It supports the observations associating higher early mortality rates
with poor education (Higgins, et al., 2008). The figures are shown in Table 8.
When poor education was combined in pairs with the other variables, and
likewise with all of the variables, the relationship was also apparent. See Tables 3,
5, 6 and 7. The results also support the reasoning that unemployment and low skilled
classes, which in turn may lead to poverty, contribute to early mortality. The evidence
echoes previous studies referring to poverty as a killer (Murali & Oyebode, 2004).
4.2 Conclusion
The report has provided a valuable insight into the complex task of linking
socioeconomic status and premature mortality. Several issues needed to be addressed
before the analysis could begin. These included combining, merging and
dissolving datasets that were of different geographical scales, choosing variables that
contributed most to lower social status and selecting analysis techniques that best
addressed and represented the issue being examined. Several separate analysis
techniques were implemented to reduce inconsistencies as far as possible and
to make the modelling outcomes as plausible as possible. These techniques
combined, contrasted and compared all the relevant variables against premature
mortality, producing in all cases a relatively strong positive correlation between
premature mortality and low socioeconomic status.
However, further datasets that were outside the scope of this study could have
improved the results. An interesting dataset that might have produced more
specific results is the deprivation index. The dataset is composed of small area (SA)
data, and may have produced more detailed results when merged and dissolved into
the larger IA Mortality data. The smaller units capture more variation overall, as
many more areas are incorporated into the dataset.
Also, regarding the newly generated dissolved ED to IA layer, the study perhaps
could have normalised the social class variables. The statistical data they contained
related to the SUM, or total number of people, for each variable, inclusive of people
over the age of 75. It would have been better to establish a cut-off point, so as to
include only people under 75 within each variable when
correlating them with the premature mortality rate. This would have made the analysis
slightly more accurate and precise.
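One way the suggested normalisation could be carried out is to convert each summed count into a rate per 1,000 people under 75. The sketch below assumes that under-75 counts and under-75 populations are available for each IA, which was not the case in this study; all field names and figures are hypothetical.

```python
def rate_per_1000(count, population):
    """Convert a raw count into a rate per 1,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * count / population


# Hypothetical IA rows restricted to people under 75.
ia_rows = [
    {"Region": "IA001", "Unemployed_U75": 215, "Pop_U75": 9400},
    {"Region": "IA002", "Unemployed_U75": 40, "Pop_U75": 10000},
]

for row in ia_rows:
    row["Unemp_per_1000"] = rate_per_1000(row["Unemployed_U75"], row["Pop_U75"])
    print(row["Region"], round(row["Unemp_per_1000"], 1))
```

Rates of this kind remove any remaining differences in population size between IAs, so the variables are compared with the SMR on a like-for-like basis.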
Nevertheless, this study has reiterated the fact that lower socioeconomic status does
have a detrimental effect on mortality rates. The research undertaken in this report
was detailed, interrogative and showed how strong the individual relationships were
between the individual variables and the early mortality rate, and also the relationship
between the early mortality rate and combinations of the socioeconomic variables.
This study proved a success and could be used as a valuable asset for decision makers
in the future regarding our fragile health system and futile housing and employment
situation.
Bibliography
Erikson, R. & Torssander, J., 2008. Social Class and Cause of Death. European
Journal of Public Health, 18(5), pp. 473-478.
Heywood, I., Cornelius, S. & Carver, S., 2006. An Introduction to Geographical
Information Systems. 3rd ed. Essex: Pearson.
Higgins, C., Lavin, T. & Metcalfe, O., 2008. Health Impacts on Education: A Review,
Belfast: The Institute of Public Health in Ireland.
Legg, R. & Bowe, T., 2009. Applying Geographically Weighted Regression to a Real
Estate Problem, Michigan: Michigan University.
Murali, V. & Oyebode, F., 2004. Poverty, Social Inequality and Mental Health.
Advances in Psychiatric Treatment, 10(3), pp. 216-224.
Pensola, T. & Martikainen, P., 2003. Cumulative Social Class and Mortality from
Various Causes of Adult Men. Journal of Epidemiology and Community Health,
57(9), pp. 745-751.
Rigby, J. E. et al., n.d. Towards A Geography of Health Inequalities in Ireland, s.l.:
Draft.