Maynooth University
2016
The Relationship
Between Early Mortality
and Socioeconomic
Status
A Statistical Analysis Using ArcGIS
Padraig Quinn 11125900
Executive Summary
Previous studies have shown that higher premature mortality rates are associated with
lower socioeconomic status. This report examined the correlation and relationships
between factors that are considered to contribute to lower social status, together with
the Standardised Mortality Rate for premature death, relating to the county of Dublin.
The study applied various statistical analysis techniques through the use of GIS
software. The techniques included spatial clustering processes, geographically
weighted regression and Python script analysis. The investigation determined that
there is a relatively strong positive relationship between lower socioeconomic status
and premature mortality. The areas that had high spatial clusters of lower social
classes also had higher rates of early mortality. When the social class variables were
tested independently against the premature death rate, they yielded results with varied
degrees of strength. The results highlight the need for social reform and significant
improvements from political governing bodies.
EXECUTIVE SUMMARY
TABLE OF FIGURES
TABLE OF TABLES
1. INTRODUCTION AND CONTEXT
1.1 Introduction
1.2 Context
1.3 Datasets
1.4 Aims and Objectives
2. METHODOLOGY
2.1 Extract Datasets
2.2 Extracting the Layers
2.3 Encoding Attribute Data
2.4 Dissolve EDs into IAs
2.5 Spatial Join
2.6 Cluster Analysis
2.7 Geographically Weighted Regression
2.8 Python Script for Identifying Correlations
3. RESULTS
3.1 Cluster Analyses
3.2 Geographically Weighted Regression
3.3 Python Script Analysis
4. DISCUSSION AND CONCLUSION
4.1 Discussion
4.2 Conclusion
BIBLIOGRAPHY
APPENDICES
Table of Figures
FIGURE 1: SELECT BY ATTRIBUTES SQL
FIGURE 2: IA SUPERIMPOSED ON ED
FIGURE 3: IDENTIFY OPTION
FIGURE 4: POPULATING ED ATTRIBUTE TABLE WITH IA CODES IN EDIT MODE
FIGURE 5: DISSOLVE TOOL
FIGURE 6: SPATIAL JOIN
FIGURE 7: CLUSTER ANALYSIS
FIGURE 8: GEOGRAPHICALLY WEIGHTED REGRESSION TOOL
FIGURE 9: PREMATURE SMR CLUSTERING
FIGURE 10: UNEMPLOYMENT CLUSTERING
FIGURE 11: UNSKILLED WORKER CLUSTERING
FIGURE 12: UNSKILLED HOUSEHOLDS CLUSTERING
FIGURE 13: POOR EDUCATION CLUSTERING
FIGURE 14: PREMATURE DEATH RATE VERSUS UNEMPLOYMENT
FIGURE 15: PREMATURE DEATH RATE VS UNSKILLED WORKERS
FIGURE 16: PREMATURE DEATH RATE VERSUS UNSKILLED HOUSEHOLDS
FIGURE 17: PREMATURE DEATH RATE VERSUS POOR EDUCATION
Table of Tables
TABLE 1: GWR UNEMPLOYMENT AND UNSKILLED WORKERS
TABLE 2: UNEMPLOYMENT AND UNSKILLED HOUSEHOLDS
TABLE 3: UNEMPLOYMENT AND POOR EDUCATION
TABLE 4: UNSKILLED WORKERS AND UNSKILLED HOUSEHOLDS
TABLE 5: UNSKILLED WORKERS AND POOR EDUCATION
TABLE 6: UNSKILLED HOUSEHOLDS AND POOR EDUCATION
TABLE 7: GWR ALL VARIABLES
TABLE 8: CORRELATION STATISTICS PYTHON SCRIPT
1. Introduction and Context
1.1 Introduction
Early mortality rates are often associated with health conditions due to unhealthy
lifestyle choices such as smoking, alcohol intake, drug use or diet. Although these
suggestions may be true, socioeconomic status is also a recognised factor that
influences premature mortality (Erikson & Torrsander, 2008). It is perceived that
people from lower social classes tend to have less disposable income and therefore a
lower standard of living, resulting in higher rates of early mortality. The opposite is
associated with higher social classes (Pensola & Martikainen, 2003). Furthermore, a lower
social class generally coincides with poorer education standards and higher levels of
unemployment, which can also have a detrimental effect on a person’s health.
Research has suggested that people with lower levels of education are more likely
to die at an earlier age and to suffer from poorer health
throughout the course of their lives (Higgins, et al., 2008).
This report begins with an introduction and context section outlining the topic,
followed by a methodology section divided into subsections that explain each
fundamental step, supported by screenshots, graphs and tables. A results section
then details the outcome of each stage of the analysis. The final section comprises
a discussion and conclusion that explain and summarise the findings in the context
of this report. All of the final maps are attached as separate appendices at the end
of this report in PDF format.
1.2 Context
A detailed understanding of the relationship between social class and health is of
increasing importance (Higgins, et al., 2008). Many governmental policies claim to
aim at eradicating social inequalities; however, these inequalities remain, particularly
within the health system. Poverty is referred to as a ‘ruthless killer’ and a
variable that harmfully influences health (Murali & Oyebode, 2004). Researching
and understanding such relationships and inequalities can be time consuming and
often expensive because multiple factors need to be taken into consideration.
However, advancements in research techniques, and in technology in particular, mean
that research can be completed more efficiently, significantly reducing the cost and
time taken to collate the necessary data. Through the combination of readily available
statistical datasets and the software package ArcGIS, the information can be
interrogated and manipulated to produce visual representations of statistical data for
informative and explanatory purposes. Any relationships or correlations that are
present can then be readily identified and understood.
1.3 Datasets
The datasets utilised for this report were a combination of census data obtained from
the Central Statistics Office (CSO) and mortality data received from the Geography
Department at Maynooth University. The census data contains Ireland LA (local
authority) data and Electoral District (ED) data. The LA dataset was primarily for
county extraction together with visual representation. The ED dataset was used for
extracting socioeconomic data within the county of Dublin. The mortality data
contains premature death rates at intermediate area (IA) level and was used to compare
death rates with selected social classes for the County Dublin area. Rigby et al. (n.d.)
state that counties are too large and EDs too small to portray spatial variations in
health data accurately, so the IA scale was utilised.
1.4 Aims and Objectives
This report aimed to identify some of the socioeconomic factors associated with
premature mortality in the county of Dublin and to discuss the correlations and
relationships, if any, between those factors and early mortality rates. The social
factors were unemployment, unskilled workers, unskilled households and lower level
education. People with an education level of early secondary school or less were
judged to be within the last category. The mortality data concerned the standardised
mortality rate (SMR). The report is primarily based on statistical analysis of the
datasets within ArcGIS using tools such as clustering algorithms and regression
analysis. Additionally, two Python scripts were incorporated into the analysis of the
data. These processes are explained in detail in the methodology section. The objective
was to display these analyses geographically, in tabular format and graphically, in
order to readily identify any correlations or relationships between premature
mortality and social class in County Dublin.
2. Methodology
2.1 Extract Datasets
Firstly, all the required datasets were extracted from their zip files. ArcCatalog was
then opened and each dataset was assigned a common coordinate system, in this case
the Irish National Grid (TM65) projected coordinate system. This step was important
because it gave the map real-world geographic coordinates and ensured accuracy of
scale and location, which was essential when applying measured spatial queries.
ArcCatalog was used because it is better suited to managing and organising multiple
datasets and files. From within ArcCatalog, a blank map was opened by selecting the
globe icon (ArcMap). The data frame also needed to be assigned the Irish National
Grid coordinate system. This was achieved by right-clicking Layers at the top of the
table of contents (TOC) and selecting Properties, Coordinate System, National Grids,
Europe and finally Irish National Grid. The datasets were then added to the map by
selecting them in the ArcCatalog window and dragging them across to the TOC.
2.2 Extracting the Layers
The Ireland LA, ED and Mortality layers were all at national scale. However, for the
purpose of this report, only information regarding the county of Dublin was required.
The ‘Select by Attributes’ function was therefore employed to extract from each layer
the features belonging exclusively to the county of Dublin. This process was vital
because the data contained within County Dublin could then be interrogated and
manipulated independently from the rest of the country. Within the ‘Select by
Attributes’ option, a Structured Query Language (SQL) expression needed to be
entered. The mortality data serves as the example here and is represented in Figure 1.
The layer chosen was the Mortality layer, then County from the field list, followed by
the SQL expression highlighted in red in Figure 1. The ‘OR’ operator was chosen
because the area of interest was a combination of four separate subdivisions within
the county. Care needed to be taken, as the ‘AND’ operator would have produced a
different outcome: it selects only records that satisfy all four conditions at once,
which no single area can do. The same steps were carried out for the remaining
layers, with the layer name changed each time. Finally, each of the new temporary
extracted layers was exported as a shapefile so that it could be queried independently
from the original datasets.
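The difference between the two operators can be illustrated outside ArcGIS with a minimal Python sketch. The records, the field name `County` and the subdivision labels below are hypothetical stand-ins, not values taken from the report's attribute tables.

```python
# Hypothetical attribute records; field name and labels are illustrative only.
records = [
    {"id": 1, "County": "Dublin City"},
    {"id": 2, "County": "Fingal"},
    {"id": 3, "County": "South Dublin"},
    {"id": 4, "County": "Dún Laoghaire-Rathdown"},
    {"id": 5, "County": "Kildare"},
    {"id": 6, "County": "Meath"},
]

dublin_areas = ["Dublin City", "Fingal", "South Dublin", "Dún Laoghaire-Rathdown"]


def select_or(rows, field, values):
    """Keep rows where the field equals ANY of the values (SQL OR)."""
    return [r for r in rows if any(r[field] == v for v in values)]


def select_and(rows, field, values):
    """Keep rows where the field equals ALL of the values at once (SQL AND)."""
    return [r for r in rows if all(r[field] == v for v in values)]


selected_or = select_or(records, "County", dublin_areas)
selected_and = select_and(records, "County", dublin_areas)

print([r["id"] for r in selected_or])  # the four Dublin subdivisions
print(len(selected_and))               # nothing can match all four names
```

A single field cannot hold four different values at the same time, so the AND version returns an empty selection in this sketch.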
Figure 1: Select by Attributes SQL
2.3 Encoding Attribute Data
One of the bigger problems this report encountered was that the ED data and the
Mortality data were at different spatial scales. The ED areas were smaller than the
Mortality (IA) areas. In some cases the ED areas had too low a population, and
therefore too few deaths, for accurate mortality data, so the IA was developed.
The Mortality areas were equally populated, with populations of close to 10,000
per region (Rigby, et al., n.d.). Without some form of editing, it would not have been
possible to combine and interrogate the separate datasets as one. In order to rectify
this problem, a new field called Region was added to the ED layer. The ED layer had
322 individual areas and the Mortality layer had just over 111. The Mortality layer
was made transparent, its outline thickness was set to two (see Figure 2), and it was
then superimposed on top of the ED layer. This allowed the operator to visualise the
EDs that were contained within each intermediate area (IA). The Identify tool was
then chosen from the toolbar at the top of the screen. Within the Identify window, the
Mortality layer was selected in the ‘Identify From’ option, as shown in Figure 3. This
allowed the user to click on an ED and see the information for the surrounding IA in
the window. From within this list, the IA_ID was used as the area code in the newly
added Region field in the ED layer. The IA_ID was chosen because every ED situated
within an IA could carry that IA's code, giving the two layers a common identifier
once the EDs were amalgamated. This code was therefore used to match the EDs to the
IAs in the modified output table.
Figure 2: IA Superimposed on ED
Figure 3: Identify Option
Once the new field was added, it needed to be encoded with the relevant IA code for
each ED. To accomplish this, the Editor toolbar was activated, followed by the edit
attributes option. Clicking on a row in the ED attribute table opened a window with
all the attributes for that feature, including the newly created Region field.
By clicking on the ED with the Identify tool, the operator could see the IA_ID for
the ED. The IA_ID code was then keyed into the Region field from within the
attribute editor window. See Figure 4 below. When finished, the Stop Editing option
was chosen and the edits were saved.
Figure 4: Populating ED Attribute Table with IA Codes in Edit Mode
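The report encoded the Region field by hand. As a point of comparison only, the same matching could in principle be automated with a point-in-polygon test. The sketch below is a substitute technique, not the method used in the report: it assumes each ED can be represented by a centroid and each IA by a simple polygon, and all identifiers and coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_region(ed_centroids, ia_polygons):
    """Return {ed_id: ia_id} for the first IA polygon containing each ED centroid."""
    regions = {}
    for ed_id, (x, y) in ed_centroids.items():
        regions[ed_id] = None
        for ia_id, polygon in ia_polygons.items():
            if point_in_polygon(x, y, polygon):
                regions[ed_id] = ia_id
                break
    return regions


# Hypothetical geometry: two square IAs side by side and three ED centroids.
ia_polygons = {
    "IA_01": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "IA_02": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
ed_centroids = {"ED_A": (2, 3), "ED_B": (7, 8), "ED_C": (15, 5)}

region_by_ed = assign_region(ed_centroids, ia_polygons)
print(region_by_ed)
```

A centroid-based match can misassign EDs whose boundaries straddle two IAs, which is one reason a visual check such as the one described above remains useful.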
2.4 Dissolve EDs into IAs
After successful completion of the encoding (matching) procedure, the EDs could
then be amalgamated into the relevant IAs using a Geoprocessing tool known as
Dissolve. The dissolve feature can be referred to as a reclassification of vector data so
that polygons can be arranged into higher forms (Heywood, et al., 2006). Once the
tool was selected, it had to be configured in order to merge the smaller scale ED
data into the larger scale IA units. The ED layer was chosen as the input feature and
the newly created Region field as the dissolve field. In the statistics field, the
required variables discussed in the previous section were selected.
The statistic type selected was SUM, as the totals for each group of
combined EDs within an IA were required. This gave a total count for each variable,
calculated across all of the EDs within the IA. Figure 5 depicts the
Dissolve tool. The output file was saved as Diss_Ed_IA.
Figure 5: Dissolve Tool
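The attribute side of the Dissolve step (grouping EDs by Region and summing each variable) can be sketched in plain Python. This is an illustration under assumed field names and made-up counts; it does not merge geometries, which the ArcGIS tool also does.

```python
from collections import defaultdict


def dissolve_sum(rows, key_field, sum_fields):
    """Group rows by key_field and sum the listed numeric fields per group."""
    totals = defaultdict(lambda: dict.fromkeys(sum_fields, 0))
    for row in rows:
        for field in sum_fields:
            totals[row[key_field]][field] += row[field]
    return dict(totals)


# Hypothetical ED records already encoded with their IA code in "Region".
ed_rows = [
    {"ED": "ED_A", "Region": "IA_01", "unemployed": 120, "unskilled": 45},
    {"ED": "ED_B", "Region": "IA_01", "unemployed": 80, "unskilled": 30},
    {"ED": "ED_C", "Region": "IA_02", "unemployed": 60, "unskilled": 20},
]

ia_totals = dissolve_sum(ed_rows, "Region", ["unemployed", "unskilled"])
print(ia_totals)
```

SUM suits counts such as numbers of unemployed people. Rates or percentages would need to be recalculated from the summed counts rather than added together.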
2.5 Spatial Join
The next step was to join the newly created dissolved layer to the mortality layer.
At this point the data were at the same geographic scale, but they were still two
separate datasets. The two datasets were joined on the basis of their attribute table
values, as the Region field in the dissolved layer was a match for the IA_ID field in
the mortality layer. The ‘Join Data’ window was opened by right-clicking the
dissolved layer (Diss_Ed_IA) in the TOC and selecting the ‘Join Data’ option from
within the ‘Joins and Relates’ menu. The join was configured as depicted in Figure 6.
The important aspect to note was that the join was based on the Region field in the
dissolved layer and the IA_ID field in the Mortality layer. The result was a single
attribute table containing both datasets at the same geographic scale, which could
then be queried and manipulated as a single complete dataset.
Figure 6: Spatial Join
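The key-based matching described above, Region in the dissolved layer against IA_ID in the mortality layer, can be sketched as a dictionary lookup. The field `smr_u75` and all values are hypothetical, and keeping only matching records is an assumption about how the join was configured.

```python
def attribute_join(left_rows, right_rows, left_key, right_key):
    """Attach fields from right_rows to left_rows where the key values match."""
    lookup = {row[right_key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[left_key])
        if match is not None:  # keep only records that have a partner
            merged = dict(row)
            merged.update({k: v for k, v in match.items() if k != right_key})
            joined.append(merged)
    return joined


# Hypothetical tables: dissolved socioeconomic totals and IA-level mortality.
dissolved = [
    {"Region": "IA_01", "unemployed": 200},
    {"Region": "IA_02", "unemployed": 60},
]
mortality = [
    {"IA_ID": "IA_01", "smr_u75": 112.4},
    {"IA_ID": "IA_02", "smr_u75": 87.9},
]

joined = attribute_join(dissolved, mortality, "Region", "IA_ID")
print(joined)
```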
2.6 Cluster Analysis
Cluster and Outlier analyses, referred to as Anselin Local Moran's I, were carried out
on all of the mentioned variables (SMR, unemployment, unskilled workers, unskilled
households and poor education). These analyses were carried out in order to
visualise any areas of high or low clustering of the chosen variables and also to
identify any outliers. This technique also assisted in determining any patterns or
similarities of clusters between the selected variables. The cluster analysis tool was
run from within the Spatial Statistics toolbox. As with other tools, it
needed to be configured in the set-up window. The input feature was the
dissolved layer and, for the purpose of this description, the SMR
variable was chosen as the input variable for the analysis, as shown in Figure 7.
The same steps were repeated for the remaining variables.
Figure 7: Cluster Analysis
2.7 Geographically Weighted Regression
Regression analysis is used to examine, explore and predict spatial relationships.
Geographically Weighted Regression (GWR) is a powerful method used for
examining and estimating linear relationships. It provides a local model of the
variable that is being examined or predicted by fitting a regression equation to each
feature in the dataset. It is a sophisticated basis to quantify and dissect spatial patterns
across a study area (Legg & Bowe, 2009). For the purpose of this report, GWR was
applied to test the hypothesis of higher premature death rates in lower social
classes and to measure the strength of these relationships. This tool was available in
the Spatial Statistics toolbox and provided detailed and flexible analysis of the
data. The input feature was the dissolved and joined layer and the
dependent variable was the SMR under 75 variable. Explanatory
variables were then added, firstly in pairs and then as a combination of all the
variables. The regression model used in this example had the SMR as the dependent
variable and unemployment and unskilled workers as the explanatory variables. See
Figure 8 for visual representation.
Figure 8: Geographically Weighted Regression Tool
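The idea of fitting a separate regression equation at each feature can be sketched in pure Python. This is a simplified illustration, not the ArcGIS implementation: it assumes a single explanatory variable, a Gaussian distance-decay kernel and a fixed bandwidth, none of which are stated in the report, and the coordinates and values are invented.

```python
import math


def local_fit(target, coords, x, y, bandwidth):
    """Weighted least-squares fit of y = a + b*x centred on coords[target]."""
    cx, cy = coords[target]
    # Gaussian distance-decay weights: nearby features count for more.
    w = [math.exp(-0.5 * (math.hypot(px - cx, py - cy) / bandwidth) ** 2)
         for px, py in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def gwr_one_variable(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per feature."""
    return [local_fit(i, coords, x, y, bandwidth) for i in range(len(coords))]


# Hypothetical data: two distant groups of areas with different relationships.
coords = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
unemployment = [1, 2, 3, 1, 2, 3]
smr = [1, 2, 3, 3, 6, 9]  # slope 1 in the first group, slope 3 in the second

local_models = gwr_one_variable(coords, unemployment, smr, bandwidth=5)
for (intercept, slope), location in zip(local_models, coords):
    print(location, round(intercept, 3), round(slope, 3))
```

Because the weights fall off with distance, each local model is driven by its own neighbourhood, so the estimated slope differs between the two groups. A single global regression would average the two relationships together.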
2.8 Python Script for Identifying Correlations
A Python script was employed to further investigate correlations between the
variables and death rates. Similar to the regression analysis, it was applied to examine
the relationship between different social factors and premature death rates. However,
in this case the Python script examined the direct relationship between a single
variable and the SMR, together with the strength of that relationship or correlation.
It was also used to create scatter plots with lines of best fit, which help to identify
any correlation or variance within the model. Two scripts were implemented. The first
script created a table from the statistics and the second script created graphs from the
table. Both are attached in the appendices section of this report.
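The report's own scripts are in its appendices and are not reproduced here. As a rough illustration of the two quantities involved, the sketch below computes a Pearson correlation coefficient and an ordinary least-squares line of best fit in plain Python. The input values are invented, and whether the report's scripts used this exact statistic is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical values for one social variable and the premature SMR.
variable = [1, 2, 3, 4, 5]
smr = [2, 4, 5, 4, 5]

r = pearson_r(variable, smr)
slope, intercept = best_fit_line(variable, smr)
print(round(r, 4), round(slope, 2), round(intercept, 2))
```

A positive `r` and a positive slope both indicate that the two quantities rise together. Squaring `r` gives the share of variance explained by a single-variable linear model.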
3. Results
3.1 Cluster Analyses
The Clustering Analysis (Local Moran's I statistic) produced, for each area, a z-score
indicating how similar or dissimilar its value was to those of its neighbours, and a
p-value indicating whether that pattern was statistically significant. This function
allowed the user to identify areas of high and low clustering of a variable, as well as
areas that were surrounded by contrasting values, either high or low. The grey areas
represented areas that did not contain statistically significant clustering. The black
areas (High-High or HH) indicated areas of high values clustered closely together.
The blue areas (Low-Low or LL) were the opposite of the black areas, as they
contained low values clustered closely together. The orange areas (High-Low or HL)
were outliers rather than clusters: high values surrounded by low values. However,
they were not relevant to the outcome of this study. The white areas (Low-High or
LH) were also outliers, in this case low values surrounded by high values.
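The four labels above follow from comparing each area's deviation from the mean with the average deviation of its neighbours. The sketch below shows that logic in plain Python, assuming row-standardised weights and a simple chain of neighbours. The values are invented, and the significance testing that produces the grey areas is deliberately omitted, so every area receives a label here.

```python
def local_moran(values, neighbours):
    """Return (local I, quadrant label) per area, without significance testing."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance used to scale the statistic
    results = []
    for i in range(n):
        nbrs = neighbours[i]
        # Row-standardised weights: the spatial lag is the neighbours' mean deviation.
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        local_i = z[i] * lag / m2
        if z[i] > 0:
            quadrant = "HH" if lag > 0 else "HL"
        else:
            quadrant = "LL" if lag < 0 else "LH"
        results.append((local_i, quadrant))
    return results


# Hypothetical SMR-like values for seven areas arranged in a line.
values = [100, 110, 60, 105, 40, 45, 42]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}

results = local_moran(values, neighbours)
for area, (local_i, quadrant) in enumerate(results):
    print(area, round(local_i, 3), quadrant)
```

Clusters (HH, LL) give a positive local statistic and outliers (HL, LH) a negative one, which is why the sign of the statistic separates the two groups.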
The SMR under 75 (Premature Mortality) clustering analysis produced the greatest
levels of high value (HH) clustering in the Dublin City and South Dublin regions of
the county. It also generated statistically significant clusters of low SMR
under 75 (LL) in the Dún Laoghaire-Rathdown region of the county. The LH outlier
areas were predominantly situated in the Fingal area, in the west of the county,
illustrating pockets of low mortality rates surrounded by higher rates. The
results suggested that more affluent LH areas, such as Castleknock, had longer life
expectancy than some of the more deprived surrounding areas, for example in
Blanchardstown. Results are evident in Figure 9.
Figure 9: Premature SMR Clustering
The unemployment analysis generated the greatest levels of high value clustering
(HH) of unemployment in the South Dublin and Fingal districts of the county (with
the exception of one Dublin City IA) and clustering of low levels of unemployment in
the wealthier Dún Laoghaire-Rathdown area. The remainder of the county was
deemed to have no significant clustering. The results are displayed in Figure 10.
Figure 10: Unemployment Clustering
The unskilled worker analysis produced the greatest amount of high value (HH)
clustering of unskilled workers within the South Dublin, Dublin City and Fingal
areas. There were low value (LL) clusters, indicating few unskilled workers, in the
Dún Laoghaire-Rathdown locality and also a pocket of low value
clusters in the Fingal region, namely Howth. Furthermore, there was an outlier
(LH) area indicating a vicinity of skilled workers close to, or surrounded by, unskilled
workers in the Fingal area, situated on the western fringe of the county boundary.
See Figure 11.
Figure 11: Unskilled Worker Clustering
The unskilled households results produced a similar outcome. However, this time
there was more HH clustering in the Dublin City area, with LH outliers in
pockets around it. For this analysis, Howth was not considered
significant. These results can be identified in Figure 12.
Figure 12: Unskilled Households Clustering
The Poor Education clustering produced a similar result, with the HH clusters in the
western South Dublin and Fingal zones, and LL clustering in the
Dún Laoghaire-Rathdown area.
All the results appeared to show more negative outcomes for known disadvantaged
areas and more positive results for known prosperous regions.
Figure 13: Poor Education Clustering
3.2 Geographically Weighted Regression
The geographically weighted regression was calculated by generating regression
models using the premature SMR statistics as the dependent variable and pairs of the
selected lower social class variables as the explanatory variables. A combined model
was also created, including all of the low social class variables against the
premature SMR variable. For the scope of this report, the adjusted R² value was of
interest, as this was the measure that determined how strong or weak the
relationship was between the selected variables and the premature death rate. It is
a measure of goodness of fit and varies between 0 and 1, with
higher values being favoured (Legg & Bowe, 2009). The results are illustrated in the
tables below.
Table 1 had an adjusted R² value of just over 0.51. This meant that the two lower
social class variables (Unemployment and Unskilled Workers) accounted for just over
51 percent of the variation in premature mortality. This was evidence of a relatively
strong relationship or correlation between the two variables and premature
mortality. Table 3 had the weakest result and Table 6 the strongest of the pairings.
When all of the variables were combined, the results also showed a relatively strong
relationship between the lower social class variables and premature mortality.
The combined model had an adjusted R² value of slightly over 0.55. See Table 7.
Table 1: GWR Unemployment and Unskilled Workers
Table 2: Unemployment and Unskilled Households
Table 3: Unemployment and Poor Education
Table 4: Unskilled Workers and Unskilled Households
Table 5: Unskilled Workers and Poor Education
Table 6: Unskilled Households and Poor Education
Table 7: GWR All Variables
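For readers unfamiliar with the adjusted R² reported in the tables, the sketch below shows the standard ordinary least-squares form of the adjustment, which penalises a model for each explanatory variable it uses. It is an illustration only: GWR software typically substitutes an effective number of parameters for the simple variable count, and the inputs below are invented rather than taken from the report's tables.

```python
def adjusted_r2(r2, n_observations, n_explanatory):
    """Standard OLS adjusted R-squared for a model with an intercept."""
    n, k = n_observations, n_explanatory
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs: R-squared of 0.60 from 100 areas and 4 explanatory variables.
example = adjusted_r2(0.60, 100, 4)
print(round(example, 4))
```

The adjusted value is always slightly lower than the raw R², and the gap widens as more explanatory variables are added relative to the number of areas.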
3.3 Python Script Analysis
The Python script analysis was also implemented to examine the correlation between
premature mortality rates and the selected lower social class variables.
However, this time only one variable was paired with the premature
mortality rate in each model. The scripts produced detailed, easy-to-interpret graphs
with a visible line of best fit. The results depicted a relatively strong relationship
between the variables and the premature mortality rate. See Figures 14 to 17 below.
The points were scattered close to the line of best fit, suggesting that there was
indeed a relationship: the closer and denser the scattering around the line, the
stronger the correlation. Additionally, as the premature mortality rate increased, so
too did the value of the variable being examined, illustrating a positive relationship.
In Figure 14, for example, a higher death rate corresponded with
higher levels of unemployment. The
statistics of the separate correlations are recorded below in Table 8. As in
the previous sub-section, the decimal score represented a percentage.
Figure 14: Premature Death Rate Versus Unemployment
Figure 15: Premature Death Rate Versus Unskilled Workers
Figure 16: Premature Death Rate Versus Unskilled Households
Figure 17: Premature Death Rate Versus Poor Education
Variables                                   Correlation
Premature Death vs Unemployment             0.51
Premature Death vs Unskilled Workers        0.68
Premature Death vs Unskilled Households     0.73
Premature Death vs Poor Education           0.55
Table 8: Correlation Statistics Python Script
4. Discussion and Conclusion
4.1 Discussion
The results produced from the several analyses suggested that there is a relationship
between higher rates of premature mortality and lower social class factors. The
findings correspond with previous research (Pensola & Martikainen, 2003;
Erikson & Torrsander, 2008) showing that premature mortality tends to increase with
lower social class. The relationship or correlation was relatively strong, and in some
cases more so than others, particularly between unskilled households and early
mortality and between unskilled workers and early mortality (see Table 8).
The clustering analysis provided valuable insight into where the high and low
clusters of each variable were situated. The majority of clustering relating to low
social class variables occurred in areas known to be less affluent, such as Jobstown in
the South Dublin district and Ballymun in the Dublin City division of the county.
These locations also corresponded with the high clusters of premature mortality. In
the more affluent Dún Laoghaire-Rathdown local authority area, the opposite
occurred. These analyses further bolster previous research relating to mortality and
social class (Rigby, et al., n.d.). The findings are evident in Figures 9 to 13.
Furthermore, the high clustering was a result of high values of the variable being
analysed rather than of large populations. Population size was not a factor for IAs, as
they all had roughly the same population (close to 10,000). This indicated that the IA
was an appropriate unit for examining the individual regions of the county, as it
limited other factors that may have affected the results of the models.
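To illustrate what a high or low cluster means numerically, the following is a simplified sketch of a local Moran's I calculation in its textbook form with row-standardised neighbour weights. It is not the ArcGIS implementation, which uses its own variance term and also reports significance values. The area names, values and neighbour lists are hypothetical.

```python
def local_morans_i(values, neighbours):
    """Simplified local Moran's I per area.

    values:     dict mapping area id -> observed value
    neighbours: dict mapping area id -> list of adjacent area ids
    Weights are row-standardised, so each area is compared with the
    mean deviation of its neighbours.
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {area: v - mean for area, v in values.items()}
    m2 = sum(d * d for d in z.values()) / n  # variance of the deviations
    if m2 == 0:
        raise ValueError("local Moran's I is undefined when all values are equal")
    result = {}
    for area, adjacent in neighbours.items():
        lag = sum(z[b] for b in adjacent) / len(adjacent) if adjacent else 0.0
        result[area] = z[area] * lag / m2
    return result


# Hypothetical chain of four areas A-B-C-D (NOT the study data).
values = {"A": 10, "B": 9, "C": 1, "D": 2}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

for area, score in local_morans_i(values, neighbours).items():
    print(area, round(score, 3))
```

A positive score marks an area whose value resembles its neighbours (a high-high or low-low cluster), while a negative score marks an area that differs from its neighbours (a potential outlier).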
The evidence generated from the GWR analyses and the Python script analyses also
highlights the positive correlations between the socioeconomic variables and early
mortality. It supports the observed association between higher early mortality rates
and poor education (Higgins, et al., 2008), as shown in Table 8. The relationship was
also apparent when poor education was combined in pairs with the other variables
and with all of the variables together (see Tables 3, 5, 6 and 7). The evidence likewise
supports the reasoning that unemployment and low-skilled occupations, which may in
turn lead to poverty, contribute to early mortality, echoing previous studies that refer
to poverty as a killer (Murali & Oyebode, 2004).
4.2 Conclusion
The report has provided a valuable insight into the complex task of linking
socioeconomic status and premature mortality. Several issues needed to be addressed
before the analysis could begin: combining, merging and dissolving datasets that were
at different geographical scales; choosing the variables that contributed most to lower
social status; and selecting the analysis techniques that best addressed and
represented the issue being examined. Several separate analysis techniques were
implemented to reduce inconsistencies as far as possible and to make the modelling
outcomes as plausible as possible. These techniques combined, contrasted and
compared all the relevant variables against premature mortality, producing in all
cases a relatively strong positive correlation between premature mortality and low
socioeconomic status.
However, further datasets that were outside the scope of this study could have
improved the results. One dataset that might have produced more specific results is
the deprivation index. It is composed of small area (SA) data and may have produced
more detailed results when merged and dissolved with the larger IA Mortality data.
Because many more small areas are incorporated into the dataset, the smaller units
capture greater variation across the county as a whole.
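As a rough sketch of how small area scores could be brought up to IA level, the example below takes a population-weighted mean of a deprivation score for the small areas inside each IA. It assumes that each small area carries a score and a population and that a lookup from small area to IA is available. The field names, identifiers and values are hypothetical.

```python
from collections import defaultdict


def weighted_mean_by_area(small_areas, sa_to_ia):
    """Population-weighted mean score per intermediate area (IA).

    small_areas: list of dicts with 'sa_id', 'score' and 'population'
    sa_to_ia:    dict mapping each small area id to its IA id
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for sa in small_areas:
        ia = sa_to_ia[sa["sa_id"]]
        weighted_sum[ia] += sa["score"] * sa["population"]
        population[ia] += sa["population"]
    # Skip any IA with no recorded population to avoid dividing by zero.
    return {
        ia: weighted_sum[ia] / population[ia]
        for ia in weighted_sum
        if population[ia] > 0
    }


# Hypothetical small areas and lookup (NOT the real deprivation index).
small_areas = [
    {"sa_id": "SA1", "score": -10, "population": 200},
    {"sa_id": "SA2", "score": 5, "population": 300},
    {"sa_id": "SA3", "score": 8, "population": 250},
]
sa_to_ia = {"SA1": "IA_A", "SA2": "IA_A", "SA3": "IA_B"}

ia_scores = weighted_mean_by_area(small_areas, sa_to_ia)
print(ia_scores)
```

A weighted mean is used here rather than a sum because an index score, unlike a head count, cannot meaningfully be added across areas.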
Also, regarding the newly generated dissolved ED to IA layer, the study could perhaps
have normalised the social class variables. The statistical data they contained were
SUM values, that is, the total number of people for each variable, inclusive of people
aged 75 and over. It would have been better to establish a cut-off point so that only
people under seventy-five were included in each variable when correlating it with the
premature mortality rate. This would have made the analysis slightly more accurate
and precise.
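The normalisation suggested above could be sketched as follows, converting a raw count into a rate per 1,000 people under 75. This assumes that counts and populations restricted to the under-75 age group were available, which was not the case for the datasets used in this study. The field names and figures are hypothetical.

```python
def rate_per_1000(count, population):
    """Convert a raw count into a rate per 1,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * count / population


# Hypothetical under-75 figures for two IAs (NOT the study data).
areas = [
    {"ia_id": "IA_A", "unemployed_under_75": 450, "pop_under_75": 9000},
    {"ia_id": "IA_B", "unemployed_under_75": 300, "pop_under_75": 10000},
]

rates = {
    a["ia_id"]: rate_per_1000(a["unemployed_under_75"], a["pop_under_75"])
    for a in areas
}
print(rates)
```

Using rates rather than totals would let areas be compared on the same footing even where their populations differ slightly.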
Nevertheless, this study has reiterated that lower socioeconomic status has a
detrimental effect on mortality rates. The research undertaken in this report was
detailed and interrogative, and showed the strength of the relationships between the
early mortality rate and both the individual socioeconomic variables and combinations
of them. The study could serve as a valuable asset for future decision makers with
regard to our fragile health system and the troubled housing and employment
situation.
Bibliography
Erikson, R. & Torrsander, J., 2008. Social Class and Cause of Death. European
Journal of Public Health, 18(5), pp. 473-478.
Heywood, I., Cornelius, S. & Carver, S., 2006. An Introduction to Geographical
Information Systems. 3rd ed. Essex: Pearson.
Higgins, C., Lavin, T. & Metcalfe, O., 2008. Health Impacts on Education: A Review,
Belfast: The Institute of Public Health in Ireland.
Legg, R. & Bowe, T., 2009. Applying Geographically Weighted Regression to a Real
Estate Problem, Michigan: Michigan University.
Murali, V. & Oyebode, F., 2004. Poverty, Social Inequality and Mental Health.
Advances in Psychiatric Treatment, 10(3), pp. 216-224.
Pensola, T. & Martikainen, P., 2003. Cumulative Social Class and Mortality from
Various Causes of Adult Men. Journal of Epidemiology and Community Health,
57(9), pp. 745-751.
Rigby, J. E. et al., n.d. Towards A Geography of Health Inequalities in Ireland, s.l.:
Draft.
Appendices
Appendix 1: Python Stats Script
Appendix 2: Python Plotting Graphs Script

Early Mortality Report

  • 1. M a y n o o t h U n iv e r s it y 2016 The Relationship Between Early Mortality and Socioeconomic Status A Statistical Analysis Using ArcGis Padraig Quinn 11125900
  • 2. Padraig Quinn 11125900 1 ExecutiveSummary Previous studies have shown that higher premature mortality rates are associated with lower socioeconomic status. This report examined the correlation and relationships between factors that are considered to contribute to lower social status, together with the Standardised Mortality Rate for premature death, relating to the county of Dublin. The study applied various statistical analysis techniques through the use of GIS software. The techniques included spatial clustering processes, geographically weighted regression and python script analysis. The investigation determined that there is a relatively strong positive relationship between lower socioeconomic status and premature mortality. The areas that had high spatial clusters of lower social classes also had higher rates of early mortality. When the social class variables were tested independently against the premature death rate, they yielded results with varied degrees of strength. The results highlight the need for social reform and significant improvements from political governing bodies.
  • 3. Padraig Quinn 11125900 2 EXECUTIVE SUMMARY ............................................................................................................................................1 TABLE OF FIGURES ...................................................................................................................................................3 TABLE OF TABLES......................................................................................................................................................4 1. INTRODUCTION AND CONTEXT ........................................................................................................................5 1.1 Introduction ...........................................................................................................................................5 1.2 Context...................................................................................................................................................5 1.3 Datasets..................................................................................................................................................5 1.4 Aims and Objectives...............................................................................................................................6 2. 
METHODOLOGY ...................................................................................................................................................7 2.1 Extract Datasets .....................................................................................................................................7 2.2 Extracting the Layers..............................................................................................................................7 2.3 Encoding Attribute Data.........................................................................................................................8 2.4 Dissolve EDs into IAs ..............................................................................................................................9 2.5 Spatial Join ...........................................................................................................................................10 2.6 Cluster Analysis....................................................................................................................................11 2.7 GeographicallyWeighted Regression..................................................................................................12 2.8 Python Script for Identifying Correlations ...........................................................................................12 3. RESULTS ..............................................................................................................................................................13 3.1 Cluster Analyses...................................................................................................................................13 3.2 GeographicallyWeighted Regression..................................................................................................15 3.3 Python Script Analysis..........................................................................................................................18 4. 
DISCUSSION AND CONCLUSION ....................................................................................................................21 4.1 Discussion.............................................................................................................................................21 4.2 Conclusion............................................................................................................................................21 BIBLIOGRAPHY .......................................................................................................................................................23 APPENDICES............................................................................................................................................................24
  • 4. Padraig Quinn 11125900 3 Table of Figures FIGURE 1:SELECTBY ATTRIBUTES SQL............................................................................................................................8 FIGURE 2: IASUPERIMPOSED ON ED FIGURE 3: IDENTIFY OPTION.............................................................................9 FIGURE 4:POPULATING ED ATTRIBUTE TABLEWITH IACODES IN EDIT MODE ..................................................................9 FIGURE 5:DISSOLVE TOOL...........................................................................................................................................10 FIGURE 6:SPATIAL JOIN...............................................................................................................................................11 FIGURE 7:CLUSTER ANALYSIS ......................................................................................................................................11 FIGURE 8:GEOGRAPHICALLY WEIGHTED REGRESSION TOOL .........................................................................................12 FIGURE 9:PREMATURE SMR CLUSTERING...................................................................................................................13 FIGURE 10:UNEMPLOYMENT CLUSTERING...................................................................................................................14 FIGURE 11:UNSKILLED WORKER CLUSTERING..............................................................................................................14 FIGURE 12:UNSKILLED HOUSEHOLDS CLUSTERING.......................................................................................................15 FIGURE 13:POOR EDUCATION CLUSTERING.................................................................................................................15 FIGURE 14:PREMATURE DEATH RATE VERSUS UNEMPLOYMENT..................................................................................18 
FIGURE 15:PREMATURE DEATH RATE VS UNSKILLED WORKERS...................................................................................19 FIGURE 16:PREMATURE DEATH RATE VERSUS UNSKILLED HOUSEHOLDS......................................................................19 FIGURE 17:PREMATURE DEATH RATE VERSUS POOR EDUCATION ................................................................................20
  • 5. Padraig Quinn 11125900 4 Table of Tables TABLE 1: GWR UNEMPLOYMENTAND UNSKILLED WORKERS.......................................................................................16 TABLE 2: UNEMPLOYMENTAND UNSKILLED HOUSEHOLDS............................................................................................16 TABLE 3: UNEMPLOYMENTAND POOR EDUCATION ......................................................................................................16 TABLE 4: UNSKILLED WORKERS AND UNSKILLED HOUSEHOLDS......................................................................................17 TABLE 5: UNSKILLED WORKERS AND POOR EDUCATION................................................................................................17 TABLE 6: UNSKILLED HOUSEHOLDS AND POOR EDUCATION...........................................................................................17 TABLE 7: GWR ALL VARIABLES....................................................................................................................................17 TABLE 8:CORRELATION STATISTICS PYTHON SCRIPT .....................................................................................................20
  • 6. Padraig Quinn 11125900 5 1. Introductionand Context 1.1 Introduction Early mortality rates are often associated with health conditions due to unhealthy lifestyle choices such as smoking, alcohol intake, drug use or diet. Although these suggestions may be true, socioeconomic status is also a recognised factor that influences premature mortality (Erikson & Torrsander, 2008). It is perceived that people from lower social classes tend to have less disposable income, therefore a lower standard of living, resulting in earlier mortality rates. The opposite is associated with higher social classes (Pensola & Martikainen, 2003). Furthermore, a lower social class generally coincides with poorer education standards and higher levels of unemployment, which can also have a detrimental effect on a person’s health. Research has suggested that people with lower levels of education are more probable to die at an earlier age and are also likely to suffer from inferior levels of health throughout the course of their life (Higgins, et al., 2008). This report begins with an introduction and context section outlining the topic, followed by a detailed methodology section divided into subheadings and sections that explain each fundamental step, supported with screen shots, graphs and tables for visual representation and explanatory purposes. Furthermore, this section will be followed by a results section detailing the outcome of each stage of the analysis. The final segment will comprise of a discussion and conclusion section that further explains and summarises the findings in the context of this report. All of the necessary final maps are attached as separate appendices at the end of this report in PDF format. 1.2 Context A detailed understanding of the relationship between social class and health is of increasing importance (Higgins, et al., 2008). 
Many governmental policies claim to want to eradicate social inequalities, however they are still ever present, and in particular within the health system. Poverty is referred to as a ‘ruthless killer’ and a variable that harmfully influences health (Murali & Oyebode, 2004). Researching and understanding such relationships and inequalities can be time consuming and often expensive due to multiple factors that need to be taken into consideration. However, due to advancements in research techniques and in particular technology, research can be completed more efficiently, significantly reducing cost and time taken to collate all the necessary data. Through the combination of readily available statistical datasets and the software package Arcgis, the information can be interrogated and manipulated to produce visual representation of statistical data for informative and explanatory purposes. Any relationships or correlations can be easily identified and understood if present. 1.3 Datasets The datasets utilised for this report were a combination of census data obtained from the Central Statistics Office (CSO) and mortality data received from the Geography Department at Maynooth University. The census data contains Ireland LA (local authority) data and Electoral District (ED) data. The LA dataset was primarily for county extraction together with visual representation. The ED dataset was used for extracting socioeconomic data within the county of Dublin. The mortality data contains premature death rates at intermediate (IA) level and was used to compare death rates with selected social classes for the County Dublin area. Rigby, et al., n.d.
  • 7. Padraig Quinn 11125900 6 states that data at county level is too large and EDs are too small to accurately portray spatial variations regarding health data, so the IA scale was utilised. 1.4 Aims and Objectives This report aimed to identify some of the socio economic factors associated with premature mortality in the county of Dublin and discuss the correlations and relationships if any, between certain socioeconomic factors and how they may, or may not influence early mortality rates. The social factors were; unemployment, unskilled workers, unskilled households and lower level education. People with an education level of early secondary school or less were judged to be within this category. The mortality data concerned the standardised mortality rate (SMR). The report is primarily based on statistical analysis of the datasets within Arcgis using various tools such as, clustering algorithms and regression analysis correlations. Additionally, two Python computer coding scripts were incorporated into the analysis of the data. These processes are explained in detail in the upcoming methodology section. The objective was to display these analyses geographically, in tabular format and by means of graphical representation for illustrative purposes, in order to readily identify any correlations and/or relationships between premature mortality and social class in County Dublin.
  • 8. Padraig Quinn 11125900 7 2. Methodology 2.1 Extract Datasets Firstly, all the required datasets as previously mentioned needed to be extracted from zip files and added to the table of contents (TOC). Then, Arc Catalogue was opened and all the necessary datasets were modified to a specific coordinate system. In this case the datasets were given the Irish National Grid (TM65) projected coordinate system. This was a very important process as it gave the map actual geographic coordinates and was accurate in relation to scale and location which was essential when applying measured spatial queries. From within Arc Catalogue, a blank map was activated by selecting the globe icon (Arc Map). Arc Catalogue is better for managing and organising multiple datasets and files. The data frame also needed to be assigned the Irish National Grid coordinate system. This was achieved by right clicking layers at the top of the TOC, select properties, coordinate systems, national grids, Europe and finally Irish national grid. Once the datasets were extracted, they were added to TOC by simply selecting them from the Arc Catalogue window and dragging them across to the table of contents (TOC). 2.2 Extracting the Layers The Ireland LA, ED and Mortality layers were all at national scale. However, for the purpose of this report, only information regarding the county of Dublin was required. In order to accomplish this, certain procedures were applied within the software. The ‘Select by Attributes’ function was employed to extract the information from each layer concerning exclusively the county of Dublin. This process was vital as the data solely contained within County Dublin could be further interrogated and manipulated independently from the rest of the country. Within the ‘Select by Attributes’ option, a ‘Standard Query language’ (SQL) function needed to be implemented. 
For the purpose of this exercise, the mortality data served as an example for instructive purposes represented in Figure 1. The layer chosen was the Mortality Layer, then County from the first list, followed by the SQL function highlighted in red in Figure 1. The ‘OR’ function was chosen to extract, as the areas of interest were a combination of four separate subdivisions within the county. Care needed to be taken as the ‘AND’ result would have produced a different outcome, selecting only areas where the four sub-divisions intersected each other. The same steps were carried out for the remaining layers, except the layer name was changed each time. Finally each of the new temporary extracted layers were exported as shapefiles, so that they could be queried independently from the original datasets.
  • 9. Padraig Quinn 11125900 8 Figure 1: Select by Attributes SQL 2.3 Encoding Attribute Data One of the bigger problems this report encountered was that the ED data and the Mortality data were at different geographical spatial scales. The ED areas were smaller than the Mortality (IA) areas. The ED areas had too low a population is some cases for accurate mortality data, producing too few deaths, so the IA was developed. The Mortality areas were equally populated areas with populations of close to 10,000 per region (Rigby, et al., n.d.). Without some form of editing, it would not have been possible to combine and interrogate the separate datasets as one. In order to rectify this problem, a new field was added to the ED layer called Region. The ED layer had 322 individual areas and the Mortality layer had just over 111. The Mortality layer was made transparent and its outline thickness was set to two. See Figure 2. It was then superimposed on top of the ED layer. This allowed the operator to visualise the EDs that were contained within the intermediate areas (IAs). Also the identify option was chosen from the task bar at the top of the screen. Within the identify window, the Mortality layer was selected in the ‘Identify From’ option as evident in Figure 3. This allowed the user to click on an ED and the information of the IA would appear in the window. From within this list, the IA_ID was used as the area code in the newly added field referred to as Region in the ED layer. The IA_ID was chosen, as it was possible for an ED that was situated within an IA to use the same code to identify both the ED and IA once amalgamated. Therefore, this code was used to match the EDs to the IAs in the modified output table.
  • 10. Padraig Quinn 11125900 9 Figure 2: IA Superimposedon ED Figure 3: Identify Option Once the new field was added, it needed to be encoded with the relevant IA code for each ED. To accomplish this, the editor toolbar was activated followed by the edit attributes option. Then by clicking on the row in the ED attribute table, it opened a window with all the attributes for that layer including the newly created Region field. By clicking on the ED using the identify option, the operator could see the IA_ID for the ED. Then the IA_ID code was simply keyed into the Region field from within the attribute editor window. See Figure 4 below. When finished the Stop Editing option was chosen and the edits were saved. Figure 4: Populating Ed Attribute Table with IA Codes in Edit Mode 2.4 Dissolve EDs into IAs After successful completion of the encoding (matching) procedure, the EDs could then be amalgamated into the relevant IAs using a Geoprocessing tool known as Dissolve. The dissolve feature can be referred to as a reclassification of vector data so that polygons can be arranged into higher forms (Heywood, et al., 2006). Once the
  • 11. Padraig Quinn 11125900 10 tool was selected, it had to be modified in order to successfully attach and merge the smaller scale ED data into the larger scale IA data. In the input feature, the ED layer was chosen and the dissolved field was the newly created Region field. In the statistics field, the required variables discussed in the previous section were selected. The statistic type selected was SUM, as the total numbers for the each of the combined EDs within the IAs were required. This gave a total count for each variable calculated from the combined number of EDs within the IA. Figure 5 depicts the Dissolve tool. The output file was saved as Diss_Ed_IA. Figure 5: Dissolve Tool 2.5 Spatial Join The next step was to apply a spatial join using the newly created dissolved layer and the mortality layer. Up to this point the data was at the same scale geographically, but they were still two separate datasets. The two datasets were joined on the basis of their attribute table values, as the Region field in the dissolved layer was a match for the IA_ID field in the mortality layer. The dissolved layer (Diss_Ed_IA) was selected, and by right clicking on it in the TOC and selecting the ‘Join Data’ option from within the ‘Joins and Relates’ selection, opened the ‘Join Data’ window. The join was configured exactly as depicted below. The important aspect to note was that the join was based on the Region field in the dissolved layer and the IA_ID field in the Mortality layer. The result was a single attribute table containing both datasets at the same geographic scale. They could then be queried and manipulated as a single complete dataset. The process is displayed in Figure 6 below.
  • 12. Figure 6: Spatial Join
2.6 Cluster Analysis
Cluster and outlier analyses, known as Anselin Local Moran's I, were carried out on all of the variables mentioned (SMR, unemployment, unskilled workers, unskilled households and poor education). These analyses were carried out to visualise areas of high or low clustering of the chosen variables and to identify any outliers. The technique also helped to reveal patterns or similarities in clustering between the selected variables. The cluster analysis tool was run from within the Spatial Statistics toolbox. As with other tools, it needed to be configured in the set-up window. The input feature was the dissolved layer and, for the purpose of this description, SMR was chosen as the input variable for the analysis, as shown below. The same procedure was repeated for the remaining variables.
Figure 7: Cluster Analysis
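To make the cluster and outlier categories more concrete, the following is a minimal sketch of the Local Moran's I statistic in plain Python. It is an illustration only and not the ArcGIS implementation: it assumes row-standardised binary neighbour weights, uses the variance with divisor n, and omits the significance test (z and p values) that the tool reports. The values and the neighbour list are hypothetical.

```python
def local_morans_i(values, neighbours):
    """Return (I_i, quadrant) for each area.

    values     : one number per area (e.g. a hypothetical SMR)
    neighbours : for each area, the indices of its neighbouring areas
    Assumes every area has at least one neighbour. No significance test is done.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # variance with divisor n
    out = []
    for i, nbrs in enumerate(neighbours):
        lag = sum(z[j] for j in nbrs) / len(nbrs)   # row-standardised spatial lag
        stat = z[i] * lag / m2
        if z[i] >= 0:
            quadrant = "HH" if lag >= 0 else "HL"   # high value, high/low neighbours
        else:
            quadrant = "LL" if lag < 0 else "LH"    # low value, low/high neighbours
        out.append((stat, quadrant))
    return out


# Five hypothetical areas arranged in a line, each adjacent to the next.
smr = [10, 12, 11, 2, 3]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]
results = local_morans_i(smr, neighbours)
for area, (stat, quadrant) in enumerate(results):
    print(area, round(stat, 3), quadrant)
```

A positive statistic indicates an area that resembles its neighbours (HH or LL), while a negative one indicates an outlier (HL or LH). In practice an area is only mapped in one of these classes if the statistic is also statistically significant.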
  • 13. 2.7 Geographically Weighted Regression
Regression analysis is used to examine, explore and predict spatial relationships. Geographically Weighted Regression (GWR) is a powerful method for examining and estimating linear relationships. It provides a local model of the variable being examined or predicted by fitting a regression equation to each feature in the dataset. It is a sophisticated basis on which to quantify and dissect spatial patterns across a study area (Legg & Bowe, 2009). For the purpose of this report, GWR was applied to test the hypothesis of higher premature death rates in lower social classes and to assess the strength of these relationships. Again, this tool was available in the Spatial Statistics toolbox, and it provided detailed and flexible analysis of the data. The input feature was the dissolved layer and the dependent variable was the SMR under 75 variable. Explanatory variables were then added, first in pairs and then as a combination of all the variables. The regression model used in this illustrative example had SMR as the dependent variable and unemployment and unskilled workers as the explanatory variables. See Figure 8 for a visual representation.
Figure 8: Geographically Weighted Regression Tool
2.8 Python Script for Identifying Correlations
A Python script was employed to further investigate correlations between the variables and death rates. As with the regression analysis, it was applied to examine the relationship between different social factors and premature death rates. In this case, however, the script examined the direct relationship between a single variable and the SMR, along with the strength of that relationship or correlation. It was also used to create scatter plots with lines of best fit, which make any correlation or variance within the model easier to identify. Two scripts were implemented.
The first script created a table from the statistics and the second script created graphs from the table. Both are attached in the appendices section of this report.
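The scripts themselves are only available in the appendices, so the following is not the author's code. It is a minimal standard-library sketch, under the assumption that the correlation in question is Pearson's r, of how a single variable can be compared with the SMR and how the line of best fit for a scatter plot can be obtained. The data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical values for five areas (not taken from the report).
smr = [1, 2, 3, 4, 5]
unemployment = [2, 4, 5, 4, 5]

r = pearson_r(smr, unemployment)
slope, intercept = best_fit_line(smr, unemployment)
print(round(r, 3), round(slope, 3), round(intercept, 3))
```

A value of r close to 1 corresponds to points lying close to an upward-sloping line, which is the pattern the scatter plots are intended to reveal.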
  • 14. 3. Results
3.1 Cluster Analyses
The clustering analysis (Local Moran's I statistic) was used essentially to measure z values (similarity of clustering) and p values (statistical significance). It allowed the user to identify areas of high and low clustering of a variable, as well as areas surrounded by contrasting values, either high or low. The grey areas represented areas with no statistically significant clustering. The black areas (High-High or HH) indicated high values clustered closely together. The blue areas (Low-Low or LL) were the opposite of the black areas, as they contained low values clustered closely together. The orange areas (High-Low or HL) were outliers rather than clusters: high values surrounded by low values. However, they were not applicable to the outcome of this study. The white areas (Low-High or LH) were low values surrounded by high values.
The SMR under 75 (premature mortality) clustering analysis produced the greatest levels of high-value (HH) clustering in the Dublin City and South Dublin regions of the county. It also generated statistically significant clusters of low SMR under 75 (LL) in the Dún Laoghaire-Rathdown region of the county. The LH outlier areas were predominantly situated in the Fingal jurisdiction, towards the west of the county, illustrating pockets of low mortality rates surrounded by higher rates. The results suggested that more affluent LH areas, such as Castleknock, had longer life expectancy than some of the more deprived surrounding areas, for example in Blanchardstown. The results are shown in Figure 9.
Figure 9: Premature SMR Clustering
The unemployment analysis generated the greatest levels of high-value (HH) clustering of unemployment in the South Dublin and Fingal districts of the county (along with one Dublin City IA) and clustering of low levels of unemployment in the wealthier Dún Laoghaire-Rathdown territory. The remainder of the county was deemed to have no significant clustering. The results are displayed in Figure 10.
  • 15. Figure 10: Unemployment Clustering
The unskilled worker analysis produced the greatest amount of high-value (HH) clustering of unskilled workers within the South Dublin, Dublin City and Fingal areas. There were more intense clusters of people not considered unskilled in the Dún Laoghaire-Rathdown locality, and also a pocket of low-value clusters in the Fingal region, namely Howth. Furthermore, there was an outlier (LH) area in the Fingal quarter, on the western fringe of the county boundary, indicating skilled workers close to, or surrounded by, unskilled workers. See Figure 11.
Figure 11: Unskilled Worker Clustering
The unskilled households results produced a similar outcome. However, this time there was more HH clustering in the Dublin City jurisdiction, with LH outliers in pockets around the territory as well. For this analysis, Howth was not considered significant. These can be identified in Figure 12.
  • 16. Figure 12: Unskilled Households Clustering
The poor education clustering produced a similar result, with HH clusters in the western South Dublin and Fingal zones, and LL clustering in the Dún Laoghaire-Rathdown constituency. All the results appeared to show more negative outcomes for known disadvantaged areas and more positive results for known prosperous regions.
Figure 13: Poor Education Clustering
3.2 Geographically Weighted Regression
The geographically weighted regression was calculated by generating regression models using the premature SMR statistics as the dependent variable and pairs of the selected lower social class variables as the explanatory variables. A combined model was also created, including all of the low social class variables against the premature SMR variable. For the scope of this report, the adjusted R² value was of interest, as this was the measure that determined how strong or weak the relationship was between the selected variables and the premature death rate. It is referred to as the goodness of fit and it varies between 0.0 and 1, with
  • 17. higher values being favoured (Legg & Bowe, 2009). The results are illustrated in the tables below. Table 1 had an adjusted R² value of just over 0.51. This meant that the model could explain just over 51 percent of the variation in premature mortality; in other words, the two lower social class variables (Unemployment and Unskilled Workers) accounted for 51 percent of the variation. This was evidence of a relatively strong relationship between the two variables and premature mortality. Table 3 had the weakest result and Table 6 the strongest of the pairings. When all of the variables were combined, the results also showed a relatively strong relationship between the lower social class variables and premature mortality, with an adjusted R² value of slightly over 0.55. See Table 7.
Table 1: GWR Unemployment and Unskilled Workers
Table 2: Unemployment and Unskilled Households
Table 3: Unemployment and Poor Education
  • 18. Table 4: Unskilled Workers and Unskilled Households
Table 5: Unskilled Workers and Poor Education
Table 6: Unskilled Households and Poor Education
Table 7: GWR All Variables
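For readers unfamiliar with the adjusted R² reported in these tables, the sketch below shows the usual adjustment for an ordinary (global) linear regression: the raw R² is penalised for the number of explanatory variables. This is an assumption-laden illustration rather than the GWR calculation itself, which replaces the count of explanatory variables with an effective number of parameters. The sample size and raw R² used here are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for an ordinary regression.

    r2 : raw coefficient of determination
    n  : number of observations (e.g. areas)
    k  : number of explanatory variables
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: raw R-squared of 0.55 from 50 areas and 2 explanatory variables.
example = adjusted_r2(0.55, n=50, k=2)
print(round(example, 4))
```

Because the penalty grows with the number of explanatory variables, the adjusted value allows the paired models and the all-variables model to be compared on a fairer footing than raw R² would.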
  • 19. 3.3 Python Script Analysis
The Python script analysis was also used to examine the correlation between premature mortality rates and the selected lower social class variables. This time, however, only one variable was paired with the premature mortality rate in each model. The script produced detailed, easy-to-interpret graphs with a visible line of best fit. The results depicted a relatively strong relationship between the variables and the premature mortality rate. See Figures 14 to 17 below. The points were scattered close to the line of best fit, suggesting that there was indeed a relationship. The closer and denser the scattering around the line, the stronger the correlation. Additionally, as the premature mortality rate increased, so too did the quantity of the variable being examined, illustrating a positive relationship between the variables. In Figure 14, for example, a higher death rate corresponded with higher levels of unemployment. The statistics for the separate correlations are recorded below in Table 8. As in the previous sub-section, the decimal score represented a percentage.
Figure 14: Premature Death Rate Versus Unemployment
  • 20. Figure 15: Premature Death Rate Versus Unskilled Workers
Figure 16: Premature Death Rate Versus Unskilled Households
  • 21. Figure 17: Premature Death Rate Versus Poor Education
Variables                                   Correlation
Premature Death Vs Unemployment             0.51
Premature Death Vs Unskilled Workers        0.68
Premature Death Vs Unskilled Households     0.73
Premature Death Vs Poor Education           0.55
Table 8: Correlation Statistics Python Script
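The report does not state whether the Table 8 figures are Pearson correlation coefficients (r) or coefficients of determination (r²). If they are assumed to be Pearson coefficients, the share of variance explained is the square of each value rather than the value itself. The short sketch below works through that arithmetic using the four figures from Table 8; the interpretation is an assumption, not something the source confirms.

```python
# Correlation values as listed in Table 8.
table_8 = {
    "Unemployment": 0.51,
    "Unskilled Workers": 0.68,
    "Unskilled Households": 0.73,
    "Poor Education": 0.55,
}

# Assumption: each value is a Pearson r, so r squared is the proportion of
# variance in one variable that is shared with premature mortality.
variance_explained = {name: round(r * r, 2) for name, r in table_8.items()}

for name, share in variance_explained.items():
    print(f"{name}: r = {table_8[name]}, r^2 = {share}")
```

Under this reading, unskilled households would account for roughly half of the variation and unemployment for roughly a quarter. If the values are already r², they can be read as percentages directly.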
  • 22. 4. Discussion and Conclusion
4.1 Discussion
The results produced from the several analyses suggested that there is a relationship between higher rates of premature mortality and lower social class factors. The findings correspond with previous research (Pensola & Martikainen, 2003; Erikson & Torssander, 2008) showing that premature mortality tends to increase with lower social class. The relationship was relatively strong, and in some cases more so than others, particularly between unskilled households and early mortality and between unskilled workers and early mortality (see Table 8). The clustering analysis provided valuable insight into where the high and low clusters of each variable were situated. The majority of clustering relating to low social class variables occurred in areas known to be less affluent, such as Jobstown in the South Dublin district and Ballymun in the Dublin City division of the county. This also corresponded with the high clusters of premature mortality. In the more affluent Dún Laoghaire-Rathdown constituency, the opposite occurred. These analyses further bolster previous research relating to mortality and social class (Rigby, et al., n.d.). Findings are evident in Figure 9 to Figure 13. Furthermore, high clustering could be attributed to high values of the variable being analysed rather than to the areas simply being highly populated. Population size was not a factor for IAs, as they all had roughly the same population (close to 10,000). This indicated that the IA was an appropriate unit for examining the individual regions of the county, limiting other factors that might have affected the modelling. The evidence generated from the GWR analyses and the Python script analyses also highlights the positive correlations between the socioeconomic variables and early mortality.
They support the observations associating higher early mortality rates with poor education (Higgins, et al., 2008). The figures are evident in Table 8. When poor education was combined in pairs with the other variables, and likewise with all of the variables, the relationship was also apparent. See Tables 3, 5, 6 and 7. They also support the reasoning that unemployment and low-skilled classes contribute to early mortality, which in turn may lead to poverty. The evidence echoed previous studies referring to poverty as a killer (Murali & Oyebode, 2004).
4.2 Conclusion
The report has provided a valuable insight into the complex study of linking socioeconomic status and premature mortality. Several issues needed to be addressed before the analysis could even begin: combining, merging and dissolving datasets of different geographical scale; choosing the variables that contributed most to lower social status; and selecting the analysis techniques that best addressed and represented the issue being examined. Several separate analysis techniques were implemented to reduce as many inconsistencies as possible and to make the modelling outcomes as plausible as possible. These techniques combined, contrasted and compared all the relevant variables against premature mortality, producing in all cases a relatively strong positive correlation between premature mortality and low socioeconomic status. For better results, further datasets could have been included, but these were outside the scope of this study. An interesting dataset that might have produced more specific results is the deprivation index. The dataset is composed of small area (SA) data, and may have produced more detailed results when merged and dissolved with
  • 23. the larger IA mortality data. The finer-grained data capture more variation overall, as many more areal units are included in the dataset. Also, regarding the newly generated dissolved ED to IA layer, the study could perhaps have normalised the social class variables. The statistical data they contained were the SUM, or total number of people, for each variable, inclusive of people over the age of 75. It would have been better to establish a cut-off point so as to include only people under seventy-five within each variable when correlating them with the premature mortality rate. This would have made the analysis slightly more accurate and precise. Nevertheless, this study has reiterated that lower socioeconomic status is associated with a detrimental effect on mortality rates. The research undertaken in this report was detailed and showed how strong the relationships were between the individual variables and the early mortality rate, as well as between the early mortality rate and combinations of the socioeconomic variables. This study proved a success and could be a valuable asset for decision makers in the future regarding our fragile health system and futile housing and employment situation.
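The normalisation suggested above can be illustrated with a short sketch. It is not part of the original analysis: it simply shows one common approach, converting the summed counts for each IA into a rate per 1,000 residents so that areas can be compared regardless of population size. The field names and figures are hypothetical, and restricting the counts to people under 75 would additionally require age-specific census data.

```python
def rate_per_1000(count, population):
    """Convert a raw count into a rate per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * count / population


# Hypothetical dissolved IA records: total unemployed and total population.
ia_rows = [
    {"Region": "IA01", "unemployed": 215, "population": 10000},
    {"Region": "IA02", "unemployed": 40, "population": 9500},
]

# Add a normalised field alongside the raw count for each IA.
for row in ia_rows:
    row["unemployed_rate"] = rate_per_1000(row["unemployed"], row["population"])

print([(row["Region"], round(row["unemployed_rate"], 1)) for row in ia_rows])
```

Since the IAs are reported to have similar populations (close to 10,000), the effect of this step would probably be modest, but it removes any residual population-size effect from the comparison with the SMR.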
  • 24. Bibliography
Erikson, R. & Torssander, J., 2008. Social Class and Cause of Death. European Journal of Public Health, 18(5), pp. 473-478.
Heywood, I., Cornelius, S. & Carver, S., 2006. An Introduction to Geographical Information Systems. 3rd ed. Essex: Pearson.
Higgins, C., Lavin, T. & Metcalfe, O., 2008. Health Impacts on Education: A Review, Belfast: The Institute of Public Health in Ireland.
Legg, R. & Bowe, T., 2009. Applying Geographically Weighted Regression to a Real Estate Problem, Michigan: Michigan University.
Murali, V. & Oyebode, F., 2004. Poverty, Social Inequality and Mental Health. Advances in Psychiatric Treatment, 10(3), pp. 216-224.
Pensola, T. & Martikainen, P., 2003. Cumulative Social Class and Mortality from Various Causes of Adult Men. Journal of Epidemiology and Community Health, 57(9), pp. 745-751.
Rigby, J. E. et al., n.d. Towards A Geography of Health Inequalities in Ireland, s.l.: Draft.
  • 26. Appendix 2: Python Plotting Graphs Script