To: Mike Ryan, Deputy Secretary for Highways
From: Berkeley Teate, Analyst
RE: Patchwork Assessment 2016
Introduction: The Pennsylvania State Department of Highways has requested an examination of
the effect of using specialized crews on the cost of conducting one particular maintenance activity.
Specific interest in this analysis will focus on the maintenance activity titled ‘manual patching,’
otherwise known as filling in potholes in state highways. Maintenance managers in the Department
believe the increasing reliance on specialized crews for activities will improve efficiency [of
labor], and result in lower costs of manual patching. Lastly, the Department has implemented a
quality improvement initiative in the past two years, and is interested to learn whether increased
quality [of manual patching] leads to reduction in costs. I intend to analyze these variables, to
determine the extent to which specialized crews and the quality improvement initiative have
impacted costs for the Department of Highways in Pennsylvania.
Research Hypothesis: An overall increase in ‘Crew Specialization’ should result in lower
costs [of manual patching work, in dollars] for the 67 county maintenance units examined.
o The reliance on specialized crews and the quality improvement initiative will
decrease the influence of specific variables [costs of manual patching work in
dollars, quality assurance score for manual patching, and production units per 100
activity hours] on the cost of manual patching.
o The same reliance(s) will increase the influence of specific variables [production
units of manual patching completed, lane miles of state highway, number of
freeze/thaw cycles, average days with snow on the ground, and material cost index]
on the cost of manual patching.
Research Method: An analysis of 67 county maintenance units in the Pennsylvania Department
of Highways was completed, looking at descriptive data provided. The independent variables
examined in this analysis are as follows: 1) Crew specialization (interval scale); 2) Production
units of manual patching completed (interval scale); 3) Quality assurance score for manual
patching (ordinal scale); 4) Production units per 100 activity hours (interval scale); 5) Lane Miles
of state highway (interval scale); 6) Number of freeze/thaw cycles (interval scale); 7) Average
snow days on the ground (interval scale); and, 8) Materials Cost Index (interval scale). Operating
efficiency will also be considered an independent variable, along with crew specialization and labor
productivity variables. The dependent variable in this analysis is the cost of manual patching work,
in dollars (interval scale), of each of the 67 county maintenance units. The control variables
include: 1) production units [when analyzing the association between crew specialization and
cost]; 2) material cost index [when analyzing the effect of crew specialization on overall
expenditures]; and, 3) the number of freeze/thaw cycles and the average days of snow on the
ground [considering climatic conditions in the State of Pennsylvania].
The quality assurance variable provides scores ranging from 1 – 10, with higher scores
reflecting more positive patching jobs.
The crew specialization variable is an indicator of the extent to which manual patching
work is performed by specialized crews. Values on this scale represent the percent of all
maintenance crews in the district not required to perform the “first” 50% of manual
patching in that district – thus, higher percentages indicate more crew specialization.
Now let’s look at specific research procedures. I will use a Cross Sectional Design to view simple
descriptive data; Scatterplots, alongside partial correlation, to analyze the relationship between
individual variables [while controlling for one or more other variables]; and a Correlation Matrix as a
prelude to my multivariate analysis of contextual variables, where I’ll re-purpose SPSS data to
visualize regression model(s). This will provide an interpretation of the influence of each variable in
terms of slope, statistical significance, and the strength of the overall model. No Sampling Strategy
is applicable to this analysis, as there was no strict selection or particular treatment involved.
Although not requested, I’m providing the research design of this cost function analysis –
a Quasi-Experimental, Interrupted Time Series Design. The Department implemented a
quality improvement initiative in the past two years, creating a ‘pre’ and ‘post’ atmosphere
for the variables I’m examining. Please see below for visual aid:
Interrupted Time Series Design:
O O O X O O O
Results:
1. Initial Impression of Raw Data [based on Descriptive Statistics]:
Table A:
Descriptive Statistics
Variable                                  N    Minimum    Maximum    Mean          Std. Deviation
COST OF MANUAL PATCHING                   66   $37,936    $929,733   $272,421.67   $182,290.557
PRODUCTION UNITS OF MANUAL PATCHING       66   544        24,256     5,685.50      4,564.538
NUMBER OF FREEZE THAW CYCLES              67   4          83         24.27         27.030
QUALITY OF MANUAL PATCHING                67   1          10         5.91          2.360
MATERIAL COST INDEX MANUAL PATCHING       67   75         150        103.09        14.665
NUMBER OF DAYS SNOW ON GRND               67   10         91         44.88         20.751
COUNTY LANE MILES                         67   161        2701       1103.34       516.889
UNIT COST OF MANUAL PATCHING              66   $18.16     $186.96    $56.7203      $27.42233
PERCENT CREW SPECIALIZATION (FIRST 50%)   66   65         95         79.42         6.884
LABOR PRODUCTIVITY INDEX                  66   4.1        18.6       15.022        2.5740
Valid N (listwise)                        65
I ran descriptive statistics, displayed above in a cross-sectional table, to retrieve the minimum,
maximum, mean, and standard deviation of all variables provided [independent, dependent,
control]. These results provide a simple understanding of the range and spread of each variable.
They do not by themselves show correlation between variables, which the later steps examine.
The minimum cost of manual patching is $37,936, and the maximum is $929,733 – a near
$900,000 gap. However, when looking at the mean cost for manual patching, it sits at
$272,421, with a standard deviation of $182,290. This provides a high probability for
outliers, which we can look for in the Scatterplot Analysis [in Step #2].
The minimum quality assurance rating of manual patching is 1, and the maximum is 10.
Once again however, the mean shows just below a ‘6’ average. This variable does have a
high Standard Deviation given the small range – nearly a 2.4 rating. This could mean that
quality varies greatly depending on the crew, and possibly the district of the job.
The minimum crew specialization percentage is 65%, and the maximum is 95% - a
30 percentage-point gap in manual patch requirement(s). However, the mean is nearly 80%, showing a high
percentage of crews who are not required to perform the first 50% of work.
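As a rough way to quantify the "high probability for outliers" noted for cost, the sketch below expresses the Table A minimum and maximum as distances from the mean in standard deviations. The three-standard-deviation threshold is a common rule of thumb and an assumption here, not something stated in the Department's data; the function and variable names are illustrative.

```python
# Values copied from Table A (cost of manual patching, in dollars).
cost_min, cost_max = 37_936, 929_733
cost_mean, cost_sd = 272_421.67, 182_290.557

# Assumption: flag values more than 3 standard deviations from the mean.
OUTLIER_Z = 3.0


def z_score(value, mean, sd):
    """Distance of a value from the mean, measured in standard deviations."""
    return (value - mean) / sd


z_max = z_score(cost_max, cost_mean, cost_sd)
z_min = z_score(cost_min, cost_mean, cost_sd)
# Coefficient of variation: spread relative to the mean.
cv = cost_sd / cost_mean

print(f"z(max) = {z_max:.2f}, flagged: {abs(z_max) > OUTLIER_Z}")
print(f"z(min) = {z_min:.2f}, flagged: {abs(z_min) > OUTLIER_Z}")
print(f"coefficient of variation = {cv:.2f}")
```

Under that rule of thumb only the most expensive county would be flagged; the cheapest county sits well within the usual range.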
2. Scatterplot Analysis [Two Separate Comparisons]:
Graph A:
Next, I generated a Scatterplot Analysis to evaluate whether there was a correlation between the
cost of manual patching [dependent variable] and crew specialization [independent variable],
controlling for Material Cost Index and Production Units [The total cost of any manual patching
activity is heavily dependent on the volume of patching completed]. This is a linear relationship,
as defined by the slope equation:
Y = aX + b
The scatterplot above shows a negative, linear association:
The strength of the scatterplot is related to how tight the points are – in this analysis, we
see a ‘moderate’ correlation.
The association is determined by the R^2 of the Fit Line. The ‘Fit Line’ is also known as
the regression line. The R^2 of the regression line = 0.104, which matches the square of the
Pearson correlation between cost and crew specialization (-.323) in the Appendix.
The direction of the scatterplot provides a negative gradient. This is most likely due to
multiple outliers in the upper-left corner of the graph, with high costs and low crew
percentages.
This scatterplot supports the hypothesis as it shows that as crew specialization goes up, the cost of
manual patching decreases. This is reconfirmed by the ‘moderate’ strength in correlation of the
clustered points, and a negative slope.
Graph B:
I also generated a Scatterplot Analysis to evaluate whether there was a correlation between the
cost of manual patching [dependent variable] and the quality of manual patching [independent
variable]. This is a linear relationship, as defined by the slope equation: Y = aX + b
The scatterplot above shows a negative, linear association:
The strength of the scatterplot is related to how tight the points are – in this analysis, we
see a ‘moderate to strong’ correlation. There are nearly ten cluster points falling exactly on
the slope line.
The association is determined by the R^2 of the Fit Line. The ‘Fit Line’ is also known as
the regression line. The R^2 of the regression line = 0.151, which matches the square of the
Pearson correlation between cost and quality (-.389) in the Appendix.
The direction of the scatterplot provides a negative gradient. This is most likely due to
multiple outliers above the slope line entirely, possibly showing higher costs regardless of
quality of manual patching work.
This scatterplot also supports the hypothesis as it shows that as quality goes up, the cost of manual
patching decreases. This is reconfirmed by the ‘moderate to strong’ strength in correlation of the
clustered points, and a negative slope.
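The values 0.104 and 0.151 quoted for the two fit lines can be cross-checked against the Pearson correlations in the Appendix. For a fit line with a single predictor, R^2 equals the square of r, so squaring -.323 and -.389 should reproduce them. A minimal sketch of that check, with illustrative variable names:

```python
# Pearson correlations with cost of manual patching, copied from the Appendix.
pearson_r = {
    "crew specialization": -0.323,
    "quality of manual patching": -0.389,
}

# With one predictor, the R^2 of the fit line is the squared correlation.
r_squared = {name: round(r ** 2, 3) for name, r in pearson_r.items()}

for name, value in r_squared.items():
    direction = "negative" if pearson_r[name] < 0 else "positive"
    print(f"{name}: R^2 = {value}, direction = {direction}")
```

Both correlations are negative, so the fit lines slope downward even though R^2 itself is always non-negative.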
3. Correlation Matrix [Refer to Appendix for SPSS Matrix]:
A Correlation Matrix [Pearson Product-Moment] was generated to look at correlation amongst the
dependent variable and individual independent variables [not all are featured by scatterplots
above], statistical outliers, and levels of significance of those relationships. Please refer to the full
report Correlation Matrix labeled ‘Table E’ in the Appendix. Please note: there were no row-wise
deletions, as there were few missing values.
The following independent variable correlations with Cost of Manual Patching were
flagged as significant at the 0.01 level (2-tailed): 1) crew specialization; 2) quality of
manual patching; 3) county lane miles; 4) production units of manual patching; and, 5)
number of freeze/thaw cycles.
The following independent variable correlations with Crew Specialization were flagged as
significant at the 0.01 level (2-tailed): 1) cost of manual patching; 2) quality of manual
patching; and, 3) number of freeze/thaw cycles.
The following independent variable correlations with Quality of Manual Patching were
flagged as significant at the 0.01 level (2-tailed): 1) cost of manual patching; 2) crew
specialization; and 3) production units of manual patching.
o There was a significant relationship with county lane miles at the 0.05 level (2-
tailed).
This shows there is a recurring significance between cost, crew specialization, and quality of
manual patching – variables which were requested by the Pennsylvania Department of Highways.
These relationships provide additional results that support the hypothesis. Being significant at
the 0.01 level (two-tailed) means that, if there were no true relationship between cost of patching,
specialized crews, and quality of the patching work, correlations this strong would be expected in
fewer than 1 percent of samples.
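The significance flags in the matrix come from a t test on each correlation. The sketch below recomputes that t statistic for the two correlations of most interest, using r and N from Table E. The critical value is an assumption entered by hand from a standard t table (two-tailed 0.01 level, roughly 60 degrees of freedom), not a figure from the SPSS output.

```python
import math


def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# (r, N) pairs copied from Table E, each against cost of manual patching.
pairs = {
    "crew specialization": (-0.323, 65),
    "quality of manual patching": (-0.389, 66),
}

# Assumption: two-tailed 0.01 critical value for about 60 degrees of freedom.
T_CRIT_01 = 2.66

t_stats = {name: t_from_r(r, n) for name, (r, n) in pairs.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, beyond critical value: {abs(t) > T_CRIT_01}")
```

The crew specialization correlation clears the threshold only narrowly, which is consistent with its reported two-tailed significance of .009.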
4. Initial Regression Model(s):
Table C:
In Tables C and D below, I provide a final test for significance between the independent variables
and the dependent variable [cost of the manual patching, in dollars] – a test of significance of the
regression slope. I used a multiple regression analysis, adding variables incrementally to predict a
change in the dependent variable, and re-purposed the data requested for Steps 4 – 7. Please refer
to the Output Data provided separately for the full Step-by-Step process.
Cost of Manual Patching [1]
              Step 4                Step 5                Step 6                Step 7
              B          Beta       B          Beta       B          Beta       B          Beta
Crewspec                                                                        -3275.130  -.124***
Unitcost                                                                        1,782.374  .269***
Laborprod                                                                       -864.467   -.012
Quality                                                   -8361.313  -.107*     -6482.359  -.083*
Produnit                            25.811     .646***    24.820     .622***    32.197     .807***
Snowdays      2373.561   .272***    -24.264    -.003      -250.361   -.029      97.207     .011
Freethaw      -746.948   -.111      -387.730   -.058      -378.822   -.056      -188.152   -.028
Matcost       1498.102   .121       1,102.377  .089*      1,272.666  .103*      850.520    .069*
Lanemile [2]  302.206    .798***    130.412    .344***    123.187    .325***    77.436     .204***
N:            67                    67                    67                    67
R^2           .649                  .861                  .870                  .942
B = unstandardized slope; Beta = standardized coefficient. * significant at the 0.05 level; *** significant at the 0.001 level.
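Each cell in Table C pairs two numbers: a slope in dollars and a second, much smaller figure that appears to be the standardized coefficient. The standard relationship between the two is Beta = B × (SD of the predictor) / (SD of the dependent variable). The sketch below applies it to four Step 7 entries using the standard deviations from Table A. This is an approximate cross-check only, since Table A and the regression use slightly different numbers of cases, and the names are illustrative.

```python
# Standard deviations copied from Table A.
SD_COST = 182_290.557
SD_X = {
    "crewspec": 6.884,
    "unitcost": 27.42233,
    "quality": 2.360,
    "produnit": 4_564.538,
}

# Step 7 pairs copied from Table C: (slope in dollars, reported standardized value).
STEP7 = {
    "crewspec": (-3275.130, -0.124),
    "unitcost": (1782.374, 0.269),
    "quality": (-6482.359, -0.083),
    "produnit": (32.197, 0.807),
}


def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope into a standardized coefficient."""
    return b * sd_x / sd_y


approx_beta = {
    name: standardized_beta(b, SD_X[name], SD_COST) for name, (b, _) in STEP7.items()
}
for name, (_, reported) in STEP7.items():
    print(f"{name:9s} computed {approx_beta[name]:+.3f}  reported {reported:+.3f}")
```

Because the standardized values share a common scale, they are the ones to compare when asking which variable carries the most weight in a step.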
In Table C, I would like to review the ‘R Square’ statistic – the coefficient of determination. I will
review the slope and strength of statistically significant variables in Table D. The R Square statistic
measures the proportion of total variation in the dependent variable (Y) that is explained by the
independent variables in the model.
This table is useful as it shows how the explained variation in the dependent variable changes
as independent variables are added. By adding variables in a step-by-step fashion, it is easier
to determine which variables are significant, even without the slope or two-tailed test. In
this analysis, all R Square statistics are strong [above 0.40].
Step 4:
o The R Square is 0.649, meaning when looking at those specific variables [snow
days, freeze/thaw cycles, material cost, and county lane miles] just under 65% of
the costs of manual patching are accounted for.
Step 5:
o The R Square is 0.861, meaning when the specific independent variable of
‘production units of manual patching’ is added into the regression model just over
86% of the costs of manual patching become accounted for.
o This is important, as it shows that adding one variable – Production Units – raises the
share of manual patching costs accounted for by roughly 21 percentage points [0.649 to 0.861].
[1] Cost of Manual Patching: Dependent Variable
[2] Independent Variables Full Names: County Lane Miles; Material Cost Index Manual Patching; Number of Freeze
Thaw Cycles; Number of Days [with] Snow on Ground; Production Units of Manual Patching; Quality of Manual
Patching; Labor Productivity Index; Unit Cost of Manual Patching; Percent Crew Specialization [First 50%]
Step 6:
o The R Square is 0.870, meaning when the specific independent variable of ‘quality of manual
patching’ is added into the regression model, 87% of the costs of manual
patching become accounted for.
o This is in line with Graph B, which supports the hypothesis that as quality goes up,
costs go down. Seeing as there was just under a 1 percentage-point shift in the explained
variation of the dependent variable [0.861 to 0.870], this regression supports previous results.
Step 7:
o The R Square is 0.942, meaning when looking at those specific variables [labor
productivity, unit cost and crew specialization] just over 94% of the costs of manual
patching become accounted for.
o This is important for similar reasons to Step 6 – it is in line with Graph A. This R
Square statistic supports the hypothesis that crew specialization lowers costs in
manual patching. There wasn’t a significant shift in the dependent variable, taking
into account two other independent variables were added simultaneously.
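The step-by-step comparisons above come down to the change in R Square from one step to the next. A minimal sketch of that arithmetic, using the four R Square values from Table C (the variable names are illustrative):

```python
# R Square by regression step, copied from Table C.
r_square = {4: 0.649, 5: 0.861, 6: 0.870, 7: 0.942}

# Variables added at each step, as described in the text.
added = {
    5: "production units",
    6: "quality of manual patching",
    7: "labor productivity, unit cost, crew specialization",
}

# Change in explained variation, in percentage points, for each added step.
gain_points = {
    step: round((r_square[step] - r_square[step - 1]) * 100, 1) for step in added
}

for step, gain in gain_points.items():
    print(f"Step {step} (+ {added[step]}): +{gain} percentage points")
```

Note that the Step 7 gain is shared by three variables entered together, so it cannot be attributed to crew specialization alone.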
5. Final Model [without insignificant variables]:
Table D:
In Table D, I have removed those not significant at the 0.05 level [or higher i.e. 0.01 and 0.001]. I
will provide the slope and statistical significance of the following variables, which directly support
the hypothesis: 1) Crew Specialization and 2) Quality of Manual Patching. I will also look at the
slope and significance of Production Units of Manual Patching, based on the 20% variation
influence it had on the dependent variable. Lastly, I will look at County Lane Miles variable, which
was statistically significant at the 0.001 level in Steps 4 – 7. Furthermore, the slope is based on
that seen in Step 7, once all independent variables are accounted for.
Cost of Manual Patching [3]
              Step 4                Step 5                Step 6                Step 7
              B          Beta       B          Beta       B          Beta       B          Beta
Crewspec                                                                        -3275.130  -.124***
Unitcost                                                                        1782.374   .269***
Quality                                                   -8361.313  -.107*     -6482.359  -.083*
Produnit                            25.811     .646***    24.820     .622***    32.197     .807***
Snowdays      2373.561   .272***    -24.264    -.003      -250.361   -.029      97.207     .011
Matcost       1498.102   .121       1,102.377  .089*      1,272.666  .103*      850.520    .069*
Lanemile [4]  302.206    .798***    130.412    .344***    123.187    .325***    77.436     .204***
N:            67                    67                    67                    67
B = unstandardized slope; Beta = standardized coefficient. * significant at the 0.05 level; *** significant at the 0.001 level.
[3] Cost of Manual Patching: Dependent Variable
[4] Independent Variables Full Names: County Lane Miles; Material Cost Index Manual Patching; Number of Freeze
Thaw Cycles; Number of Days [with] Snow on Ground; Production Units of Manual Patching; Quality of Manual
Patching; Labor Productivity Index; Unit Cost of Manual Patching; Percent Crew Specialization [First 50%]
Crew Specialization:
o Statistical Significance at the 0.001 level – less than a 0.1% chance of seeing a slope this
large if there were no relationship with the dependent variable.
o Using the formula provided in Step 2 of the results, we can determine the slope is
(-3222.830).
Quality of Manual Patching
o Statistical Significance at the 0.05 level – less than a 5% chance of seeing a slope this
large if there were no relationship with the dependent variable.
o Using the formula provided in Step 2 of the results, the Step 7 slope in Table D is
(-6482.359).
Production Units of Manual Patching
o Statistical Significance at the 0.001 level [Steps 5 – 7] – less than a 0.1% chance of seeing
a slope this large if there were no relationship with the dependent variable.
o Using the formula provided in Step 2 of the results, we can determine the slope is
30.534.
County Lane Miles
o Statistical Significance at the 0.001 level [Steps 4 – 7] – less than a 0.1% chance of seeing
a slope this large if there were no relationship with the dependent variable.
o Using the formula provided in Step 2 of the results, we can determine the slope is
90.913.
All four of these independent variables were statistically significant at the 0.05 level at least once
in the multiple regression model. These results show a consistent relationship between the
independent variables above and cost of manual patching. When looking at the slope, let’s look at
an example. For production units, each additional unit of manual patching is associated with an
increase of roughly $30 in the cost of manual patching, holding the other variables constant. This
makes sense given the R Square statistic seen in Table C,
as well as the significance level for the quality variable. Something important to see is the negative
slope of the variables crew specialization and quality of manual patching – both of which had
negative gradients in Graphs A and B.
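One way to read an unstandardized slope is as the predicted change in cost for a given change in one predictor, with the other predictors held constant. The sketch below uses the Step 7 slopes as printed in Table C; the function name and the example changes (5 percentage points, 1 quality point, 100 production units) are illustrative assumptions, not scenarios from the Department.

```python
# Step 7 unstandardized slopes copied from Table C, in dollars per unit of the predictor.
STEP7_B = {
    "crewspec": -3275.130,  # per percentage point of crew specialization
    "quality": -6482.359,   # per point on the 1-10 quality score
    "produnit": 32.197,     # per production unit of manual patching
    "lanemile": 77.436,     # per county lane mile
}


def predicted_cost_change(variable, delta):
    """Predicted change in cost (dollars) when one predictor changes by
    `delta`, holding the other predictors in the model constant."""
    return STEP7_B[variable] * delta


# Hypothetical changes chosen only to illustrate the reading of each slope.
examples = [("crewspec", 5), ("quality", 1), ("produnit", 100)]
for variable, delta in examples:
    change = predicted_cost_change(variable, delta)
    print(f"{variable:9s} {delta:+4d} -> {change:+,.2f} dollars")
```

A negative result means the model predicts lower cost, which is the direction the hypothesis expects for crew specialization and quality.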
Given that Crew Specialization and Quality of Manual Patching were statistically
significant at the 0.05 level or greater, we can conclude that as crew specialization and quality
of patching increase, costs of patching in dollars decrease. Therefore, we can reject the null
hypothesis.
Conclusions:
Despite a wide range in descriptive statistics [minimum and maximum], the variable of Crew
Specialization deviated little – a standard deviation of just under 7 percentage points around a mean
of nearly 80%. That shows a high number of specialized crews in the 67 counties. The Cost of Manual
Patching, our other variable highly analyzed for hypothesis support, showed a $272,421.67 average
with just under a $200,000 standard deviation. This provided us with the expectation
for a high probability of outliers, which was confirmed in Graphs A and B of the results. When
holding for specific independent variables, the Graphs both had a negative gradient, which
supported the hypothesis that as quality and specialization go up, costs go down. The
graphs showed moderate to strong correlations, with linear regression supported by the numbers
found in the Correlation Matrix. The correlations among Cost of Manual Patching, Crew Specialization,
and Quality of Manual Patching were flagged as statistically significant at the 0.01 level, meaning
correlations this strong would be expected in fewer than 1 percent of samples if there were no true
relationship between cost of patching, specialized crews, and quality of patching work. Both Tables C
and D supported previous test results, and provided insight for slope and R Square statistics. The most
significant finding was Production Units raising the explained variation in the dependent variable by
over 20 percentage points. Once again, R Square provided further support for our hypothesis, showing
changes of between 1 and 7 percentage points [the latter taking into account three variables] for the
Quality and Crew variables. Based on these results, I rejected the null hypothesis.
Limitations: Considering the Research Design [Interrupted Time Series], the limitations include
both History and Instrumentation. Looking at the original brief, there is little qualitative
knowledge as to the history of the limited or widely available transportation in the city. A city
not unlike that of Atlanta has more potholes due to lack of centralized transportation – therefore
there are more cars on the road – more cars create more potholes. Furthermore, cities may have
different ways of patching a pothole. It is unclear if specialized crews all have the same
instruments, if they were slowly provided updated equipment, or if it varies per the 67 districts.
Lastly, the weather variables including freeze/thaw cycles and snow on the ground could provide
another analysis entirely of their own. Considering that both independent variables were
requested to be compared in Step 4, and not individually, it is unclear how strongly they affect
the dependent variable. There may also be other natural elements which data did not provide for.
Appendix:
Table E: Correlations
Key: Cost = Cost of Manual Patching; Produnit = Production Units of Manual Patching; Freethaw = Number of Freeze Thaw Cycles;
Quality = Quality of Manual Patching; Matcost = Material Cost Index Manual Patching; Snowdays = Number of Days Snow on Grnd;
Lanemile = County Lane Miles; Unitcost = Unit Cost of Manual Patching; Crewspec = Percent Crew Specialization (First 50%);
Laborprod = Labor Productivity Index. r = Pearson Correlation; Sig. = Sig. (2-tailed).
                 Cost      Produnit  Freethaw  Quality   Matcost   Snowdays  Lanemile  Unitcost  Crewspec  Laborprod
Cost       r     1         .872**    -.372**   -.389**   .217      -.045     .745**    -.032     -.323**   .112
           Sig.            .000      .002      .001      .080      .722      .000      .800      .009      .371
           N     66        66        66        66        66        66        66        66        65        66
Produnit   r     .872**    1         -.272*    -.356**   .140      .147      .574**    -.406**   -.109     .442**
           Sig.  .000                .027      .003      .261      .240      .000      .001      .388      .000
           N     66        66        66        66        66        66        66        66        65        66
Freethaw   r     -.372**   -.272*    1         .092      -.153     .173      -.360**   .016      .330**    -.038
           Sig.  .002      .027                .461      .216      .160      .003      .899      .007      .759
           N     66        66        67        67        67        67        67        66        66        66
Quality    r     -.389**   -.356**   .092      1         .044      -.189     -.272*    .223      .315**    -.155
           Sig.  .001      .003      .461                .726      .125      .026      .072      .010      .215
           N     66        66        67        67        67        67        67        66        66        66
Matcost    r     .217      .140      -.153     .044      1         .042      .119      .105      .067      -.031
           Sig.  .080      .261      .216      .726                .736      .337      .403      .596      .807
           N     66        66        67        67        67        67        67        66        66        66
Snowdays   r     -.045     .147      .173      -.189     .042      1         -.364**   -.372**   .057      .329**
           Sig.  .722      .240      .160      .125      .736                .002      .002      .651      .007
           N     66        66        67        67        67        67        67        66        66        66
Lanemile   r     .745**    .574**    -.360**   -.272*    .119      -.364**   1         .081      -.232     .016
           Sig.  .000      .000      .003      .026      .337      .002                .520      .061      .900
           N     66        66        67        67        67        67        67        66        66        66
Unitcost   r     -.032     -.406**   .016      .223      .105      -.372**   .081      1         -.139     -.834**
           Sig.  .800      .001      .899      .072      .403      .002      .520                .270      .000
           N     66        66        66        66        66        66        66        66        65        66
Crewspec   r     -.323**   -.109     .330**    .315**    .067      .057      -.232     -.139     1         .226
           Sig.  .009      .388      .007      .010      .596      .651      .061      .270                .070
           N     65        65        66        66        66        66        66        65        66        65
Laborprod  r     .112      .442**    -.038     -.155     -.031     .329**    .016      -.834**   .226      1
           Sig.  .371      .000      .759      .215      .807      .007      .900      .000      .070
           N     66        66        66        66        66        66        66        66        65        66
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).