Data Analysis: The Sun Coast Remediation Data Set
Insert Your Name Here
Insert University Here
Course Name Here
Instructor Name Here
Date
Data Analysis: Hypothesis Testing
In this project, we use the Sun Coast Remediation data set in Microsoft Excel, with the Data Analysis ToolPak, to explore the correlation between variables and to conduct simple and multiple regression analyses. The results are displayed here directly from Microsoft Excel, and the resulting predictive regression equations are discussed.
Correlation: Hypothesis Testing
Hypotheses:
i. Microns versus mean annual sick days per employee
Ho1: There is no significant linear relationship (correlation) between microns and mean annual sick days per employee.
Ha1: There is a significant linear relationship (correlation) between microns and mean annual sick days per employee.

                                      microns         mean annual sick days per employee
microns                               1
mean annual sick days per employee    -0.715984185    1
Regression Statistics
Multiple R            0.715984185
R Square              0.512633354
Adjusted R Square     0.507807941
Standard Error        1.327783455
Observations          103

ANOVA
              df     SS             MS       F          Significance F
Regression    1      187.2953239    187.3    106.236    1.89059E-17
Residual      101    178.0638994    1.763
Total         102    365.3592233

              Coefficients    Standard Error    t Stat     P-value     Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept     10.08144483     0.315156969       31.989     1.17E-54    9.456258184    10.70663     9.456258       10.70663148
microns       -0.522376554    0.050681267       -10.307    1.89E-17    -0.62291455    -0.42184     -0.62291       -0.421838554
The Pearson correlation coefficient is r = -0.7160 (rounded to four decimal places).
Interpretation: This indicates a strong negative correlation between the two variables: as microns increase, mean annual sick days per employee tend to decrease.
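The coefficient that Excel's Correlation tool reports is Pearson's r: the sum of the products of each variable's deviations from its mean, divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch of that formula follows; the function name `pearson_r` is hypothetical and the input lists are made-up toy values, not the Sun Coast data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]  # deviations of x from its mean
    dy = [yi - mean_y for yi in y]  # deviations of y from its mean
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values for illustration only (NOT the Sun Coast data).
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
print(pearson_r(x, y))  # prints 0.6
```

Applied to the two Sun Coast columns, the same function should reproduce the -0.715984185 in the correlation table, up to floating-point rounding.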
The value of the coefficient of determination is r² = 0.5126.

Interpretation: About 51.26% of the variation in mean annual sick days per employee is explained by its linear relationship with microns.
The p-value, taken from the regression output because the Excel Correlation tool does not report one, is 1.89E-17, a very small value. Using an alpha level of significance of 0.05, the p-value is less than alpha (1.89E-17 < 0.05). As a result, we reject the null hypothesis and accept the alternative hypothesis. Therefore, we conclude that there is a statistically significant linear relationship between microns and mean annual sick days per employee.
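The correlation test and the simple regression test are the same test, which is why the regression p-value can be used for r. The t statistic for H0: ρ = 0 is t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, and in simple regression the ANOVA F equals t². A short Python check, using only the r and n reported in the Excel tables:

```python
import math

# Values copied from the Excel output for microns vs. mean annual sick days.
r = -0.715984185  # Pearson r
n = 103           # observations

r_squared = r ** 2
# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt((n - 2) / (1 - r_squared))
# In simple regression the ANOVA F statistic equals t squared
f = t ** 2

print(f"r^2 = {r_squared:.4f}, t = {t:.3f}, F = {f:.3f}")
```

The computed t and F match the t Stat for microns (-10.307) and the ANOVA F (106.236) in the regression output, so both share the p-value of 1.89E-17.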
Simple Regression: Hypothesis Testing
Hypotheses:
Ho2: β1 = 0 (The regression of safety training expenditure on lost time hours is not significant)
Ha2: β1 ≠ 0 (The regression is significant)
SUMMARY OUTPUT
Multiple R            0.939559
R Square              0.882772
Adjusted R Square     0.882241
Standard Error        161.303
Observations          223

ANOVA
              df     SS          MS          F           Significance F
Regression    1      43300521    43300521    1664.211    7.7E-105
Residual      221    5750122     26018.65
Total         222    49050644
                   Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept          1753.602        30.36296          57.75465    2.6E-135    1693.764     1813.44      1693.764       1813.44
lost time hours    -6.15739        0.150936          -40.7947    7.7E-105    -6.45485     -5.85994     -6.45485       -5.85994
Let Y = safety training expenditure and X = lost time hours.
The regression model is given by: Y = 1753.602 - 6.15739*X.
This means that for each additional lost time hour, the predicted safety training expenditure decreases by about 6.157 monetary units.
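The regression equation can be turned into a small prediction function. This Python sketch hard-codes the two coefficients from the output; the function name `predict_expenditure` and the input of 100 hours are hypothetical, chosen only to show that each extra hour lowers the prediction by the slope. Predictions are only trustworthy within the range of lost time hours actually observed in the data, which is not reported here.

```python
# Coefficients from the simple regression output.
INTERCEPT = 1753.602
SLOPE = -6.15739  # change in expenditure per additional lost time hour


def predict_expenditure(lost_time_hours):
    """Predicted safety training expenditure from Y = 1753.602 - 6.15739*X."""
    return INTERCEPT + SLOPE * lost_time_hours


# Hypothetical inputs, for illustration only.
low = predict_expenditure(100)
high = predict_expenditure(101)
print(round(low, 3), round(high - low, 5))
```

One extra hour moves the prediction by exactly the slope, which is the interpretation given in the text.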
The Multiple R is 0.939559. In a simple linear regression, Multiple R equals the absolute value of the correlation coefficient, |r|, so it is never negative; the sign of r comes from the slope. Because the slope is negative, r = -0.939559, which indicates a strong negative correlation between safety training expenditure and lost time hours. R Square is the coefficient of determination, with a value of 0.882772: the regression model explains about 88.28% of the variation in safety training expenditure. The regression is a good fit.
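Multiple R and R Square can be cross-checked against the rest of the output. Excel's Multiple R is the square root of R Square, so the sign of r has to be taken from the slope, and R Square equals the regression sum of squares divided by the total sum of squares. A minimal Python check with the reported values:

```python
import math

# Values from the simple regression output.
multiple_r = 0.939559
slope = -6.15739
ss_regression = 43300521
ss_total = 49050644

# Multiple R = sqrt(R^2) is never negative; give it the sign of the slope.
r = math.copysign(multiple_r, slope)
# Share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

print(r, round(r_squared, 6))
```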
The alpha level (level of significance) is 0.05. From the ANOVA results, the F value is 1664.210687. This is the ratio of the regression mean square (MSR = 43300521) to the residual, or error, mean square (MSE = 26018.65). The Significance F is 7.6586E-105, which is a very small value. Since 7.6586E-105 < 0.05, we reject the null hypothesis and accept the alternative hypothesis, and we conclude that the regression fit is significant. The p-value of the X variable coefficient is also 7.6586E-105, which is less than 0.05. This means lost time hours is a significant predictor of safety training expenditure.
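The F value, and its link to the slope's t statistic, can be reproduced from the ANOVA table. Each mean square is a sum of squares divided by its degrees of freedom, F = MSR/MSE, and with a single predictor F equals t², which is why the model test and the coefficient test share the same p-value. A short Python sketch using the table values:

```python
# ANOVA values from the simple regression output.
ss_regression, df_regression = 43300521, 1
ss_residual, df_residual = 5750122, 221
t_slope = -40.7947  # t Stat for lost time hours

msr = ss_regression / df_regression  # regression mean square
mse = ss_residual / df_residual      # residual (error) mean square
f_stat = msr / mse

print(round(mse, 2), round(f_stat, 3), round(t_slope ** 2, 3))
```

The tiny gap between F and t² comes from the rounding of the t Stat in the Excel display.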
Multiple Regression: Hypothesis Testing
Hypotheses:
Ho3: β1 = β2 = β3 = β4 = β5 = 0 (The regression fit is not significant)
Ha3: At least one βi (i = 1, ..., 5) is different from zero (The regression fit is significant)
SUMMARY OUTPUT
Multiple R            0.583706496
R Square              0.340713274
Adjusted R Square     0.338511248
Standard Error        2564.049485
Observations          1503

ANOVA
              df      SS          MS          F           Significance F
Regression    5       5.09E+09    1.02E+09    154.7271    1.2E-132
Residual      1497    9.84E+09    6574350
Total         1502    1.49E+10
                                Coefficients    Standard Error    t Stat    P-value      Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept                       32243.94        1307.24           24.67     5.27E-113    29679.72     34808.16     29679.72       34808.16
Angle in Degrees                -86.46          17.20             -5.03     5.581E-07    -120.20      -52.72       -120.20        -52.72
Chord Length                    -741.56         1361.86           -0.54     0.5861673    -3412.92     1929.80      -3412.92       1929.80
Velocity (Meters per Second)    42.06           4.30              9.78      6.023E-22    33.63        50.50        33.63          50.50
Displacement                    -65093.43       8026.09           -8.11     1.042E-15    -80837.01    -49349.86    -80837.01      -49349.86
Decibel                         -241.11         10.27             -23.49    4.07E-104    -261.25      -220.97      -261.25        -220.97
Let Y = Frequency (Hz), X1 = Angle in Degrees, X2 = Chord Length, X3 = Velocity (meters per second), X4 = Displacement, and X5 = Decibel.
The regression model is given by:
Y = 32243.94 - 86.46*X1 - 741.56*X2 + 42.06*X3 - 65093.43*X4 - 241.11*X5
For this model, Y is the response variable and Xi (i = 1, 2, 3, 4, 5) are the predictor variables. Each coefficient is the expected change in frequency for a one-unit increase in that predictor while all other predictors are held constant. A positive coefficient means frequency increases by that many units; a negative coefficient means it decreases by that many units. For example, holding the other predictors constant, each additional meter per second of velocity raises the predicted frequency by about 42.06 Hz, while each additional decibel lowers it by about 241.11 Hz.
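The "holding all others constant" interpretation can be made concrete with a prediction function. In the Python sketch below, the dictionary keys, the function name `predict_frequency`, and the base predictor values are hypothetical, picked only for illustration (they are not rows from the data set). Raising one predictor by one unit while leaving the others unchanged moves the prediction by exactly that predictor's coefficient.

```python
# Coefficients from the multiple regression output.
INTERCEPT = 32243.94
COEFFICIENTS = {
    "angle": -86.46,            # X1, angle in degrees
    "chord_length": -741.56,    # X2
    "velocity": 42.06,          # X3, meters per second
    "displacement": -65093.43,  # X4
    "decibel": -241.11,         # X5
}


def predict_frequency(x):
    """Predicted frequency (Hz) for a dict of predictor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * x[name] for name in COEFFICIENTS)


# Hypothetical predictor values, for illustration only.
base = {"angle": 5.0, "chord_length": 0.1, "velocity": 50.0,
        "displacement": 0.01, "decibel": 125.0}

# Raise one predictor by one unit, holding every other predictor fixed.
faster = dict(base, velocity=base["velocity"] + 1)
louder = dict(base, decibel=base["decibel"] + 1)

print(round(predict_frequency(faster) - predict_frequency(base), 2))  # 42.06
print(round(predict_frequency(louder) - predict_frequency(base), 2))  # -241.11
```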
The Multiple R value is 0.583706496, which indicates a moderate linear relationship between frequency and the predictor variables. The R Square value is 0.340713274. This means that the model explains only 34.07% of the variability of frequency around its mean.
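The Adjusted R Square and the ANOVA F value in the output can both be recovered from R Square, the number of observations n, and the number of predictors k, using the standard formulas adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) and F = (R²/k)/((1 − R²)/(n − k − 1)). A Python check with the reported values:

```python
# Values from the multiple regression output.
r_squared = 0.340713274
n = 1503  # observations
k = 5     # predictor variables

# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Overall ANOVA F statistic expressed through R^2.
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(round(adj_r_squared, 6), round(f_stat, 4))
```

Both agree with the reported 0.338511248 and 154.7271. Adjusted R Square barely differs from R Square here because five predictors are few relative to 1,503 observations.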
Using an alpha level of significance of 0.05, the ANOVA F value is 154.7271 and its Significance F is 1.2E-132, which is very small compared to alpha = 0.05. We reject the null hypothesis and accept the alternative hypothesis, and we conclude that the regression fit is significant. With the exception of Chord Length (p = 0.586), the p-values of all the predictor variables are very small compared to the alpha = 0.05 level of significance. Chord Length is therefore not a significant predictor of frequency, while Angle, Velocity, Displacement, and Decibel are all significant predictors of frequency.
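Each t Stat in the coefficient table is the coefficient divided by its standard error, and its p-value is compared with alpha. The Python sketch below recomputes those tests from the rounded table values. As an assumption to keep it standard-library only, it approximates the t distribution with the standard normal (via `math.erfc`); with 1,497 residual degrees of freedom the two are very close, but this is not exactly what Excel computes.

```python
import math

ALPHA = 0.05

# (coefficient, standard error) pairs from the multiple regression output.
PREDICTORS = {
    "Angle in Degrees": (-86.46, 17.20),
    "Chord Length": (-741.56, 1361.86),
    "Velocity (Meters per Second)": (42.06, 4.30),
    "Displacement": (-65093.43, 8026.09),
    "Decibel": (-241.11, 10.27),
}


def approx_two_sided_p(t):
    """Two-sided p-value from the standard normal (large-df approximation to t)."""
    return math.erfc(abs(t) / math.sqrt(2))


results = {}
for name, (coef, se) in PREDICTORS.items():
    t = coef / se  # t Stat = coefficient / standard error
    p = approx_two_sided_p(t)
    results[name] = (t, p, p < ALPHA)
    print(f"{name}: t = {t:.2f}, p ~ {p:.3g}, significant: {p < ALPHA}")
```

Only Chord Length comes out non-significant (p ≈ 0.586, close to Excel's 0.5861673). Some t values can differ from Excel's in the last digit because the coefficients and standard errors were rounded to two decimals.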
References
Glen, S. (n.d.). "Excel Regression Analysis Output Explained." StatisticsHowTo.com. https://www.statisticshowto.com/probability-and-statistics/excel-statistics/excel-regression-analysis-output-explained/
Insert Title Here
Insert Your Name Here
Insert University Here
Course Name Here
Instructor Name Here
Date
Data Analysis: Hypothesis Testing
Use the Sun Coast Remediation data set to conduct a correlation analysis, simple regression analysis, and multiple regression analysis using the correlation tab, simple regression tab, and multiple regression tab respectively. The statistical output tables should be cut and pasted from Excel directly into the final project document. For the regression hypotheses, display and discuss the predictive regression equations.
Correlation: Hypothesis Testing
Restate the hypotheses:
Example:
Ho1: There is no statistically significant relationship between
height and weight.
Ha1: There is a statistically significant relationship between height and weight.
Enter data output results from the Excel ToolPak here.
Interpret and explain the correlation analysis results below the
Excel output. Your explanation should include: r, r2, alpha
level, p value, and rejection or acceptance of the null hypothesis
and alternative hypothesis.
Example:
The Pearson correlation coefficient of r = .600 indicates a
moderately strong positive correlation. This equates to an r2 of
.36, explaining 36% of the variance between the variables.
Using an alpha of .05, the results indicate a p value of .023 <
.05. Therefore, the null hypothesis is rejected, and the
alternative hypothesis is accepted that there is a statistically
significant relationship between height and weight.
Note: The Excel Data Analysis ToolPak does not automatically calculate the p value when using the correlation function. As a workaround, the data should also be run using the regression function. The Multiple R is identical to the Pearson r in simple regression (apart from its sign, since Multiple R is never negative), R Square is shown, and the p value is generated. Be sure to show your results using both the correlation function and the simple regression function.
Simple Regression: Hypothesis Testing
Ho2:
Ha2:
Interpret and explain the simple regression analysis results
below the Excel output. Your explanation should include:
multiple R, R square, alpha level, ANOVA F value, accept or
reject the null and alternative hypotheses for the model,
statistical significance of the x variable coefficient, and the
regression model as an equation with explanation.
Multiple Regression: Hypothesis Testing
Ho3:
Ha3:
Interpret and explain the multiple regression analysis results below the Excel output. Your explanation should include:
multiple R, R square, alpha level, ANOVA F value, accept or
reject the null and alternative hypotheses for the model,
statistical significance of the x variable coefficients, and the
regression model as an equation with explanation.
References
Include references here using hanging indentations. Remember
to remove this example.
Creswell, J. W., & Creswell, J. D. (2018). Research design:
Qualitative, quantitative, and mixed methods approaches (5th
ed.). Sage.
