Linear Regression and Time Studies
In this assignment, we perform linear regression on the COVID tracking data for my state of residence, Maryland, for the following variables:
1. “% tested”,
2. “% positive”, and
3. “deaths”.
For each variable, we choose an independent variable x and a dependent variable y and look for the best-fitting linear relationship between the two. That is, we look for coefficients b and c such that y = bx + c, where b is the slope of the line and c is the y-intercept.
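As a minimal sketch of how b and c can be obtained, the following uses the standard closed-form least-squares formulas. The function name fit_line and the small data set are hypothetical and are not taken from the COVID data; the assignment itself does not state which tool or formula was used.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b, c) minimising the sum of squared residuals of y = b*x + c."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    c = y_bar - b * x_bar    # y-intercept
    return b, c

# Hypothetical data lying exactly on y = 2x + 1 (illustration only).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b, c = fit_line(xs, ys)
print(b, c)
```

Because the sample points lie exactly on a line, the fit recovers that line's slope and intercept; with real data the fitted line only approximates the points.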
1. LINEAR REGRESSION FOR THE VARIABLE “% TESTED”
For the variable “% tested”, we choose the independent variable x to be the “date” and the dependent variable y to be “% tested”. We formulate the null and alternate hypotheses as follows:
Null Hypothesis: “The slope = 0, which means that y does not depend linearly on x.”
Alternate Hypothesis: “The slope ≠ 0, which means that y depends linearly on x.”
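The test of these hypotheses rests on the t statistic of the slope, which is the estimated slope divided by its standard error. A rough sketch of that computation from raw data follows; the function name and the four data points are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(xs, ys):
    """Return (b, se_b, t) for the least-squares slope of y on x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    c = y_bar - b * x_bar
    # Residual sum of squares and residual variance with n - 2 degrees of freedom.
    sse = sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    se_b = sqrt(s2 / s_xx)   # standard error of the slope
    return b, se_b, b / se_b

# Hypothetical data (illustration only).
xs = [0, 1, 2, 3]
ys = [0, 1, 1, 2]
b, se_b, t = slope_t_statistic(xs, ys)
print(b, se_b, t)
```

The p-value is then the two-sided tail probability of this t value under a t distribution with n - 2 degrees of freedom. The standard library has no t distribution, so that last step is left out here.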
Output:
First, we note that the equation of the fitted straight line is y = 0.0038x - 165.24. Hence the slope is b = 0.0038 and the intercept is c = -165.24.
In the following two tables, we summarise the output data we
got by performing the linear regression:
Regression Statistics
Multiple R           0.9624
R Square             0.926214
Adjusted R Square    0.926014
Standard Error       0.114243
Observations         372

Term       Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -165.242      2.431183        -67.9679  4.1E-211  -170.023   -160.462   -170.023     -160.462
% tested   0.003759      5.52E-05        68.15053  1.6E-211  0.003651   0.003868   0.003651     0.003868
We observe the following from the output data:
1. r-squared value: The r-squared value, 0.926214, is very close to 1. This means that the linear model explains about 92.6% of the variation in y, so there is a strong linear relationship between x and y. The relationship is positive because the slope is positive.
2. p-value: The p-value for the slope is 1.6E-211, which is far less than 0.05. Hence we reject the null hypothesis and conclude that the slope is significantly different from 0 at the 0.05 significance level.
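The reported regression statistics can be checked against each other. The sketch below assumes the usual definitions: R Square is the square of Multiple R, and Adjusted R Square is 1 - (1 - R²)(n - 1)/(n - k - 1) with k = 1 predictor. All numbers are copied from the “% tested” output; only the variable names are introduced here.

```python
# Values copied from the regression output for "% tested".
multiple_r = 0.9624
r_square = 0.926214
adjusted_r_square = 0.926014
n = 372   # observations
k = 1     # one predictor (the date)

# R Square should equal Multiple R squared, up to rounding of the printed values.
r_square_check = multiple_r ** 2

# Adjusted R Square penalises R Square for the number of predictors.
adjusted_check = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(r_square_check, adjusted_check)
```

Both recomputed values agree with the reported ones to within the rounding of the printed figures.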
Graph of the data and the linear fit:
Using the p-value, we concluded that this model is significant. We now use the model to predict the outcomes for the 7 days after the end of the workshop and list them in the following table:
DATE          % tested
08-03-2021    1.144655
09-03-2021    1.148415
10-03-2021    1.152174
11-03-2021    1.155933
12-03-2021    1.159692
13-03-2021    1.163451
14-03-2021    1.16721
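A sketch of how such predictions can be reproduced is given here. It rests on an assumption that the text does not state: that each date was encoded as a spreadsheet-style serial day number (days since 30 December 1899), which the size of the intercept suggests. The slope and intercept are the rounded values from the coefficient table, so the results match the table only to about two decimal places.

```python
from datetime import date, timedelta

SLOPE = 0.003759        # rounded coefficient from the regression output
INTERCEPT = -165.242    # rounded intercept from the regression output
# Assumption: dates are spreadsheet serial numbers counted from this epoch.
SERIAL_EPOCH = date(1899, 12, 30)

def date_serial(d):
    """Convert a date to its assumed serial day number."""
    return (d - SERIAL_EPOCH).days

def predict(d):
    """Evaluate the fitted line y = b*x + c at the given date."""
    return SLOPE * date_serial(d) + INTERCEPT

start = date(2021, 3, 8)
for k in range(7):
    d = start + timedelta(days=k)
    print(d.strftime("%d-%m-%Y"), round(predict(d), 4))
```

Each additional day raises the prediction by exactly the slope, which is why consecutive rows of the table differ by about 0.00376.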
2. LINEAR REGRESSION FOR THE VARIABLE “% POSITIVE”
For the variable “% positive”, we choose the independent variable x to be the “date” and the dependent variable y to be “% positive”. We formulate the null and alternate hypotheses as follows:
Null Hypothesis: “The slope = 0, which means that y does not depend linearly on x.”
Alternate Hypothesis: “The slope ≠ 0, which means that y depends linearly on x.”
Output:
First, we note that the equation of the fitted straight line is y = -0.0004x + 19.331. Hence the slope is b = -0.0004 and the intercept is c = 19.331.
In the following two tables, we summarise the output data we
got by performing the linear regression:
Regression Statistics
Multiple R           0.802194
R Square             0.643515
Adjusted R Square    0.642488
Standard Error       0.032809
Observations         349

Term        Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   19.3307       0.768533        25.15271  3.32E-80  17.81913   20.84227   17.81913     20.84227
% positive  -0.00044      1.74E-05        -25.0278  1.01E-79  -0.00047   -0.0004    -0.00047     -0.0004
We observe the following from the output data:
1. r-squared value: The r-squared value, 0.643515, is not close to either 0 or 1. This means that the linear model explains about 64.4% of the variation in y, so the linear relationship between x and y is moderate: neither very weak nor very strong. The relationship is negative because the slope is negative.
2. p-value: The p-value for the slope is 1.01E-79, which is far less than 0.05. Hence we reject the null hypothesis and conclude that the slope is significantly different from 0 at the 0.05 significance level.
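The r-squared value alone does not show the direction of the relationship, and the Multiple R in the output is reported without a sign. For a regression with a single predictor, the signed correlation coefficient can be recovered by taking the square root of r-squared and giving it the sign of the slope. A small sketch using the reported values (the variable names are introduced here):

```python
from math import copysign, sqrt

# Values copied from the regression output for "% positive".
r_square = 0.643515
slope = -0.00044

# With one predictor, the correlation r has the same sign as the slope.
r = copysign(sqrt(r_square), slope)
print(round(r, 6))
```

The magnitude agrees with the reported Multiple R of 0.802194, and the negative sign reflects the downward trend of “% positive” over time.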
Graph of the data and the linear fit:
Using the p-value, we concluded that this model is significant. We now use the model to predict the outcomes for the 7 days after the end of the workshop and list them in the following table:
DATE          % positive
08-03-2021    0.019669
09-03-2021    0.019233
10-03-2021    0.018797
11-03-2021    0.018361
12-03-2021    0.017924
13-03-2021    0.017488
14-03-2021    0.017052
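Since the predictions come from a straight line evaluated on consecutive days, each daily step should equal the slope. The following sketch recovers the slope from the predicted values, which gives one more significant digit than the rounded coefficient -0.00044. The list is copied from the predictions for “% positive”; the variable names are introduced here.

```python
# Predicted "% positive" values for 08-03-2021 through 14-03-2021.
predictions = [0.019669, 0.019233, 0.018797, 0.018361,
               0.017924, 0.017488, 0.017052]

# Day-to-day changes; for a linear model each one equals the slope.
steps = [later - earlier for earlier, later in zip(predictions, predictions[1:])]
mean_step = sum(steps) / len(steps)

print(round(mean_step, 6))
```

The average step rounds to the reported slope, which confirms that the predictions and the coefficient table describe the same line.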
3. LINEAR REGRESSION FOR THE VARIABLE “DEATHS”
For the variable “deaths”, we choose the independent variable x to be the “date” and the dependent variable y to be “deaths”. We formulate the null and alternate hypotheses as follows:
Null Hypothesis: “The slope = 0, which means that y does not depend linearly on x.”
Alternate Hypothesis: “The slope ≠ 0, which means that y depends linearly on x.”
Output:
First, we note that the equation of the fitted straight line is y = 19.335x - 848470. Hence the slope is b = 19.335 and the intercept is c = -848470.
In the following two tables, we summarise the output data we got by performing the linear regression:
Regression Statistics
Multiple R           0.965519
R Square             0.932227
Adjusted R Square    0.932036
Standard Error       538.7689
Observations         357

Term       Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -848470       12197.61        -69.5603  6.5E-209  -872458    -824481    -872458      -824481
deaths     19.33474      0.276689        69.87885  1.4E-209  18.79058   19.87889   18.79058     19.87889
We observe the following from the output data:
1. r-squared value: The r-squared value, 0.932227, is very close to 1. This means that the linear model explains about 93.2% of the variation in y, so there is a strong linear relationship between x and y. The relationship is positive because the slope is positive.
2. p-value: The p-value for the slope is 1.4E-209, which is far less than 0.05. Hence we reject the null hypothesis and conclude that the slope is significantly different from 0 at the 0.05 significance level.
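The slope row of the output can be checked for internal consistency. The t statistic should equal the coefficient divided by its standard error, and the 95% confidence interval should be centred on the coefficient with a half-width of a critical t value times the standard error. The sketch below back-solves that critical value from the reported bounds instead of looking it up. The numbers are copied from the “deaths” output; the variable names are introduced here.

```python
# Slope row copied from the regression output for "deaths".
coef = 19.33474
se = 0.276689
t_reported = 69.87885
lower, upper = 18.79058, 19.87889

# t statistic: estimate divided by its standard error.
t_check = coef / se

# The interval is coef +/- t_crit * se, so its half-width over se gives t_crit.
implied_t_crit = (upper - lower) / (2 * se)

# The interval should be centred on the coefficient.
midpoint = (lower + upper) / 2

print(t_check, implied_t_crit, midpoint)
```

The implied critical value comes out just under 1.97, which is what one would expect for a two-sided 95% interval with several hundred degrees of freedom.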
Graph of the data and the linear fit:
Using the p-value, we concluded that this model is significant. We now use the model to predict the outcomes for the 7 days after the end of the workshop and list them in the following table:
DATE          deaths
08-03-2021    7343.826
09-03-2021    7363.16
10-03-2021    7382.495
11-03-2021    7401.83
12-03-2021    7421.165
13-03-2021