Transcript of "How To Compute The Best Fit Straight Line To A Set Of Data?"
1.
How-To Compute the Best-Fit Straight Line to a Set of Data Page 1
How-To Compute the Best Fit Straight Line to a Set of Data
Objective: Learn how to compute regression equations for a set of data and how to use them.
Keywords and Concepts
1. Best fit straight line
2. Bivariate scatter plot
3. Regression equation
4. Y’ = mX + C
5. Independent variable
6. Dependent variable
7. Slope (rate of change) of the best-fit line
8. Y-intercept
9. Correlation coefficient (r)
A bivariate scatter plot (two-variable relationship) reveals the association
between two variables and shows how an individual’s score on one variable can be used
to predict the corresponding score on a related variable (see Fig. 1). This prediction
depends on the equation for a straight line that minimizes the variation of data points
about it. The equation used to draw the best-fit straight line is called a regression
equation and was first used by Sir Francis Galton (1822-1911) to show that when tall or
short couples have children, the children’s heights tend to “regress”, or revert, toward
the mean height of the population.
Figure 1. Scatter plot of Fahrenheit (Y-axis, degrees) versus Celsius (X-axis, degrees)
temperature, showing the regression line y = 1.80x + 32 and the two values used to
construct the line.
Regression Equation
Given a collection of paired sample data, the following formula (regression
equation) describes the relationship between X (independent or predictor variable) and
Y’ (dependent or response variable):
Y’ = mX + C (eq. 1)
where m expresses the slope (rate of change) of the best-fit line, X is any particular X
value within the range of the data set, and C represents the Y-intercept, that is, the
value of Y’ when X equals zero.
Use equation 2 to compute m as follows:
$$ m = r \, \frac{SD_Y}{SD_X} \qquad \text{(eq. 2)} $$
The symbol r equals the correlation coefficient between X and Y; SDY and SDX are
the standard deviations of the Y and X variables, respectively.
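As a quick check of equation 2, the following Python sketch computes the slope from r and the two standard deviations. The function name `slope_from_r` is an illustrative choice and not part of the handout; the inputs are the rounded values from the Fahrenheit/Celsius example worked later in this document.

```python
def slope_from_r(r, sd_y, sd_x):
    """Slope of the best-fit line per equation 2: m = r * (SD_Y / SD_X)."""
    if sd_x == 0:
        raise ValueError("SD_X must be non-zero (X has no variation)")
    return r * (sd_y / sd_x)


# Rounded values taken from the worked example in this handout.
m = slope_from_r(r=0.9999, sd_y=25.12, sd_x=13.95)
print(round(m, 2))
```

Because r has no units, the slope carries the units of Y per unit of X (here, degrees Fahrenheit per degree Celsius).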
Equation 3 computes the Y-intercept C as:
$$ C = \bar{Y} - m\bar{X} \qquad \text{(eq. 3)} $$
where $\bar{Y}$ equals the mean of the Y scores, m equals the slope, and $\bar{X}$ equals
the mean of the X scores.
Equation 3 can be rewritten as follows:
$$ C = \bar{Y} - r\left(\frac{SD_Y}{SD_X}\right)\bar{X} \qquad \text{(eq. 4)} $$
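Equations 3 and 4 describe the same intercept, once with the slope m given directly and once with m expanded as r times SDY over SDX. The sketch below evaluates both forms so the agreement can be seen numerically. The function names are hypothetical, and the inputs are the rounded values from the worked example later in this document.

```python
def intercept_from_means(mean_y, mean_x, m):
    """Equation 3: C = mean(Y) - m * mean(X)."""
    return mean_y - m * mean_x


def intercept_from_r(mean_y, mean_x, r, sd_y, sd_x):
    """Equation 4: the same intercept with m expanded as r * (SD_Y / SD_X)."""
    return mean_y - r * (sd_y / sd_x) * mean_x


r, sd_y, sd_x = 0.9999, 25.12, 13.95
mean_y, mean_x = 62.4, 16.87

c3 = intercept_from_means(mean_y, mean_x, m=r * (sd_y / sd_x))
c4 = intercept_from_r(mean_y, mean_x, r, sd_y, sd_x)
print(round(c3, 1), round(c4, 1))
```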
Substituting equations 2 and 3 into equation 1 gives the best-fit regression
line for predicting Y (Y’) from any X value (equation 5):
$$ Y' = r\left(\frac{SD_Y}{SD_X}\right)X + \bar{Y} - m\bar{X} \qquad \text{(eq. 5)} $$
Alternatively, combining equations 2 and 4 expresses the equation for a straight
line as:
$$ Y' = r\left(\frac{SD_Y}{SD_X}\right)X + \bar{Y} - r\left(\frac{SD_Y}{SD_X}\right)\bar{X} \qquad \text{(eq. 6)} $$
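Grouping the two terms of equation 6 that share the factor r(SDY/SDX) gives a form that is often easier to interpret. The rearrangement below is a sketch, not part of the original handout; $\bar{X}$ and $\bar{Y}$ denote the means of the X and Y scores.

```latex
\begin{aligned}
Y' &= r\,\frac{SD_Y}{SD_X}\,X + \bar{Y} - r\,\frac{SD_Y}{SD_X}\,\bar{X} \\
   &= \bar{Y} + r\,\frac{SD_Y}{SD_X}\,\bigl(X - \bar{X}\bigr)
\end{aligned}
```

Setting X equal to $\bar{X}$ gives Y’ = $\bar{Y}$, so the best-fit line always passes through the point of means.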
The slope and Y-intercept of the best-fit regression line also can be computed
from raw scores with the following formula (equation 7):
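$$ \text{Slope, } m = \frac{\dfrac{\sum XY}{N} - \left(\dfrac{\sum X}{N}\right)\left(\dfrac{\sum Y}{N}\right)}{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2} \qquad \text{(eq. 7)} $$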
where the equation’s numerator equals the numerator for the correlation coefficient (r)
and the denominator equals the square of SDX.
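The raw-score slope formula (equation 7) can be sketched in Python as follows. The function name `raw_score_slope` is hypothetical, and the data are the five Celsius/Fahrenheit pairs from the example in this handout.

```python
def raw_score_slope(xs, ys):
    """Slope per equation 7, computed from raw sums.

    numerator   = sum(XY)/N - (sum(X)/N) * (sum(Y)/N)
    denominator = sum(X^2)/N - (sum(X)/N)^2
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    denominator = sum(x * x for x in xs) / n - mean_x ** 2
    if denominator == 0:
        raise ValueError("X has no variation; the slope is undefined")
    return numerator / denominator


celsius = [0, 4.44, 15.55, 26.6, 37.77]      # X
fahrenheit = [32, 40, 60, 80, 100]           # Y
print(round(raw_score_slope(celsius, fahrenheit), 2))
```

This route needs only the running sums of X, Y, XY, and X squared, so r and the standard deviations do not have to be computed first.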
The raw score equation (eq. 8) computes the Y-intercept (C) as:
(eq. 8)
The denominator in equation 8 equals the square of SDX.
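A raw-score expression for C can be derived by substituting equation 7 into equation 3 (C equals the mean of Y minus m times the mean of X). The derivation below is a sketch under that assumption and is not necessarily the exact layout of equation 8, but its denominator is the same as the denominator of equation 7.

```latex
\begin{aligned}
C &= \frac{\sum Y}{N} - m\,\frac{\sum X}{N} \\[4pt]
  &= \frac{\left(\dfrac{\sum Y}{N}\right)\left(\dfrac{\sum X^2}{N}\right)
          - \left(\dfrac{\sum X}{N}\right)\left(\dfrac{\sum XY}{N}\right)}
         {\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}
\end{aligned}
```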
Example
Given the following 5 data points for temperature in degrees Fahrenheit (Y-variable)
and temperature in degrees Celsius (X-variable), compute the equation for the
best-fitting straight line.
Y (Fahrenheit temperature):   32    40     60     80    100
X (Celsius temperature):       0     4.44  15.55  26.6   37.77
Step 1. Compute r, SDX, and SDY.
∑Y = 312; ∑Y² = 22624; ∑X = 84.36; ∑X² = 2395.65; ∑XY = 7015.6; N = 5
r = 0.9999
$\bar{Y}$ (mean of Y) = 62.4; SDY = 25.12
$\bar{X}$ (mean of X) = 16.87; SDX = 13.95
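The Step 1 quantities can be reproduced with a short Python sketch. It assumes population standard deviations (dividing by N rather than N − 1), since that choice reproduces the SDX and SDY values quoted above; the variable names are illustrative.

```python
import math

celsius = [0, 4.44, 15.55, 26.6, 37.77]      # X
fahrenheit = [32, 40, 60, 80, 100]           # Y
n = len(celsius)

# Raw sums used throughout the handout.
sum_x = sum(celsius)
sum_y = sum(fahrenheit)
sum_x2 = sum(x * x for x in celsius)
sum_y2 = sum(y * y for y in fahrenheit)
sum_xy = sum(x * y for x, y in zip(celsius, fahrenheit))

mean_x = sum_x / n
mean_y = sum_y / n

# Population standard deviations (divide by N): an assumption that
# matches the values quoted in Step 1.
sd_x = math.sqrt(sum_x2 / n - mean_x ** 2)
sd_y = math.sqrt(sum_y2 / n - mean_y ** 2)

# Pearson r: covariance divided by the product of the standard deviations.
r = (sum_xy / n - mean_x * mean_y) / (sd_x * sd_y)

print(f"sum_x={sum_x:.2f} sum_y={sum_y} sum_x2={sum_x2:.2f} "
      f"sum_y2={sum_y2} sum_xy={sum_xy:.1f}")
print(f"mean_x={mean_x:.2f} sd_x={sd_x:.2f} "
      f"mean_y={mean_y:.1f} sd_y={sd_y:.2f} r={r:.6f}")
```

Using the sample standard deviation (dividing by N − 1) for both variables would change SDX and SDY but leave their ratio, and therefore the slope, unchanged.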
Step 2. Compute the slope (m) and Y-intercept (C) using equations 2 and 4,
respectively.
$$ m = r \, \frac{SD_Y}{SD_X} \qquad \text{(eq. 2)} $$
m = 0.9999 (25.12 ÷ 13.95)
m = 1.80
$$ C = \bar{Y} - r\left(\frac{SD_Y}{SD_X}\right)\bar{X} \qquad \text{(eq. 4)} $$
C = 62.4 − 0.9999 (25.12 ÷ 13.95)(16.87)
C = 32.0
Step 3. The equation for the regression of degrees Fahrenheit (Y) on degrees Celsius (X)
becomes:
Y’ = mX + C
Y’ = 1.8 X + 32.0
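To see how well the fitted line reproduces the five observations, the sketch below applies Y’ = 1.8X + 32.0 to each Celsius value and prints the residual (observed minus predicted). The function name `predict_fahrenheit` is an illustrative choice.

```python
def predict_fahrenheit(celsius, m=1.8, c=32.0):
    """Predicted Y' = m*X + C, using the slope and intercept from Step 3."""
    return m * celsius + c


observed = [(0, 32), (4.44, 40), (15.55, 60), (26.6, 80), (37.77, 100)]
for x, y in observed:
    y_hat = predict_fahrenheit(x)
    print(f"X={x:>5}  observed Y={y:>3}  "
          f"predicted Y'={y_hat:6.2f}  residual={y - y_hat:+.2f}")
```

The small residuals come from the Celsius values in the data table being rounded to one or two decimal places.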
Step 4. Plot the individual data points as a scatter diagram (see Fig. 1), then
draw the best-fit straight line of the regression. Arbitrarily select a
value of X near the maximum observed value of X, and substitute it in
the equation to solve for Y’ (predicted Y). Plot this point (X, Y’) on
the scattergram. Repeat the procedure for another value of X near the
minimum observed value of X. The straight line joining the two points
represents the best-fitting straight line generated from the
regression equation.
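The two points needed to draw the line in Step 4 can be computed as in the sketch below. As a simplification, it uses exactly the minimum and maximum observed X rather than values near them; the function name `line_endpoints` is hypothetical.

```python
def line_endpoints(xs, m, c):
    """Return two (X, Y') points, at the minimum and maximum observed X."""
    x_lo, x_hi = min(xs), max(xs)
    return (x_lo, m * x_lo + c), (x_hi, m * x_hi + c)


celsius = [0, 4.44, 15.55, 26.6, 37.77]
p_low, p_high = line_endpoints(celsius, m=1.8, c=32.0)
print(p_low, p_high)
```

Joining these two points with a straight edge on the scattergram gives the regression line.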