SIMPLE LINEAR REGRESSION (PA297 Statistics for Public Administrator)
Reporters: Atty. Gener R. Gayam, CPA; Agapito “Pete” M. Cagampang, PM; Raymond B. Cabling, MD
Presented to: Dr. Maria Theresa P. Pelones
A. The Scatter Diagram
In solving problems that concern estimation and forecasting, a scatter diagram can be used as a graphical approach. This technique consists of plotting the points corresponding to the paired scores of the independent and dependent variables, which are commonly represented by X and Y, on the X–Y coordinate system. Below is an illustration of a scatter diagram using the data in Table 6.1. This table shows the years of working experience and the income of eight employees in a big industrial corporation.
Figure 6.1 – A Scatter Diagram for Table 6.1 Data
To roughly predict the value of the dependent variable, income, from the independent variable, years of working experience, your next step is to draw a trend line. This is a line passing through the series of points such that the total vertical distance of the points below the line is more or less equal to the total vertical distance of the points above it. If this requirement is satisfied, you have drawn a correct trend line. The illustration is shown in Figure 6.2.
Figure 6.2 – A trend line drawn in the linear direction between working experience and income of eight employees
Using the trend line drawn in Figure 6.2, the value estimated for Y when X is 16 is 18. Remember that if a straight line appears to describe the relationship, the algebraic approach called the regression formula can be used, as explained in the next topic.
B. The Least Square Linear Regression Equation
The least square linear regression equation can be understood through the straight-line formula known from algebra: Y = a + bX. For instance, the line Y = a + bX fitted to the data in Figure 6.1 is the line that gives the smallest sum of the squares of the vertical distances of the points from the line. In solving the regression equation, you first need to solve for the slope b and then for the intercept a, where a = Ȳ − bX̄.
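The slope formula did not survive on the slide above, so the sketch below assumes the standard textbook form b = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²), followed by a = Ȳ − bX̄. The function name `least_squares_line` and the sample data are hypothetical and are not the Table 6.1 scores.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + bX with the smallest
    sum of squared vertical distances from the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are identical; slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom   # slope (assumed standard formula)
    a = sum_y / n - b * (sum_x / n)            # intercept: a = Y-bar - b * X-bar
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X (not the Table 6.1 data).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares_line(xs, ys)
print(a, b)
```

Because the illustrative points lie exactly on a straight line, the fitted intercept and slope recover that line; with real scores the points scatter around the fitted line instead.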
Example: Solve the least squares regression line for the data scores in Table 6.1.
ΣX = 62, ΣY = 90, X̄ = 7.75, Ȳ = 11.25
After solving the values of b and a, the regression equation obtained from Table 6.1 is Ŷ = 6.68 + .59X.
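As a quick arithmetic check, the intercept can be recomputed from the slope and the two means reported for Table 6.1. This is only a sketch: it takes b = .59 as given, and the helper name `predict` is hypothetical.

```python
# Values reported on the slides for Table 6.1.
x_mean = 7.75
y_mean = 11.25
b = 0.59                      # slope, taken as given

# Intercept from a = Y-bar - b * X-bar.
a = y_mean - b * x_mean
print(round(a, 2))            # 6.68


def predict(x, a=6.68, b=0.59):
    """Estimated Y on the regression line for a given X (rounded coefficients)."""
    return a + b * x


print(round(predict(2), 2))   # 7.86
```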
C. The Standard Error of Estimate
Now we are interested in the distance of each observed Yi value from Ŷ, the corresponding ordinate of the regression line. Here, we base our measure of dispersion or variation around the regression line on the squared distance (Yi − Ŷ)². This can be understood through the standard error of estimate formula given below.
$S_e = \sqrt{\dfrac{\sum (Y_i - \hat{Y})^2}{n - 2}}$
However, this formula entails a very tedious process of computing the standard error of estimate, so the formula by Basil P. Korin (1977), which is easier to solve, is suggested as follows:
$S_e = \sqrt{\dfrac{\sum Y_i^2 - a \sum Y_i - b \sum X_i Y_i}{n - 2}}$
Note: The symbols a and b stand for the intercept and the slope of the regression line.
Example: Solve the standard error of estimate for the regression line which was derived from the data in Table 6.1.
Solution 1: $S_e = \sqrt{\dfrac{\sum (Y_i - \hat{Y})^2}{n - 2}}$
Step 1 – Compute the value of Ŷ at each of the X values. Example: Ŷ = 6.68 + .59(2) = 6.68 + 1.18 = 7.86. Do the rest by following the same procedure.
Step 2 – Get the difference Yi − Ŷ. Example: 8 – 7.86 = .14
Step 3 – Square each difference Yi − Ŷ. Example: (.14)² = .0196
Finally, add the squared differences, divide the sum by n − 2, and take the square root.
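The steps of Solution 1 can be sketched in code as follows. The function name `standard_error_of_estimate` and the four-point data set are hypothetical (the Table 6.1 scores are not reproduced here); only the last lines use numbers from the slides, namely the first pair X = 2, Y = 8.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Definitional formula: sqrt( sum (Yi - Y-hat)^2 / (n - 2) )."""
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need more than two paired observations")

    # Step 1: Y-hat at each X.  Step 2: Yi - Y-hat.  Step 3: square it.
    squared_diffs = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]

    # Final step: sum, divide by n - 2, take the square root.
    return math.sqrt(sum(squared_diffs) / (n - 2))


# Hypothetical data whose least squares line is Y = 0.5 + 1.6X.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
se = standard_error_of_estimate(xs, ys, a=0.5, b=1.6)
print(round(se, 4))

# The slide's own first observation (X = 2, Y = 8) through Steps 1 to 3.
first_squared_diff = (8 - (6.68 + 0.59 * 2)) ** 2
print(round(first_squared_diff, 4))   # 0.0196
```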
Solution 2: $S_e = \sqrt{\dfrac{\sum Y_i^2 - a \sum Y_i - b \sum X_i Y_i}{n - 2}}$
Step 1 – Square each Yi. Example: (8)² = 64
Step 2 – Multiply each Xi by Yi. Example: 2 × 8 = 16
Step 3 – Get the sums of Yi² and XiYi: ΣYi² = 1084 and ΣXiYi = 791
Step 4 – Apply the formula:
$S_e = \sqrt{\dfrac{1084 - 6.68(90) - .59(791)}{8 - 2}}$
$= \sqrt{\dfrac{1084 - 601.2 - 466.69}{6}}$
$= \sqrt{\dfrac{1084 - 1067.89}{6}}$
$= \sqrt{\dfrac{16.11}{6}}$
$= \sqrt{2.685}$
$= 1.64$
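The arithmetic in Step 4 can be reproduced directly from the totals given on the slides. This is a sketch that assumes the rounded coefficients a = 6.68 and b = .59; the variable names are illustrative.

```python
import math

# Totals and coefficients as reported for Table 6.1.
n = 8
sum_y = 90
sum_y2 = 1084
sum_xy = 791
a = 6.68    # intercept (rounded)
b = 0.59    # slope (rounded)

# Shortcut formula: sqrt( (sum Y^2 - a * sum Y - b * sum XY) / (n - 2) ).
numerator = sum_y2 - a * sum_y - b * sum_xy
se = math.sqrt(numerator / (n - 2))

print(round(numerator, 2))   # 16.11
print(round(se, 2))          # 1.64
```

Note that the numerator is a small difference between large numbers, so the result is sensitive to how a and b are rounded.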
The standard error of estimate is interpreted in the same way as a standard deviation. For example, if we measure three standard errors vertically above and below the regression line, we will find that, for any given value of X, practically all observed Y values fall between the upper and lower 3Se limits. In the example above, where the standard error of estimate is 1.64, the limits lie (3)(1.64) = 4.92 units above and below the regression line. These bounds of 4.92 units above and below the regression line apply to the observations in that particular sample. If you draw two parallel lines, each lying one Se from the regression line, you can expect about two thirds of the observations to fall between them. See Figure 6.3 for the illustration using the data in Table 6.1.
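The bounds described above can be sketched as a small helper that returns the lower and upper limits k standard errors from the regression line at a chosen X. The function name `se_bounds` is hypothetical; the coefficients and Se are the rounded values from the example, and X = 2 is used only because it is the pair worked on the slides.

```python
def se_bounds(x, k, a=6.68, b=0.59, se=1.64):
    """Lower and upper limits lying k standard errors from the line at X = x."""
    y_hat = a + b * x
    return y_hat - k * se, y_hat + k * se


# One standard error around the line at X = 2 (Y-hat = 7.86).
low1, high1 = se_bounds(2, k=1)
print(round(low1, 2), round(high1, 2))   # 6.22 9.5

# Three standard errors: 3 * 1.64 = 4.92 units above and below the line.
low3, high3 = se_bounds(2, k=3)
print(round(low3, 2), round(high3, 2))   # 2.94 12.78
```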
Figure 6.3 – A Regression Line with One Standard Error Distance (Ŷ = 6.68 + .59X)