Literally, the word regression means ‘return to the origin’. In statistics, the word is used in a
different sense. If two variables are correlated, the unknown value of one variable can be
estimated from the known value of the other. The value so estimated may not equal the actually
observed value, but when the correlation is strong it will lie close to it. Regression Analysis,
in a general sense, means the estimation or prediction of the unknown value of one variable from
the known value of the other variable.
Regression Analysis confined to the study of only two variables at a time is termed
Simple Regression. Quite often, however, the values of a particular phenomenon are affected by
a multiplicity of causes. Regression Analysis for studying more than two variables at a time is
known as Multiple Regression.
In Regression Analysis there are two types of variables. The variable whose value is influenced
or is to be predicted is called the dependent variable. The variable which influences the values
or is used for prediction is called the independent variable. The independent variable is also
known as the regressor, predictor or explanatory variable, while the dependent variable is also
known as the regressed or explained variable.
LINEAR & NON-LINEAR REGRESSION
If the given bivariate data are plotted on a graph, the points so obtained on the diagram will more
or less concentrate around a curve, called the “Curve of Regression”. The mathematical equation
of the regression curve is called the Regression Equation. If the regression curve is a straight
line, we say that there is linear regression between the variables under study. If the curve of
regression is not a straight line, the regression is termed as curved or non-linear regression.
The tendency of the actual value to lie close to the estimated value is called regression. In a
wider usage, regression is the theory of estimating the unknown value of a variable with the help
of known values of other variables. The regression theory was first introduced and developed by
Sir Francis Galton in the field of Genetics.
Here, firstly, a mathematical relation between the two variables is framed. This relation, which
is called the regression equation, is obtained by the method of least squares. It may be linear
or non-linear.
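To make the method of least squares concrete, here is a minimal Python sketch. It fits a line y = a + b*x to a small made-up data set and checks that nudging the fitted intercept or slope only increases the sum of squared errors. The data values and the function names (fit_line, sse) are illustrative assumptions, not part of these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical errors of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (an assumption, not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)

# Each perturbed line has a larger sum of squared errors than the fitted one.
perturbed = [sse(a + da, b + db, xs, ys)
             for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]]

print(f"y = {a:.2f} + {b:.2f}x, SSE = {best:.2f}")
```

The closed-form slope and intercept used here are the standard least-squares solution for a straight line; a non-linear regression curve would need a different fitting step.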
For bivariate data on x and y, the regression equation obtained under the assumption that x is
dependent on y is called the regression of x on y. The regression of x on y is:
$(x - \bar{x}) = b_{xy} (y - \bar{y})$
The regression equation obtained under the assumption that y is dependent on x is called the
regression of y on x. The regression of y on x is:
$(y - \bar{y}) = b_{yx} (x - \bar{x})$
Here $\bar{x}$ and $\bar{y}$ denote the arithmetic means (AM) of x and y.
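The regression coefficients $b_{xy}$ and $b_{yx}$ are given by the following formulas, where
$\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, $dx = x - \bar{x}$ and
$dy = y - \bar{y}$:
$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2}$
$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}$
$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{\sum dx\,dy}{\sum dy^2}$
$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{\sum dx\,dy}{\sum dx^2}$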
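As a quick check that the raw-sum form and the deviation form of the formulas agree, the following Python sketch computes both regression coefficients in the two ways. The data and the function names are illustrative assumptions.

```python
def coefficients_from_sums(xs, ys):
    """Return (bxy, byx) using the raw-sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    bxy = numerator / (n * syy - sy ** 2)
    byx = numerator / (n * sxx - sx ** 2)
    return bxy, byx


def coefficients_from_deviations(xs, ys):
    """Return (bxy, byx) using deviations dx, dy taken from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    s_dxdy = sum(u * v for u, v in zip(dx, dy))
    bxy = s_dxdy / sum(v * v for v in dy)
    byx = s_dxdy / sum(u * u for u in dx)
    return bxy, byx


# Made-up illustrative data (an assumption, not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

bxy_s, byx_s = coefficients_from_sums(xs, ys)
bxy_d, byx_d = coefficients_from_deviations(xs, ys)
print(f"raw sums:   bxy = {bxy_s:.4f}, byx = {byx_s:.4f}")
print(f"deviations: bxy = {bxy_d:.4f}, byx = {byx_d:.4f}")
```

The deviation form is usually easier by hand when the means are whole numbers; the raw-sum form avoids computing the means at all.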
The regression of x on y is used for the estimation of x values and the regression of y on x is
used for the estimation of y values. The graphs of the regression equations are the regression
lines.
PROPERTIES OF REGRESSION COEFFICIENTS
Regression coefficients are the coefficients of the independent variables in the regression
equations.
1. The regression coefficient $b_{xy}$ is the change occurring in x for a unit change in y. The
regression coefficient $b_{yx}$ is the change occurring in y for a unit change in x.
2. The regression coefficients are independent of the origin of measurement of the variables,
but they are dependent on the scale.
3. The geometric mean of the regression coefficients is numerically equal to the coefficient of
correlation: $r = \pm\sqrt{b_{xy} b_{yx}}$, taking the sign of the regression coefficients.
4. The regression coefficients cannot be of opposite signs. If r is positive, both the
regression coefficients will be positive. If r is negative, both the regression coefficients
will be negative. If r is zero, both the regression coefficients will be zero.
5. Since the coefficient of correlation cannot numerically be greater than 1, the product of
the regression coefficients cannot be greater than 1.
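The properties listed above can be checked numerically. The Python sketch below uses made-up data (an assumption) and a helper named coeffs (also an assumption) to confirm that shifting the origin leaves both coefficients unchanged, that rescaling x changes them, and that their product equals r squared.

```python
from math import isclose, sqrt


def coeffs(xs, ys):
    """Return (bxy, byx, r) from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / syy, sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up illustrative data (an assumption, not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
bxy, byx, r = coeffs(xs, ys)

# Change of origin: add constants to x and y.
bxy_shift, byx_shift, _ = coeffs([x + 100 for x in xs], [y - 50 for y in ys])
origin_independent = isclose(bxy, bxy_shift) and isclose(byx, byx_shift)

# Change of scale: multiply x by 10.
bxy_scaled, byx_scaled, _ = coeffs([10 * x for x in xs], ys)
scale_dependent = isclose(bxy_scaled, 10 * bxy) and isclose(byx_scaled, byx / 10)

# Product of the coefficients equals r squared, and signs agree with r.
product_is_r_squared = isclose(bxy * byx, r ** 2)
same_sign = (bxy > 0) == (byx > 0) == (r > 0)

print(origin_independent, scale_dependent, product_is_r_squared, same_sign)
```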
PROPERTIES OF REGRESSION LINES
There are two regression lines: the regression line of x on y and the regression line of y on x.
1. The regression lines intersect at $(\bar{x}, \bar{y})$, the point of means.
2. The regression lines have positive slope if the variables are positively correlated.
They have negative slope if the variables are negatively correlated.
3. If there is perfect correlation, the regression lines coincide (there will be only one
regression line).
LINES OF REGRESSION
A line of regression is the line which gives the best estimate of one variable for any given
value of the other variable. In the case of two variables, say x and y, we shall have two
regression equations: one of x on y and the other of y on x.
The line of regression of y on x is the line which gives the best estimate for the value of y
for any specified value of x.
The line of regression of x on y is the line which gives the best estimate for the value of x
for any specified value of y.
LINE OF REGRESSION OF y on x
$(y - \bar{y}) = r \frac{\sigma_y}{\sigma_x} (x - \bar{x})$
LINE OF REGRESSION OF x on y
$(x - \bar{x}) = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$
a. When r = 0, i.e., when x and y are uncorrelated, the lines of regression of y on x and x on y
are given by $y - \bar{y} = 0$ and $x - \bar{x} = 0$. The lines are perpendicular to each other.
b. When r = ±1, the two lines coincide.
c. If the value of r is significant, we can use the lines of regression for estimation and
prediction.
d. If r is not significant, then the linear model is not a good fit and hence the lines of
regression should not be used for prediction.
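Cases (a) and (b) can be illustrated with two small made-up data sets, one with r = 0 and one with r = 1. The Python sketch below is only an illustration; the data and the helper name coeffs are assumptions.

```python
def coeffs(xs, ys):
    """Return (bxy, byx) from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / syy, sxy / sxx


# Case r = 0: uncorrelated made-up data.
bxy0, byx0 = coeffs([-1, 0, 1], [1, 0, 1])
# Both coefficients vanish, so the lines reduce to y = mean of y (horizontal)
# and x = mean of x (vertical), which are perpendicular.

# Case r = 1: made-up data lying exactly on y = 2x + 1.
bxy1, byx1 = coeffs([1, 2, 3, 4], [3, 5, 7, 9])
# In the (x, y) plane the y-on-x line has slope byx and the x-on-y line has
# slope 1 / bxy. Both lines pass through the means, so equal slopes mean
# the two lines coincide.
slope_y_on_x = byx1
slope_x_on_y = 1 / bxy1

print(bxy0, byx0)
print(slope_y_on_x, slope_x_on_y)
```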
COEFFICIENTS OF REGRESSION
a. bxy is the Coefficient of regression of x on y.
b. byx is the Coefficient of regression of y on x.
THEOREMS ON REGRESSION COEFFICIENTS
a. The correlation coefficient is the geometric mean of the regression coefficients, i.e.,
$r^2 = b_{xy} b_{yx}$, so that $r = \pm\sqrt{b_{xy} b_{yx}}$.
b. The sign to be taken before the square root is the same as that of the regression coefficients.
c. If one of the regression coefficients is numerically greater than one, then the other must be
numerically less than one.
d. The AM of the moduli of the regression coefficients is greater than or equal to the modulus
of the correlation coefficient.
e. Regression coefficients are independent of a change of origin but not of scale.
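Theorems (a), (b) and (d) can be turned into a small Python helper that recovers r from the two regression coefficients. The function name and the sample values below are illustrative assumptions.

```python
from math import sqrt


def correlation_from_coefficients(bxy, byx):
    """Return r as the signed geometric mean of the regression coefficients."""
    product = bxy * byx
    if product < 0:
        raise ValueError("regression coefficients cannot have opposite signs")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    # The sign of r is the common sign of the two coefficients.
    sign = -1.0 if (bxy < 0 or byx < 0) else 1.0
    return sign * sqrt(product)


# Illustrative values (assumptions chosen for the demonstration).
r_pos = correlation_from_coefficients(0.4, 1.6)
r_neg = correlation_from_coefficients(-0.5, -0.8)

# Theorem (d): the AM of the moduli is at least |r|.
am_pos = (abs(0.4) + abs(1.6)) / 2
am_neg = (abs(-0.5) + abs(-0.8)) / 2

print(f"r = {r_pos:.4f}, AM = {am_pos:.4f}")
print(f"r = {r_neg:.4f}, AM = {am_neg:.4f}")
```

Raising an error for opposite signs or a product above 1 is a design choice for this sketch; it mirrors the statement that such pairs of coefficients cannot occur.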
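Here $dx = X - \bar{X}$ and $dy = Y - \bar{Y}$.
  X      Y     dx     dy    dx^2    dy^2    dx.dy
  91     71      1      1       1       1       1
  97     75      7      5      49      25      35
 108     69     18     -1     324       1     -18
 121     97     31     27     961     729     837
  67     70    -23      0     529       0       0
 124     91     34     21    1156     441     714
  51     39    -39    -31    1521     961    1209
  73     61    -17     -9     289      81     153
 111     80     21     10     441     100     210
  57     47    -33    -23    1089     529     759
 900    700      0      0    6360    2868    3900   (totals)
$\bar{X} = 900/10 = 90$ and $\bar{Y} = 700/10 = 70$.
$b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{3900}{2868} \approx 1.3598$
$b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{3900}{6360} \approx 0.6132$
Regression of x on y: $(x - \bar{x}) = b_{xy} (y - \bar{y})$
(x - 90) = 1.3598 (y - 70)
x = 1.3598y - 5.19
Regression of y on x: $(y - \bar{y}) = b_{yx} (x - \bar{x})$
(y - 70) = 0.6132 (x - 90)
y = 0.6132x + 14.812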
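The worked table above can be reproduced with a short Python sketch. It assumes the third X value is 108, which is what the listed deviation dx = 18 and the column total 900 imply; the variable names are illustrative.

```python
# Data from the table; the third X value is taken as 108 (an assumption
# implied by dx = 18 and the column total of 900).
X = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
Y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

dx = [x - mean_x for x in X]
dy = [y - mean_y for y in Y]

sum_dx2 = sum(u * u for u in dx)
sum_dy2 = sum(v * v for v in dy)
sum_dxdy = sum(u * v for u, v in zip(dx, dy))

bxy = sum_dxdy / sum_dy2   # regression coefficient of x on y
byx = sum_dxdy / sum_dx2   # regression coefficient of y on x

# Intercepts of x = ax + bxy*y and y = ay + byx*x; both lines pass
# through the point of means.
ax = mean_x - bxy * mean_y
ay = mean_y - byx * mean_x

print(f"x = {bxy:.4f}y + ({ax:.2f})")
print(f"y = {byx:.4f}x + ({ay:.2f})")
```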
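The data about the sales and advertisement expenditure of a firm are given below:
                         Sales (x)    Advertisement Expenditure (y)
Means                       40              6
Standard Deviations         10              1.5
Coefficient of correlation: r = 0.9
o Estimate the likely sales for a proposed advertisement expenditure of Rs. 10 crores.
o What should be the advertisement expenditure if the firm proposes a sales target of 60
crores of rupees?
Regression of x on y: $(x - \bar{x}) = b_{xy} (y - \bar{y})$, where $b_{xy} = r \frac{\sigma_x}{\sigma_y}$
(x - 40) = (0.9 × 10 / 1.5) (y - 6)
x = 6y + 4
For y = 10: x = 6 × 10 + 4 = 64, so the likely sales are Rs. 64 crores.
Regression of y on x: $(y - \bar{y}) = b_{yx} (x - \bar{x})$, where $b_{yx} = r \frac{\sigma_y}{\sigma_x}$
(y - 6) = (0.9 × 1.5 / 10) (x - 40)
y = 0.135x + 0.6
For x = 60: y = 0.135 × 60 + 0.6 = 8.7, so the advertisement expenditure should be Rs. 8.7 crores.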
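The same two estimates can be obtained from the means, standard deviations and r alone. The Python sketch below wraps the regression-line formula in a helper; the function name estimate and its parameter names are illustrative assumptions.

```python
def estimate(target_mean, target_sd, given_mean, given_sd, r, given_value):
    """Estimate the target variable from a given value of the other variable.

    Uses (target - target_mean) = r * (target_sd / given_sd) * (given - given_mean).
    """
    b = r * target_sd / given_sd   # regression coefficient of target on given
    return target_mean + b * (given_value - given_mean)


# Sales (x): mean 40, SD 10. Advertisement expenditure (y): mean 6, SD 1.5.
r = 0.9

# Regression of x on y: likely sales for an expenditure of 10.
sales = estimate(40, 10, 6, 1.5, r, 10)

# Regression of y on x: expenditure needed for a sales target of 60.
expenditure = estimate(6, 1.5, 40, 10, r, 60)

print(f"estimated sales = {sales:.2f}")
print(f"estimated advertisement expenditure = {expenditure:.2f}")
```

Note that the two questions use different regression lines: the line of x on y for estimating sales, and the line of y on x for estimating expenditure.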
Point out the inconsistency, if any, in the following statement: “The Regression Equation of y on x
is 2y+3x=4 and the correlation coefficient between x & y is 0.8”
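One way to examine this statement is to read off the regression coefficient of y on x from the equation and compare its sign with that of r, since the regression coefficients and r must have the same sign. The Python sketch below does this; the helper name slope_y_on_x is an illustrative assumption.

```python
def slope_y_on_x(a, b, c):
    """Slope of y on x for the line a*y + b*x = c, i.e. y = c/a - (b/a)*x."""
    return -b / a


# Statement: regression of y on x is 2y + 3x = 4, and r = 0.8.
byx = slope_y_on_x(2, 3, 4)
r = 0.8

# The regression coefficient and r must share the same sign.
signs_agree = (byx > 0) == (r > 0)

print(f"byx = {byx}, r = {r}, signs agree: {signs_agree}")
```

Here the coefficient of y on x is negative while r is positive, so the statement is inconsistent.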
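By using the following data, find the two lines of regression and from them compute
Karl Pearson’s coefficient of correlation: ΣX = 250; ΣY = 300; ΣXY = 7900; ΣX² = 6500;
ΣY² = 10000; n = 10.
$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 10000 - (300)^2} = \frac{4000}{10000} = 0.4$
$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 6500 - (250)^2} = \frac{4000}{2500} = 1.6$
With $\bar{x} = 250/10 = 25$ and $\bar{y} = 300/10 = 30$, the lines of regression are:
(x - 25) = 0.4 (y - 30), i.e. x = 0.4y + 13
(y - 30) = 1.6 (x - 25), i.e. y = 1.6x - 10
$r_{xy}^2 = b_{xy} b_{yx} = 0.4 \times 1.6 = 0.64$, so $r_{xy} = 0.8$ (positive, since both
regression coefficients are positive).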
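The computation from summary totals can be sketched in Python as follows. The values n = 10 and ΣY² = 10000 are taken from the computation lines above; the function name from_sums is an illustrative assumption.

```python
from math import sqrt


def from_sums(n, sx, sy, sxy, sxx, syy):
    """Return (bxy, byx) from the totals n, Σx, Σy, Σxy, Σx², Σy²."""
    numerator = n * sxy - sx * sy
    bxy = numerator / (n * syy - sy ** 2)
    byx = numerator / (n * sxx - sx ** 2)
    return bxy, byx


n, sx, sy, sxy, sxx, syy = 10, 250, 300, 7900, 6500, 10000
bxy, byx = from_sums(n, sx, sy, sxy, sxx, syy)

# Both coefficients are positive here, so r takes the positive root.
r = sqrt(bxy * byx)

# Intercepts of the two lines, which pass through the point of means.
mean_x, mean_y = sx / n, sy / n
intercept_x_on_y = mean_x - bxy * mean_y
intercept_y_on_x = mean_y - byx * mean_x

print(f"bxy = {bxy}, byx = {byx}, r = {r:.2f}")
print(f"x = {bxy}y + ({intercept_x_on_y:.2f})")
print(f"y = {byx}x + ({intercept_y_on_x:.2f})")
```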
Find the two regression coefficients and hence r. n = 5; $\bar{X} = 10$; $\bar{Y} = 20$; Σ(X-4)² = 100;
Let U = X - 4. Then $\bar{U} = \bar{X} - 4 = 6$ and ΣU = n$\bar{U}$ = 30. Similarly, ΣV = 50.