CORRELATION
Correlation is the study of the relationship between two or more variables.
Suppose we have two continuous variables X and Y; if a change in X is accompanied by a change in Y, the variables are said to be correlated. In other words, a systematic relationship between the variables is termed correlation.
• When only two variables are involved, the correlation is known as simple correlation; when more than two variables are involved, it is known as multiple correlation.
• When the variables move in the same direction they are said to be positively correlated, and when they move in opposite directions they are said to be negatively correlated.
• When there are two related variables, their joint distribution is known as the bivariate normal distribution; when there are more than two variables, their joint distribution is known as the multivariate normal distribution.
Correlation coefficient:
The measure of the degree of relationship between two continuous variables is called the correlation coefficient, denoted by r. It is given as the ratio of the covariance of the variables X and Y to the product of the standard deviations of X and Y.
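In symbols, this definition reads

r = Cov(X, Y) / (σX · σY) = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]

where x̄ and ȳ are the sample means of X and Y.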
Assumptions:
The correlation coefficient r is used under the following assumptions:
1. The variables under study are continuous random variables and are normally distributed.
2. The relationship between the variables is linear.
3. Each pair of observations is unconnected with the other pairs (independence).
Properties:
1. The correlation coefficient ranges between −1 and +1.
2. The correlation coefficient is not affected by a change of origin or scale, or both (see the numerical sketch after this list).
3. If r > 0 it denotes positive correlation;
   if r < 0 it denotes negative correlation;
   if r = 0 the two variables X and Y are not linearly correlated (under the normality assumption above, this means the variables are independent);
   if r = +1 the correlation is perfect positive;
   if r = −1 the correlation is perfect negative.
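A minimal numerical sketch of properties 1 and 2, using NumPy; the sample values are made up for illustration and are not from the text:

import numpy as np

# Hypothetical sample data (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def pearson_r(a, b):
    # r = Cov(a, b) / (sd(a) * sd(b))
    return np.cov(a, b, ddof=1)[0, 1] / (np.std(a, ddof=1) * np.std(b, ddof=1))

r = pearson_r(x, y)
assert -1.0 <= r <= 1.0          # property 1: r lies in [-1, +1]

# Property 2: shifting the origin and rescaling (by positive constants)
# leaves r unchanged.
r_transformed = pearson_r(10 + 2 * x, 5 + 0.5 * y)
print(r, r_transformed)          # the two values agree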
REGRESSION
Regression is the functional relationship between two variables, of which one may represent cause and the other effect.
The variable representing the cause is known as the independent variable and is denoted by X; it is also called the predictor variable or regressor. The variable representing the effect is known as the dependent variable and is denoted by Y; it is also called the predicted variable.
The relationship between the dependent and the independent variable may be expressed as a function, and such a functional relationship is termed regression.
When there are only two variables, the functional relationship is known as simple regression; if the relation between the two variables is a straight line, it is known as simple linear regression. When there are more than two variables and one of the variables depends on the others, the functional relationship is known as multiple regression.
The regression line is of the form
y = a + bx
where a : constant or intercept
      b : regression coefficient / slope
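A brief sketch of how a and b are estimated by least squares (NumPy, with the same made-up data as above; b = Cov(X, Y)/Var(X) and a = ȳ − b·x̄ are the standard estimates):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical response values

# Least-squares estimates for the line y = a + b x
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope
a = y.mean() - b * x.mean()                          # intercept

print(f"fitted line: y = {a:.3f} + {b:.3f} x")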
Assumptions:
1. The x's are non-random or fixed constants.
2. At each fixed value of X, the corresponding values of Y have a normal distribution about a mean.
3. For any given x, the variance of Y is the same (homoscedasticity).
4. The values of Y observed at different levels of x are completely independent.
Properties of regression coefficients:
1. The range of a regression coefficient is −∞ to +∞.
2. Regression coefficients are independent of change of origin but not of scale.
3. If r = ±1, the angle between the two regression lines is zero degrees (the lines coincide); if r = 0, the regression lines are perpendicular to each other.
4. If the variables X and Y are independent, then the regression coefficients are zero.
5. If one regression coefficient is positive the other must be positive, and if one is negative the other must be negative; i.e. if b1 > 0 then b2 > 0, and if b1 < 0 then b2 < 0.
6. The two regression lines intersect at the point of means of X and Y (see the sketch below).
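A small numerical check of properties 5 and 6 (same hypothetical data as above). It uses the standard identity that the product of the two regression coefficients equals r², which is why their signs must agree:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = np.cov(x, y, ddof=1)[0, 1]
b1 = cov_xy / np.var(x, ddof=1)      # b1: regression coefficient of Y on X
b2 = cov_xy / np.var(y, ddof=1)      # b2: regression coefficient of X on Y

r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(np.isclose(b1 * b2, r ** 2))   # True: b1 * b2 = r^2, so b1 and b2 share a sign

# Property 6: both regression lines pass through (x̄, ȳ).
a1 = y.mean() - b1 * x.mean()        # intercept of the line of Y on X
a2 = x.mean() - b2 * y.mean()        # intercept of the line of X on Y
print(np.isclose(a1 + b1 * x.mean(), y.mean()))   # True
print(np.isclose(a2 + b2 * y.mean(), x.mean()))   # True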