
# Stats

Published in: Technology

### Stats

CORRELATION ANALYSIS

Concept and Importance of Correlation

We may come across series that involve more than one variable. A distribution in which each unit assumes two values, one for each of two variables, is called a bivariate distribution. If we measure more than two variables on each unit of a distribution, it is called a multivariate distribution. In a bivariate distribution, we may be interested to find whether there is any relationship between the two variables under study. Correlation is a statistical tool that studies the relationship between two variables, and correlation analysis comprises the various methods and techniques used for studying and measuring the extent of that relationship.

“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering & measuring the relationship and expressing it in a brief formula is known as correlation.” - Croxton & Cowden

“Correlation is an analysis of the covariation between two or more variables.” - A. M. Tuttle

“Correlation Analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces may become effective.” - W. A. Neiswanger

“The effect of correlation is to reduce the range of uncertainty of our prediction.” - Tippett

The problem of analyzing the association between two variables can be broken down into three steps.
o We try to know whether the two variables are related or independent of each other.
o If we find that there is a relationship between the two variables, we try to know its nature and strength, that is, whether the variables have a positive or a negative relationship and how close that relationship is.
o We may like to know whether there is a causal relationship between them, that is, whether variation in one variable causes variation in the other.

When data on two or more variables are available, we may study the related variation of these variables. For example, in data on the heights (x) and weights (y) of students of a college, we find that taller students tend to have greater weight and shorter students tend to have lesser weight. This type of related variation among variables is called correlation.

Correlation may be (i) simple correlation, (ii) multiple correlation or (iii) partial correlation. Simple correlation is concerned with related variation between two variables. Multiple correlation and partial correlation are concerned with related variation among three or more variables.

Two variables are said to be correlated when they vary such that
a. the higher values of one variable correspond to the higher values of the other and the lower values of that variable correspond to the lower values of the other, or
b. the higher values of one variable correspond to the lower values of the other.

Generally, those who are tall have greater weight and those who are short have lesser weight. Thus height (x) and weight (y) of persons vary in the same direction. On the other hand, production (x) and price (y) of vegetables vary in opposite directions: the higher the production, the lower the price. In both examples the variables x and y show related variation, and so they are correlated.

TYPES OF CORRELATION

Correlation is positive (direct) if the variables vary in the same direction, that is, if they increase and decrease together.
Height (x) and weight (y) of persons are positively correlated. Correlation is negative (inverse) if the variables vary in opposite directions, that is, if one variable decreases as the other increases. Production (x) and price (y) of vegetables are negatively correlated. If variables do not show related variation, they are said to be non-correlated. If variables show an exact linear relationship, they are said to be perfectly correlated. Perfect correlation may be positive or negative.

Correlation and Causation

An observed correlation does not by itself establish that one variable causes the other:
o The correlation may be due to chance, particularly when the data pertain to a small sample.
o It is possible that both the variables are influenced by one or more other variables.
o Both the variables may be influencing each other, so that we cannot say which is the cause and which is the effect.

Types of Correlation

o Positive and Negative: If the values of the two variables deviate in the same direction, i.e., if an increase in the values of one variable results, on average, in a corresponding increase in the values of the other variable, or a decrease in one results, on average, in a corresponding decrease in the other, correlation is said to be positive or direct. For example: price and supply of a commodity. On the other hand, correlation is said to be negative or inverse if the variables deviate in opposite directions, i.e., if an increase (decrease) in the values of one variable results, on average, in a corresponding decrease (increase) in the values of the other variable. For example: temperature and sale of woollen garments.
o Linear and Non-Linear: The correlation between two variables is said to be linear if, corresponding to a unit change in one variable, there is a constant change in the other variable over the entire range of the values. For example: y = ax + b.
The relationship between two variables is said to be non-linear or curvilinear if, corresponding to a unit change in one variable, the other variable does not change at a constant rate but at a fluctuating rate. When such a relationship is plotted on a graph, it does not form a straight line.
o Simple, Partial and Multiple: The distinction among these three types of correlation depends upon the number of variables involved in a study. If only two variables are involved, the correlation is said to be simple correlation. When three or more variables are involved, it is a problem of either partial or multiple correlation. In multiple correlation, three or more variables are studied simultaneously. In partial correlation, we consider only two variables influencing each other while the effect of the other variables is held constant. For example, suppose that we have three variables: number of hours studied (x), IQ (y) and marks obtained (z). In multiple correlation we study the correlation of z with the two variables x and y together. In contrast, when we study the relationship between x and z while keeping IQ constant at an average level, the study involves partial correlation.

Methods of Correlation

o Graphic: scatter diagram
o Algebraic: covariance method, rank correlation, concurrent deviation method

Process of Calculating Coefficient of Correlation

o Calculate the means of the two series, X and Y.
o Take deviations in the two series from their respective means, indicated as x and y. The deviation should be taken in each case as the value of the individual item minus the arithmetic mean.
o Square the deviations in both the series and obtain the sum of each deviation-squared column. This gives ∑x² and ∑y².
o Take the products of the deviations and sum them to obtain ∑xy. That is, each individual deviation is multiplied by the corresponding deviation in the other series, and the products are then added.
o The values thus obtained in the preceding steps, ∑xy, ∑x² and ∑y², are used in the formula for correlation.

SCATTER DIAGRAM METHOD

A scatter diagram is a graphic presentation of bivariate data. Bivariate data with n pairs of values is represented by n points on the xy-plane. The two variables are taken along the two axes, and every pair of values in the data is represented by a point on the graph. The pattern of the points on the graph can be used for a rough estimate of the degree of correlation between the variables. In the scatter diagram:
a. If the points form a line with positive slope (a line moving upwards), the variables are positively and perfectly correlated.
b. If the points form a line with negative slope (a line moving downwards), the variables are negatively and perfectly correlated.
c. If the points cluster around a line with positive slope, the variables are positively correlated.
d. If the points cluster around a line with negative slope, the variables are negatively correlated.
e. If the points are spread all over the graph, the variables are non-correlated.
f. Any other curved pattern in the spread of points indicates a curvilinear relation between the variables.

The scatter diagram is one of the simplest diagrammatic representations of a bivariate distribution and one of the simplest tools for ascertaining the correlation between two variables. Suppose we are given n pairs of values of two variables X and Y. For example, if the variables X and Y denote height and weight respectively, then the pairs may represent the heights and weights of n individuals. These n pairs may be plotted as dots (.) in the xy-plane, with X measured along the x-axis and Y along the y-axis. (It is customary to take the dependent variable along the y-axis.) The diagram of dots so obtained is known as a scatter diagram. From the scatter diagram we can form a fairly good, though rough, idea about the relationship between the two variables. The following points may be borne in mind in interpreting the scatter diagram regarding the correlation between the two variables:
1. If the points are very dense, i.e. very close to each other, a fairly good amount of correlation may be expected between the two variables. On the other hand, if the points are widely scattered, a poor correlation may be expected between them.
2. If the points on the scatter diagram reveal any trend (either upward or downward), the variables are said to be correlated, and if no trend is revealed, the variables are uncorrelated.
3. If there is an upward trend rising from the lower left-hand corner to the upper right-hand corner, the correlation is positive, since this reveals that the values of the two variables move in the same direction. If, on the other hand, the points depict a downward trend from the upper left-hand corner, the correlation is negative, since in this case the values of the two variables move in opposite directions.
4. In particular, if all the points lie on a straight line starting from the bottom left and going up towards the top right, the correlation is perfect and positive, and if all the points lie on a straight line starting from the top left and coming down to the bottom right, the correlation is perfect and negative.
5. The method of scatter diagram is readily comprehensible and enables us to form a rough idea of the nature of the relationship between the two variables merely by inspection of the graph. Moreover, this method is not affected by extreme observations, whereas all mathematical formulae for ascertaining correlation between two variables are affected by extreme observations. However, this method is not suitable if the number of observations is fairly large.
6. The method of scatter diagram tells us about the nature of the relationship, whether it is positive or negative and whether it is high or low. It does not provide an exact measure of the extent of the relationship between the two variables.
7. The scatter diagram enables us to obtain an approximate estimating line, or line of best fit, by the free-hand method. The method generally consists in stretching a piece of thread through the plotted points to locate the best possible line.

KARL PEARSON’S COEFFICIENT OF CORRELATION (COVARIANCE METHOD; PRODUCT MOMENT)

This is a measure of the linear relationship between the two variables. It indicates the degree of correlation between the two variables. It is denoted by ‘r’.

INTERPRETATION OF COEFFICIENT OF CORRELATION

a. A positive value of r indicates positive correlation.
b. A negative value of r indicates negative correlation.
c. r = +1 means the correlation is perfect and positive.
d. r = -1 means the correlation is perfect and negative.
e. r = 0 (or close to 0) means the variables are not linearly correlated.

Karl Pearson’s measure, known as the Pearsonian correlation coefficient between two variables (series) X and Y and usually denoted by r, is a numerical measure of the linear relationship between them. It is defined as the ratio of the covariance between X and Y, written as Cov(X, Y), to the product of the standard deviations of X and Y, that is, r = Cov(X, Y) / (σX σY).

Assumptions of the Karl Pearson’s Correlation

o The two variables X and Y are linearly related.
o The two variables are affected by several causes, which are independent, so as to form a normal distribution.

Coefficient of Determination

The strength of r is judged by the coefficient of determination, r². For r = 0.9, r² = 0.81. We multiply it by 100, thus getting 81 per cent. This suggests that when r is 0.9, 81 per cent of the total variation in the Y series can be attributed to the relationship with X.
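
The deviation-based procedure described earlier and the coefficient of determination can be sketched in a few lines of Python. This is only an illustration, not the slides' own worked example: it assumes the standard product-moment form r = ∑xy / √(∑x² · ∑y²), which is equivalent to Cov(X, Y) / (σX σY) but is not written out in the slides, and the function name `pearson_r` and the height and weight figures are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 items")
    n = len(xs)
    # Step 1: means of the two series.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 2: deviations x and y (item minus arithmetic mean).
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    # Step 3: sums of squared deviations.
    sum_x2 = sum(d * d for d in dx)
    sum_y2 = sum(d * d for d in dy)
    # Step 4: sum of products of corresponding deviations.
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when a series has no variation")
    # Step 5: standard product-moment formula (assumed, see lead-in).
    return sum_xy / sqrt(sum_x2 * sum_y2)


# Hypothetical data: heights (cm) and weights (kg) of five students.
heights = [150, 155, 160, 165, 170]
weights = [50, 54, 59, 63, 70]

r = pearson_r(heights, weights)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
```

Working from deviations mirrors the hand calculation step by step; multiplying `r_squared` by 100 gives the percentage of variation in Y attributed to the relationship with X.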

Rank Correlation

Limitations of Spearman’s Method of Correlation

o Spearman’s r is a distribution-free or non-parametric measure of correlation.
o As such, the result may not be as dependable as in the case of ordinary correlation, where the distribution is known.
o Another limitation of rank correlation is that it cannot be applied to a grouped frequency distribution.
o When the number of observations is quite large and one has to assign ranks to the observations in the two series, the exercise becomes rather tedious and time-consuming. This becomes a major limitation of rank correlation.

Some Limitations of Correlation Analysis

o Correlation analysis cannot determine a cause-and-effect relationship.
o Another frequent mistake arises from misinterpreting the coefficient of correlation and the coefficient of determination.
o Another mistake in interpreting the coefficient of correlation occurs when one concludes that there is a positive or negative relationship even though the two variables are actually unrelated.
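
For the rank correlation method mentioned above, the ranking step that becomes tedious by hand is easy to automate. The sketch below is an assumption-laden illustration: the slides do not state the formula, so it uses the standard Spearman form 1 − 6∑d² / (n(n² − 1)), which holds only when there are no tied values; the function names `ranks` and `spearman_rho` and the judges' scores are hypothetical.

```python
def ranks(values):
    """Rank the values, giving rank 1 to the largest (ties not handled)."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    ordered = sorted(values, reverse=True)
    position = {v: i + 1 for i, v in enumerate(ordered)}
    return [position[v] for v in values]


def spearman_rho(xs, ys):
    """Spearman's rank correlation for two untied series of equal length."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 items")
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    # d is the difference between the two ranks of each item.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # Standard formula for untied ranks (assumed, see lead-in).
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical scores given to five contestants by two judges.
judge_a = [86, 72, 91, 65, 78]
judge_b = [80, 70, 88, 60, 85]

rho = spearman_rho(judge_a, judge_b)
print(f"Spearman's rank correlation = {rho}")
```

Because only the ranks enter the calculation, any monotonic relationship between the two series gives a coefficient of +1 or -1 even when the relationship is not linear.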