The document discusses covariance and correlation, which are mathematical models used to assess relationships between variables. Covariance measures how two variables change together, while correlation measures both the strength and direction of the linear relationship between variables. Correlation coefficients range from -1 to 1, where values closer to 1 or -1 indicate a strong linear relationship and values closer to 0 indicate no linear relationship. The document also discusses partial correlation and multiple correlation, which measure relationships while controlling for additional variables. Factors that can affect correlation analyses include sample size and outliers.
Covariance and correlation(Dereje JIMA)
1. FEDERAL TECHNICAL AND VOCATIONAL EDUCATION
AND TRAINING INSTITUTE
DEPARTMENTS OF CONSTRUCTION TECHNOLOGY
AND MANAGEMENT
Program: - MSc Construction Technology and
Management
SECTION-1-
Course Title: - RESEARCH METHODS
COURSE CODE: (COTM-528)
Assignment Area: - COVARIANCE AND CORRELATION
Prepared by DEREJE JIMA
ID/NO: -MTR/110/2012
Submitted To: -TSEGAYE G. (DR)
TABLE OF CONTENTS
INTRODUCTION
1. MATHEMATICAL MODELS
1.1 COVARIANCE AND CORRELATION
1.1.1. COVARIANCE
1.1.1.1. COVARIANCE FORMULA
1.1.1.2. SAMPLE OF COVARIANCE FORMULAS
1.1.2. CORRELATION
1.1.2.1 PRINCIPLE OF CORRELATION AS FOLLOWS
1.1.3 PARTS OF CORRELATION
1.1.3.1 PARTIAL CORRELATION
1.1.3.1.1 IMPORTANCE OF PARTIAL CORRELATION
1.1.3.2. MULTIPLE CORRELATION
1.1.3.2.1 USES OF MULTIPLE CORRELATION
1.1.3.3. LIMITATIONS
1.1.3.3.1. THESE TECHNIQUES REQUIRE
1.1.4. WHAT CORRELATION COEFFICIENTS DO NOT DO
1.1.5. FACTORS THAT AFFECT A CORRELATION ANALYSIS
1.1.6. HOW TO MEASURE THE COVARIANCE AND CORRELATION OF DATA SAMPLES
1.1.7. RELATIONSHIP BETWEEN TWO RANDOM VARIABLES
1.1.8. DIFFERENCES BETWEEN COVARIANCE AND CORRELATION
1.1.9 CONCLUSION
REFERENCE
INTRODUCTION
1. MATHEMATICAL MODELS.
What a Mathematical Model Is.
➢ A model is a simplified representation of part of the real world.
➢ Models are based on theory. In research, models help to test theory by making predictions
that can be compared with observations.
➢ Models also allow the implications of research results to be explored by making predictions
for new situations.
➢ Each model is built for a specific purpose. A model that is useful for one job may be
inappropriate for another task on a similar topic.
➢ Models vary in scope from the simple, which you can put together and use very quickly,
to the complex that may take much of your project time to develop and use.
➢ Computing tools designed for the job can make modelling feasible for students who are
not specialists.
➢ We use models all the time.
1.1 COVARIANCE AND CORRELATION
Covariance and correlation: both can be calculated using mathematical modelling methods.
1.1.1. COVARIANCE
➢ The covariance formula is a statistical formula used to assess the relationship between
two variables. In simple words, covariance measures how the variation in one variable is
related to the variation in the other.
➢ Covariance signifies the direction of the linear relationship between the two variables. By
direction we mean whether the variables tend to move in the same direction or in opposite
directions (an increase in the value of one variable may be associated with an increase or
a decrease in the value of the other variable).
➢ Covariance can take any value from -∞ to +∞. It is also important to mention that
covariance only measures how two variables change together, not the dependency of one
variable on the other.
➢ The covariance is denoted COV(X, Y), and the formulas for covariance are given below.
1.1.1.1. COVARIANCE FORMULA
$$\mathrm{COV}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N}$$
1) $X_i$ = data value of X
2) $Y_i$ = data value of Y
3) $\bar{X}$ = mean of X
4) $\bar{Y}$ = mean of Y
5) N = number of data values.
1.1.1.2. SAMPLE OF COVARIANCE FORMULAS.
➢ $$\mathrm{COV}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$$
1) $X_i$ = data value of X
2) $Y_i$ = data value of Y
3) $\bar{X}$ = mean of X
4) $\bar{Y}$ = mean of Y
5) N = number of data values.
➢ $$\mathrm{COV}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
o COV(X, Y) = covariance between the X and Y variables.
o $x_i$ and $y_i$ = members of the X and Y variables.
o $\bar{x}$ and $\bar{y}$ = means of the X and Y variables.
o n = number of members.
➢ Now, we'll look at two observations based on the above formula of covariance:
COV(X, Y) is positive if, as X increases, Y also tends to increase.
COV(X, Y) is negative if, as X increases, Y tends to decrease.
Let's see the above observations with the help of graphs.
The covariance of X and Y is the number defined by
COV (X, Y) = E ((X - µX) (Y - µY)).
➢ Covariance is interesting because it is a quantitative measurement of the relationship between
two variables.
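The two covariance formulas above (dividing by N, and dividing by N - 1 for a sample) can be sketched in plain Python as follows. This is only an illustrative sketch: the function name `covariance`, the `sample` flag and the data values are assumptions made for the example and do not come from the assignment.

```python
def covariance(xs, ys, sample=False):
    """Covariance of paired data: divides by N, or by N - 1 when sample=True."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from the two means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data: Y increases whenever X increases.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

pop_cov = covariance(xs, ys)                   # divides by N
sample_cov = covariance(xs, ys, sample=True)   # divides by N - 1
neg_cov = covariance(xs, ys[::-1])             # Y now decreases as X increases

print(pop_cov, sample_cov, neg_cov)
```

With this data the first two values are positive and the third is negative, which matches the two observations listed above.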
1.1.2. CORRELATION
➢ Correlation analysis is a statistical method used to evaluate the strength of relationship between
two quantitative variables.
➢ Correlation analysis is a method of statistical evaluation used to study the strength of a
relationship between two, numerically measured, continuous variables.
➢ A high correlation means that two or more variables have a strong relationship with each other,
while a weak correlation means that the variables are hardly related.
➢ Correlation is a term used to denote the association or relationship between two (or more)
quantitative variables.
➢ Correlation is described as the analysis which lets us know the association or the absence of
the relationship between two variables ‘x’ and ‘y’.
➢ Mathematically, the strength and direction of a linear relationship between two variables is
represented by the correlation coefficient. Suppose that there are n ordered pairs (x, y) that make
up a sample from a population. The correlation coefficient r is given by:
➢ $$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
➢ This will always be a number between -1 and 1 (inclusive).
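The formula for r can be evaluated directly from the five sums it contains. The sketch below is one possible way to do this in Python; the function name `pearson_r` and the data values are hypothetical and are used only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r computed from the sums in the formula above."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


# Hypothetical sample of n = 5 ordered pairs (x, y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # about 0.7746
```

A constant variable makes the denominator zero, so the sketch raises an error in that case instead of returning a number.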
1.1.2.1 PRINCIPLE OF CORRELATION AS FOLLOWS
➢ If r is close to 1, we say that the variables are positively correlated. This means there is
likely a strong linear relationship between the two variables, with a positive slope. Positive
r values indicate a positive correlation, where the values of both variables tend to increase
together.
➢ If r is close to -1, we say that the variables are negatively correlated. This means there is
likely a strong linear relationship between the two variables, with a negative slope. Negative
r values indicate a negative correlation, where the values of one variable tend to increase
when the values of the other variable decrease.
➢ If r is close to 0, we say that the variables are not correlated. This means that there is likely
no linear relationship between the two variables; however, the variables may still be related
in some other way.
➢ This analysis is fundamentally based on the assumption of a straight line [linear]
relationship between the quantitative variables.
➢ Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate). It’s a common tool for
describing simple relationships without making a statement about cause and effect.
➢ The sample correlation coefficient, r, quantifies the strength of the relationship.
Correlations are also tested for statistical significance.
➢ Correlations are useful for describing simple relationships among data.
➢ We describe correlations with a unit-free measure called the correlation coefficient,
which ranges from -1 to +1 and is denoted by r. Statistical significance is indicated with a
p-value. Therefore, correlations are typically reported with two key numbers: r and p.
➢ A small p-value gives us evidence that the population correlation coefficient is likely
different from zero, based on what we observe from the sample.
➢ A perfect positive correlation has a value of 1, and a perfect negative correlation has a
value of -1. The main result of a correlation is called the correlation coefficient.
The correlation of X and Y is the number defined by
$$\rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y}$$
a. If COV (X, Y) > 0 then X and Y are positively correlated.
b. If COV (X, Y) < 0 then X and Y are negatively correlated.
c. If COV (X, Y) = 0 then X and Y are uncorrelated.
The value ρX,Y is also called the correlation coefficient.
Correlation measures linearity between X and Y.
If ρ(X; Y) = 0 we say that X and Y are “uncorrelated.” If two variables are independent, then their
correlation will be 0.
ρ(X; Y) = 1: Y = aX + b where a = σY/σX
ρ(X; Y) = -1: Y = aX + b where a = -σY/σX
ρ(X; Y) = 0: absence of linear relationship
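The three cases above can be checked numerically by dividing the covariance by the product of the two standard deviations. The following is a minimal sketch under assumed data: the helper name `rho`, the two straight lines and the curve Y = X² are illustrative choices, not part of the assignment.

```python
import statistics


def rho(xs, ys):
    """Correlation as covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # pstdev is the population standard deviation (divides by n, like cov above).
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))


xs = [-2, -1, 0, 1, 2]
line_up = [2 * x + 1 for x in xs]     # Y = aX + b with a > 0
line_down = [-3 * x + 5 for x in xs]  # Y = aX + b with a < 0
curve = [x * x for x in xs]           # Y depends on X, but not linearly

rho_up = rho(xs, line_up)
rho_down = rho(xs, line_down)
rho_curve = rho(xs, curve)
print(rho_up, rho_down, rho_curve)
```

The last case shows why a correlation of 0 only means the absence of a linear relationship: Y is completely determined by X, yet the correlation is zero.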
➢ When the two variables tend to increase (or decrease) together, there is positive covariance
and positive correlation.
(Figures: positive covariance; positive correlation)
➢ When one variable tends to decrease as the other increases, there is negative covariance and
negative correlation.
(Figures: negative covariance; negative correlation)
➢ When changes in one variable show no consistent linear pattern with changes in the other,
the covariance and the correlation are close to zero.
A correlation coefficient is a single number that summarizes the relationship between
the two variables being studied.
1.1.3 PARTS OF CORRELATION
1.1.3.1 PARTIAL CORRELATION
➢ Partial correlation measures the degree of association between two random variables, with the
effect of a set of controlling random variables removed.
➢ Partial correlation measures the correlation between X and Y, controlling for Z.
$$r_{yx.z} = \frac{r_{yx} - (r_{yz})(r_{xz})}{\sqrt{1 - r_{yz}^2}\,\sqrt{1 - r_{xz}^2}}$$
1.1.3.1.1 IMPORTANCE OF PARTIAL CORRELATION
➢ Partial correlation is the measure of association between two variables, while controlling
or adjusting the effect of one or more additional variables.
➢ Partial correlations can be used in many cases that assess for relationship, like whether or
not the sale value of a particular commodity is related to the expenditure on advertising
when the effect of price is controlled.
➢ Calculate the partial (first-order) correlation between husbands’ housework (Y) and
number of children (X), controlling for husbands’ years of education (Z).
➢ Solution
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum y^2 - (\sum y)^2\right)}}$$
➢ Using this formula, we have to find $r_{xy}$, $r_{yz}$ and $r_{xz}$:
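1) $$r_{xy} = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum y^2 - (\sum y)^2\right)}}$$
2) $$r_{yz} = \frac{n \sum yz - (\sum y)(\sum z)}{\sqrt{\left(n \sum y^2 - (\sum y)^2\right)\left(n \sum z^2 - (\sum z)^2\right)}}$$
3) $$r_{xz} = \frac{n \sum xz - (\sum x)(\sum z)}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum z^2 - (\sum z)^2\right)}}$$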
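Once the three zero-order correlations are known, the first-order partial correlation between Y and X controlling for Z follows in one step. The sketch below assumes hypothetical correlation values, because the assignment does not give the data for the housework example; the function name `partial_corr` is also an assumption.

```python
import math


def partial_corr(r_yx, r_yz, r_xz):
    """First-order partial correlation of Y and X, controlling for Z."""
    denominator = math.sqrt(1 - r_yz ** 2) * math.sqrt(1 - r_xz ** 2)
    if denominator == 0:
        raise ValueError("undefined when Z is perfectly correlated with X or Y")
    return (r_yx - r_yz * r_xz) / denominator


# Hypothetical zero-order correlations (not taken from the assignment).
r_yx = 0.50   # Y with X
r_yz = -0.30  # Y with Z
r_xz = -0.47  # X with Z

r_yx_z = partial_corr(r_yx, r_yz, r_xz)
print(round(r_yx_z, 3))  # about 0.426
```

In this made-up case the partial correlation (about 0.43) is only slightly lower than the zero-order correlation (0.50), which would suggest that controlling for Z changes the X and Y relationship very little.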
1.1.3.2. MULTIPLE CORRELATION
➢ The coefficient of multiple determination (R2) measures how much of Y is explained by
all of the X’s combined
➢ R2 measures the percentage of the variation in Y that is explained by all of the
independent variables combined
➢ The coefficient of multiple determination is an indicator of the strength of the entire
regression equation
$$R^2 = r_{y1}^2 + r_{y2.1}^2 \,(1 - r_{y1}^2)$$
1) $R^2$ = coefficient of multiple determination.
2) $r_{y1}^2$ = squared zero-order correlation between Y and X1.
3) $r_{y2.1}^2$ = squared partial correlation of Y and X2, while controlling for X1.
HERE,
➢ $r_{y1}^2 = r_{yx}^2$
➢ $r_{y2.1}^2 = r_{yz.x}^2$
1.1.3.2.1 USES OF MULTIPLE CORRELATION
➢ In statistics, the coefficient of multiple correlation is a measure of how well a given variable
can be predicted using a linear function of a set of other variables.
➢ It is the correlation between the variable's values and the best predictions that can be computed
linearly from the predictive variables.
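➢ Before estimating $R^2$, we need to estimate the partial correlation of Y and X2
($r_{y2.1} = r_{yz.x}$) as follows:
➢ $$r_{y2.1} = r_{yz.x} = \frac{r_{y2} - (r_{y1})(r_{12})}{\sqrt{1 - r_{y1}^2}\,\sqrt{1 - r_{12}^2}}$$
where the zero-order correlations are:
➢ $$r_{y2} = r_{yz} = \frac{n \sum yz - (\sum y)(\sum z)}{\sqrt{\left(n \sum y^2 - (\sum y)^2\right)\left(n \sum z^2 - (\sum z)^2\right)}}$$
➢ $$r_{y1} = r_{xy} = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum y^2 - (\sum y)^2\right)}}$$
➢ $$r_{12} = r_{xz} = \frac{n \sum xz - (\sum x)(\sum z)}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum z^2 - (\sum z)^2\right)}}$$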
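The two steps above (first the partial correlation of Y and X2 controlling for X1, then the coefficient of multiple determination) can be sketched as follows. The correlation values are hypothetical and the function names are assumptions made for this example only.

```python
import math


def partial_y2_given_1(r_y2, r_y1, r_12):
    """Partial correlation of Y and X2, controlling for X1."""
    return (r_y2 - r_y1 * r_12) / (
        math.sqrt(1 - r_y1 ** 2) * math.sqrt(1 - r_12 ** 2)
    )


def multiple_r_squared(r_y1, r_y2, r_12):
    """R^2 = r_y1^2 + r_y2.1^2 * (1 - r_y1^2) for two independent variables."""
    r_y2_1 = partial_y2_given_1(r_y2, r_y1, r_12)
    return r_y1 ** 2 + r_y2_1 ** 2 * (1 - r_y1 ** 2)


# Hypothetical zero-order correlations (not taken from the assignment).
r_y1 = 0.50   # Y with X1
r_y2 = -0.30  # Y with X2
r_12 = -0.47  # X1 with X2

R2 = multiple_r_squared(r_y1, r_y2, r_12)
print(round(R2, 4))  # about 0.2554
```

With these assumed values, X1 alone explains 25% of the variation in Y, and adding X2 raises the explained share only to about 25.5%.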
1.1.3.3. LIMITATIONS
➢ Multiple correlation and correlation are among the most powerful techniques available to
researchers, but powerful techniques have high demands.
1.1.3.3.1. THESE TECHNIQUES REQUIRE
➢ Every variable is measured at the interval-ratio level.
➢ Each independent variable has a linear relationship with the dependent variable.
➢ Independent variables do not interact with each other
➢ Independent variables are uncorrelated with each other
➢ When these requirements are violated (as they often are), these techniques will produce biased
and/or inefficient estimates
➢ There are more advanced techniques available to researchers that can correct for violations of
these requirements
1.1.4. WHAT CORRELATION COEFFICIENTS DO NOT DO
➢ Correlation coefficients do not give information about whether one variable moves in
response to another.
➢ There is no attempt to establish one variable as "dependent" and the other as "independent".
➢ We shall discuss the concept of independent and dependent variables in the next article on
regression analysis.
➢ Relationships identified using correlation coefficients should be interpreted for what they
are: associations, and not causal relationships.
1.1.5. FACTORS THAT AFFECT A CORRELATION ANALYSIS
➢ Correlation analysis should not be used when data is repeated measures of the same
variable from the same individual at the same or varied time points.
➢ It is useful to draw a scatter plot as an important pre-requisite to any correlation analysis
as it helps eyeball the data for outliers and non-linear relationships
➢ An outlier is essentially an infrequently occurring value in the data set. It is important to
remember that even a single outlier can dramatically alter the correlation coefficient.
➢ If there is a non-linear relationship between the quantitative variables, correlation analysis
should not be performed.
➢ If the dataset has two distinct subgroups of individuals whose values for one or both
variables differ considerably from each other, a spurious correlation may be found when
none exists.
➢ If one data set forms part of the second data set, we would expect to find a positive
correlation between them because the second quantity "contains" the first quantity.
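The effect of a single outlier mentioned above can be illustrated with a small experiment. The sketch below uses hypothetical data and a hypothetical helper `pearson_r` that implements the computational formula for r given earlier in this document.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r from the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )


# Hypothetical data: six points on a perfectly decreasing straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 5, 4, 3, 2, 1]
r_clean = pearson_r(xs, ys)

# The same data with one extreme point (20, 20) added.
r_outlier = pearson_r(xs + [20], ys + [20])

print(round(r_clean, 2), round(r_outlier, 2))
```

In this made-up example one added point is enough to turn a perfect negative correlation into a strong positive one, which is why drawing a scatter plot first is recommended.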
1.1.6. HOW TO MEASURE THE COVARIANCE AND CORRELATION OF DATA
SAMPLES
➢ Sample covariance measures the strength and the direction of the relationship between the
elements of two samples, and the sample correlation is derived from the covariance. The sample
covariance between two variables, X and Y, is
➢ $$\mathrm{COV}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$$
Here’s what each element in this equation means:
➢ COV(X, Y) = the sample covariance between variables X and Y.
➢ $\bar{X}$ (X bar) = the sample mean of X.
➢ $\bar{Y}$ (Y bar) = the sample mean of Y.
➢ N = the number of elements in both samples.
➢ i = an index that assigns a number to each sample element, ranging from 1 to N.
➢ $X_i$ = a single element in the sample for X.
➢ $Y_i$ = a single element in the sample for Y.
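As a rough illustration of how the sample correlation is derived from the sample covariance, the sketch below divides the sample covariance by the product of the two sample standard deviations. The data values and variable names are hypothetical; `statistics.mean` and `statistics.stdev` come from the Python standard library, and `stdev` uses the same N - 1 divisor as the formula above.

```python
import statistics

# Hypothetical paired samples.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Sample covariance: sum of products of deviations divided by N - 1.
sample_cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Sample correlation: covariance scaled by the sample standard deviations.
sample_corr = sample_cov / (statistics.stdev(xs) * statistics.stdev(ys))

print(sample_cov, round(sample_corr, 4))
```

Because both the covariance and the standard deviations use the N - 1 divisor, the divisor cancels and the result equals the r obtained from the computational formula.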
1.1.7. RELATIONSHIP BETWEEN TWO RANDOM VARIABLES
➢ Two random variables are either independent or not independent. If there is a relationship,
the relationship may be strong or weak.
➢ In this section, we discuss two numerical measures of the strength of a relationship between
two random variables, the covariance and correlation.
➢ Throughout this section, we will use the notation $E X = \mu_X$, $E Y = \mu_Y$,
$\mathrm{Var}\,X = \sigma_X^2$, and $\mathrm{Var}\,Y = \sigma_Y^2$ (σ = sigma).
➢ Covariance indicates the direction of the relationship between two variables, while correlation
indicates the direction as well as the strength of the relationship between two variables.
1.1.8. DIFFERENCES BETWEEN COVARIANCE AND CORRELATION
COVARIANCE versus CORRELATION
1) Covariance: a measure that indicates the extent to which two random variables change in tandem.
   Correlation: a measure used to represent how strongly two random variables are related to each other.
2) Covariance: nothing but a measure of correlation.
   Correlation: the scaled form of covariance.
3) Covariance: indicates the direction of the linear relationship between variables.
   Correlation: measures both the strength and direction of the linear relationship between two variables.
4) Covariance: can vary between -∞ and +∞.
   Correlation: ranges between -1 and +1.
5) Covariance: is affected by a change in scale. If all the values of one variable are multiplied by a constant and all the values of the other variable are multiplied by a similar or different constant, then the covariance changes.
   Correlation: is not influenced by a change in scale.
6) Covariance: takes its units from the product of the units of the two variables.
   Correlation: is dimensionless, i.e. it is a unit-free measure of the relationship between variables.
7) Covariance: for two dependent variables, measures how much in real quantity (e.g. cm, kg, liters) on average they co-vary.
   Correlation: for two dependent variables, measures the proportion of how much on average these variables vary with respect to one another.
8) Covariance: is zero in the case of independent variables (one variable moves and the other does not).
   Correlation: independent movements do not contribute to the total correlation.
1.1.9 CONCLUSION
In summary, correlation coefficients are used to assess the strength and direction of the linear
relationships between pairs of continuous variables. Correlation analysis is seldom used alone;
on its own, it stops with the calculation of the correlation coefficient and perhaps a test of
significance. Correlation and covariance are very closely related to each other, and yet they
differ a lot. The formulas of covariance and correlation are as follows.
Covariance formula:
$$\mathrm{COV}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N}$$
Sample covariance formula:
$$\mathrm{COV}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$$
Correlation formula:
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
Covariance can take any positive or negative value, whereas correlation ranges from -1 to 1.
REFERENCE
1) Francis W. Parker, Principle of Correlation, Vol. 1, No. 6.
2) https://www.dummies.com/education/math/business-statistics/how-to-measure-the-covariance-and-correlation-of-data-samples/