MAL1303: Statistical Hydrology
Correlation
Dr. Shamsuddin Shahid
Department of Hydraulics and Hydrology
Faculty of Civil Engineering, Universiti Teknologi Malaysia
Room No. M46-332; E-mail: sshahid@utm.my
Mobile: 0182051586
11/23/2015 Shamsuddin Shahid, FKA, UTM
Research Questions: Are two variables related?
Example questions in hydrology:
– “Is there any relation between rainfall and river
discharge?”
– “Is there any relation between low river flow and river
water quality?”
– “Is there any relation between elevation and rainfall?”
– “Is there any relation between rainfall intensity and
landslides?”
Test the relationship: Correlation
Correlation
Definition: Correlation is a statistical method that is used to
examine the extent to which two variables have a simple linear
relationship.
Questions:
• What does it mean to say that two variables are associated with one another?
• How can we mathematically formalize the concept of association?
Answer:
Correlation
Correlation
Correlation describes the relationship between two variables in terms of:
– Direction
– Strength
– Significance
The sign of the coefficient indicates direction.
The size of the coefficient indicates strength.
Comparison with critical values gives significance.
Scatter Plots
• Plot each pair of observations (X, Y)
• x = predictor variable (independent)
• y = criterion variable (dependent)
• Check for:
– outliers
– linearity
How do you study the relationship between two variables?
Groundwater temperature data are collected at different depths below the earth's
surface.
A plain list of these data is difficult to interpret.
The relationship between the two variables can be visualized using a scatter
diagram, where each pair depth-temperature is represented as a point in a
plane.
Types of Correlation: Type I
• Positive correlation: the values of the two variables change in the same direction.
• Negative correlation: the values of the two variables change in opposite directions.
Positive & Negative Association
At each depth, two quantities are measured: temperature and nitrogen concentration.
This gives two scatter plots:
(i) Depth vs. Groundwater Temperature;
(ii) Depth vs. Nitrogen Concentration in Groundwater.
In the first plot, temperature tends to increase with depth. This corresponds to a
positive association.
In the second plot, nitrogen concentration tends to decrease with depth. This
corresponds to a negative association.
Types of Correlation: Type II
• Simple correlation: only two variables are studied.
• Multiple correlation: three or more variables are studied simultaneously.
• Partial correlation: more than two variables are recognized, but the relationship
between only two of them is considered while the others are held constant.
• Total correlation: based on all the relevant variables, which is normally not
feasible.
Types of Correlation: Type III
• Linear correlation: correlation is said to be linear when the amount of
change in one variable tends to bear a constant ratio to the amount of
change in the other. The graph of variables with a linear relationship
forms a straight line.
• Non-linear correlation: correlation is non-linear if the amount of
change in one variable does not bear a constant ratio to the amount of
change in the other variable.
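A minimal sketch of why this distinction matters, assuming Pearson's r (covered later in these slides) as the measure of linear correlation. The helper `pearson_r` and the data are illustrative and not part of the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

x = [-3, -2, -1, 0, 1, 2, 3]          # made-up values
linear = [2 * v + 1 for v in x]       # constant ratio of change
curved = [v ** 2 for v in x]          # ratio of change is not constant

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(round(r_linear, 3), round(r_curved, 3))
```

The curved series depends entirely on x, yet its linear correlation with x is zero, which is why a scatter plot should be checked before r is interpreted.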
Correlation Coefficient
• The correlation coefficient measures the linear association between two
variables. It quantifies the degree of relationship.
• The correlation coefficient is usually denoted by r and takes values
between -1 and 1.
• When r is positive it lies between 0 and 1; when r is negative it lies between 0 and -1.
Correlation Coefficient
• Nitrogen concentration data are collected at two different locations, giving
two plots. Both show a negative correlation between depth and nitrogen
concentration. The correlation coefficient r is more negative for the first plot
than for the second.
• If the points of the scatter plot lie very close to a straight line, the
correlation is close to 1 in absolute value. A near-zero correlation corresponds
to a diagram in which the data are widely scattered around the line.
Correlation Coefficient - Summary
• A positive coefficient means that the data cluster around a line with a
positive slope: as one variable increases, the other also increases.
• A negative coefficient means that the data cluster around a line with a
negative slope: as one variable increases, the other decreases.
• The closer r is to 1, the stronger the positive linear association between the
variables.
• The closer r is to -1, the stronger the negative linear association between the
variables.
• When r is equal to or near 1 or -1, there is a strong linear association between
the variables.
• When r is equal to or near 0, there is no linear association between the
variables; a non-linear relationship may still exist.
Pearson Correlation
• Pearson correlation describes the relationship between two variables that are
both measured on an interval or ratio scale.
• Pearson correlation measures how consistently each Y value is paired with its
X value in a linear fashion.
Covariance
• Covariance is a measure of how much two variables change together.
• It is the variance shared by the two variables.
• Covariance reflects the direction of the relationship:
– Positive covariance indicates a positive relationship
– Negative covariance indicates a negative relationship
Computational Formula
Sum of Squares (SS) measures the amount of variation or variability of
a single variable.
Sum of Products (SP) provides a parallel procedure for measuring the
amount of covariation or covariability between two variables.
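For reference, the standard computational forms of SS and SP are sketched below. They are assumed to match the definitions intended on this slide, with n the number of (X, Y) pairs.

```latex
SS_X = \sum X^2 - \frac{\left(\sum X\right)^2}{n}, \qquad
SS_Y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}, \qquad
SP = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}
```

Dividing SP by n - 1 gives the sample covariance, and dividing SS by n - 1 gives the sample variance.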
Calculation of Pearson’s Correlation Coefficient
• Pearson’s correlation coefficient is a ratio comparing the
covariability of X and Y with the variability of X and Y separately.
• SP measures the covariability of X and Y.
• The variability of X and Y is measured by calculating SS for the X
and Y scores separately.
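The ratio described above can be sketched as r = SP / sqrt(SS_X x SS_Y). The function name `pearson_from_sums` and the depth and nitrate values below are hypothetical and chosen only for illustration; they are not the lecture's data.

```python
import math

def pearson_from_sums(x, y):
    """Pearson's r via the computational forms of SS and SP."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(v * v for v in x) - sum_x ** 2 / n            # variability of X
    ss_y = sum(v * v for v in y) - sum_y ** 2 / n            # variability of Y
    sp = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n  # covariability
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical depth (ft) and nitrate (mg/l) values, for illustration only.
depth = [10, 20, 30, 40, 50]
nitrate = [9.0, 7.5, 8.0, 5.0, 4.5]

r = pearson_from_sums(depth, nitrate)
print(round(r, 3))
```

With these made-up values r comes out strongly negative, consistent with a concentration that falls as depth increases.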
Calculation of Pearson’s Correlation Coefficient
Let X represent depth in feet and Y represent nitrate concentration in
mg/l. The association between groundwater depth and nitrate
concentration can be found as follows:
Hypothesis Testing
• H0: there is no correlation between depth and nitrate concentration, i.e. the
population correlation is 0.
• H1: there is a real, non-zero correlation in the population.
• The population correlation is traditionally represented by ρ (rho), so in
symbols we can write:
H0: ρ = 0
H1: ρ ≠ 0
• For Pearson’s correlation, the degrees of freedom are df = n - 2, where n is the
sample size. Two degrees of freedom are lost because two means must be
estimated, one for each variance estimate.
• If the absolute value of the calculated r equals or exceeds the critical value
(given in the table), the obtained r is significant.
Hypothesis Testing
In the present case, r = 0.875
df = n-2
= 5-2
= 3
The critical value for α = 0.05 (two-tailed), df = 3 is 0.878.
Since 0.875 < 0.878, we fail to reject H0: ρ = 0.
The sample does not provide significant evidence of a correlation in the population.
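The comparison against a critical r is equivalent to a t-test with df = n - 2, using t = r x sqrt(df / (1 - r²)). The sketch below assumes the two-tailed critical t of 3.182 for df = 3 taken from a standard t table; the function names are illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic for testing rho = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))

def critical_r(t_crit, df):
    """Critical |r| equivalent to a critical t value at the same df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

r, n = 0.875, 5          # values from the worked example
T_CRIT_DF3 = 3.182       # assumed: two-tailed, alpha = 0.05, df = 3 (t table)

t = t_from_r(r, n)
r_crit = critical_r(T_CRIT_DF3, n - 2)
significant = abs(t) >= T_CRIT_DF3
print(round(t, 2), round(r_crit, 3), significant)
```

The critical t converts back to a critical r of about 0.878, and t falls just short of 3.182, so both routes lead to the same decision for this sample.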
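Significance of Correlation
df (N-2)    Critical value of r (p = .05)
5           .67
10          .50
15          .41
20          .36
25          .32
30          .30
50          .23
200         .11
500         .07
1000        .05
These values correspond to a one-tailed test; two-tailed critical values are
larger (for example, 0.878 at df = 3).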
Correlation: r and r²
• As a matter of routine, it is the squared correlation
that should be interpreted. The correlation coefficient
itself can mislead by suggesting more covariation than
exists, and this problem gets worse as the correlation
approaches zero.
• Note that as the correlation r decreases by tenths,
r² decreases by much more. A correlation of .50
shows only 25 percent of variance in common;
a correlation of .20 shows 4 percent in common;
and a correlation of .10 shows 1 percent in common
(or 99 percent not in common).
• Thus, squaring is a healthy corrective to the
tendency to treat low correlations, such as .20
and .30, as indicating meaningful or practical
covariation.
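The arithmetic behind these percentages is short enough to tabulate. The function name below is illustrative.

```python
def shared_variance_percent(r):
    """Percent of variance two variables have in common, i.e. 100 * r^2."""
    return 100.0 * r ** 2

for r in (0.50, 0.30, 0.20, 0.10):
    pct = shared_variance_percent(r)
    print(f"r = {r:.2f}   r^2 = {r ** 2:.2f}   shared variance = {pct:.0f}%")
```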
Assumptions
• Scale of measurement is interval or ratio
• Linear relationship
• Homoscedasticity
• Both variables have similar, approximately normal underlying distributions
• No outliers
Homoscedasticity
The scatter of Y about the trend line is roughly constant across the range of X.
Advantages and Disadvantages of Pearson’s Coefficient
Advantages
• It summarizes, in one value, both the degree and the direction of
correlation.
Limitations
• It always assumes a linear relationship.
• Interpreting the value of r is difficult.
• The value of the correlation coefficient is affected by extreme
values.
Parametric and Non-parametric Correlation
Parametric correlation:
used when the data are normally distributed.
Example: Pearson correlation
Non-parametric correlation:
used when the data are not normally distributed.
Example: Spearman’s rank correlation, Kendall’s τ (tau) rank correlation
The Spearman Correlation
• Spearman’s correlation is designed to measure the relationship between
variables measured on an ordinal scale.
• A perfectly positive relationship means that every time X increases, Y also
increases; i.e., the smallest value of X is paired with the smallest value of
Y, and so on.
• The original scores are first converted to ranks; the Spearman
correlation coefficient is then computed on the ranks.
The degree of relationship for the ranks provides a measure of the
degree of consistency for the original scores.
Calculation of Spearman’s Correlation Coefficient
• Be sure you have ordinal data for the X and Y scores.
• Rank X and Y separately.
• The smallest value gets rank 1, the second smallest rank 2, and so on.
• Apply the same formula to the ranked data as for Pearson’s r.
Rank Correlation
• Spearman Rank-Correlation Coefficient, rs
rs = 1 - (6 x Σ di²) / [n x (n² - 1)]   (valid when there are no tied ranks)
where: n = number of items being ranked
xi = rank of item i with respect to one variable
yi = rank of item i with respect to a second variable
di = xi - yi
Test for Significant Rank Correlation
• We may want to use sample results to make an inference
about the population rank correlation ρs.
• To do so, we test the hypotheses:
H0: ρs = 0
Ha: ρs ≠ 0
Spearman Rank Correlation
Monthly Rainfall (mm), Sample 1: {79, 71, 108, 54, 67, 90}
Monthly Discharge (cusec), Sample 2: {122, 100, 121, 43, 54, 80}
Null hypothesis: there exists no association (or correlation) between
the samples.
If rs > critical value, there is a significant correlation.
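A minimal sketch of the rank-difference calculation for the rainfall and discharge samples above, using the no-ties form rs = 1 - 6 Σd² / [n(n² - 1)]. The helper names are illustrative, and the ranking helper assumes there are no tied values.

```python
def ranks(values):
    """Rank 1 for the smallest value, 2 for the next, and so on (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's rs from squared rank differences (no-ties formula)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rainfall = [79, 71, 108, 54, 67, 90]       # mm, from the slide
discharge = [122, 100, 121, 43, 54, 80]    # cusec, from the slide

rank_rain = ranks(rainfall)      # [4, 3, 6, 1, 2, 5]
rank_dis = ranks(discharge)      # [6, 4, 5, 1, 2, 3]
rs = spearman_rs(rainfall, discharge)
print(rank_rain, rank_dis, round(rs, 3))   # rs is about 0.714
```

The squared rank differences sum to 10, so rs = 1 - 60/210. This value would then be compared with the tabulated critical value of rs for n = 6.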
Merits of Spearman’s Rank Correlation
• This method is simpler to understand and easier to apply
than Karl Pearson’s correlation method.
• This method is useful where only ranks (qualitative terms),
and not the actual data, can be given.
• This method is the one to use where the initial data are already
in the form of ranks.
Limitations of Spearman’s Rank Correlation
• It cannot be used to find the correlation in a grouped
frequency distribution.
• Ranking by hand becomes tedious for large samples, so the method is
usually recommended only where N does not exceed 30.
Kendall-tau Rank Correlation
Kendall's rank correlation provides a distribution-free test of
independence and a measure of the strength of dependence
between two variables.
Spearman's rank correlation is satisfactory for testing a null
hypothesis of independence between two variables, but Kendall's
rank correlation is considered more powerful.
Steps for Kendall-tau Rank Correlation
1. Arrange the data in increasing order of magnitude of the first
variable and label the objects with the resulting ranks: 1 for the
smallest up to N for the largest.
2. Rearrange the data in order of increasing magnitude of the
second variable and record the rearranged order of the
variable-1 ranks.
3. For each entry, scan down the rearranged variable-1 ranks, counting
the number of ranks below it that are larger.
4. Repeat step 3, this time counting the number of ranks that
are smaller.
5. For each entry, subtract “smaller” from “larger”, then sum over all
entries to obtain the total (S).
6. Kendall’s τ is given by:
τ = (2 x S) / [N x (N-1)]
7. Compute the z-statistic as
z = τ x √[9 x N x (N-1)] / √[2 x (2N + 5)]
8. The null hypothesis is rejected (at p = 0.05) if z falls outside the range:
-1.96 ≤ z ≤ 1.96
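The steps for Kendall's τ can be sketched as below. This is an illustration under stated assumptions: it assumes no ties, it sorts by the first variable and scans the second (by symmetry this gives the same S as sorting by the second and scanning the first), and it uses the square-root form of the normal approximation for z. It reuses the rainfall and discharge samples that appear elsewhere in these slides. The function name is illustrative.

```python
import math

def kendall_tau_z(x, y):
    """Return (S, tau, z) for paired samples; assumes no tied values."""
    n = len(x)
    # Order the pairs by the first variable, keep the second variable's values.
    second = [b for _, b in sorted(zip(x, y))]
    s = 0
    for i in range(n):
        later = second[i + 1:]
        larger = sum(1 for v in later if v > second[i])
        smaller = sum(1 for v in later if v < second[i])
        s += larger - smaller
    tau = 2 * s / (n * (n - 1))
    z = tau * math.sqrt(9 * n * (n - 1)) / math.sqrt(2 * (2 * n + 5))
    return s, tau, z

rainfall = [79, 71, 108, 54, 67, 90]       # mm
discharge = [122, 100, 121, 43, 54, 80]    # cusec

s, tau, z = kendall_tau_z(rainfall, discharge)
reject_h0 = abs(z) > 1.96
print(s, round(tau, 2), round(z, 2), reject_h0)
```

For these six pairs S = 9 and τ = 0.6, and z stays inside ±1.96. With so few observations the normal approximation is rough, so exact tables would normally be preferred.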
Kendall-tau Rank Correlation
Problem: Ten groundwater samples are collected from different points
to see whether there is any relation between groundwater depth and
contamination. The data are given in the table. Is there any association
between depth and contamination?
Null hypothesis: there exists no association; contamination is
independent of groundwater depth.
Kendall-tau Rank Correlation
Step-1: Rank the two variables separately
Step-2: Re-arrange the second ranks according to
the rank of the first variable
Kendall-tau Rank Correlation
τ = (2 x S) / [N x (N-1)]
z = τ x √[9 x N x (N-1)] / √[2 x (2N + 5)]
Kendall-tau Rank Correlation
Null hypothesis:
There exists no relation between depth and contamination.
The null hypothesis is rejected (p = 0.05) if z falls outside the range:
-1.96 ≤ z ≤ 1.96
z (calculated) = 3.67
z (calculated) > z (critical) = 1.96, therefore the null hypothesis is rejected.
Decision: there exists a significant correlation between depth and
groundwater contamination.
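As a rough consistency check on the reported value, the z formula can be evaluated for N = 10 using the square-root form of the normal approximation. The data table and the value of τ are not reproduced here, so the τ below is back-calculated from the reported z and is an inference, not a figure from the lecture.

```latex
z = \tau \sqrt{\frac{9N(N-1)}{2(2N+5)}}
  = \tau \sqrt{\frac{9 \cdot 10 \cdot 9}{2 \cdot 25}}
  = \tau \sqrt{16.2} \approx 4.025\,\tau
\quad\Longrightarrow\quad
\tau \approx \frac{3.67}{4.025} \approx 0.91
```

A τ of about 0.91 is a valid value (it lies between -1 and 1), so the reported z is consistent with a strong positive rank association.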
Features of Correlation Coefficient
The correlation coefficient has the following properties:
• The correlation is not affected when the two variables are
interchanged.
• The correlation is not changed if the same number is added to all
the values of one of the variables.
• The correlation is not changed if all the values of one of the
variables are multiplied by the same positive number. It changes
sign if the number is negative.
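These properties can be checked numerically. The sketch below uses made-up values and an illustrative `pearson_r` helper.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

x = [1.0, 2.0, 4.0, 7.0, 9.0]     # made-up values
y = [2.0, 3.0, 3.5, 6.0, 10.0]    # made-up values

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                      # variables interchanged
r_shifted = pearson_r([v + 100 for v in x], y)   # constant added
r_scaled = pearson_r([3 * v for v in x], y)      # positive multiplier
r_negated = pearson_r([-3 * v for v in x], y)    # negative multiplier
print(r, r_swapped, r_shifted, r_scaled, r_negated)
```

The first four values agree to rounding error, and the last has the same magnitude with the opposite sign.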
Factors affecting correlation
• Restricted range
• Heterogeneous samples
• Outliers
• Scale
Range restriction
• Range restriction occurs when the sample contains a restricted (or
truncated) range of scores
– e.g., groundwater recharge studied only for rainfall > 5 mm
• If the range is restricted, be cautious in generalising beyond
the range for which data are available
– e.g., groundwater recharge is lower when rainfall is lower, but below
a threshold level there is no relation
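A small synthetic illustration of range restriction. The data are constructed (a linear trend plus an alternating ±3 disturbance) and are not hydrological measurements; `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Synthetic data: y follows x with a fixed-size alternating disturbance.
x = list(range(20))
y = [v + (3 if v % 2 == 0 else -3) for v in x]

r_full = pearson_r(x, y)

# Keep only a narrow window of x values (8 to 11).
window = [(a, b) for a, b in zip(x, y) if 8 <= a <= 11]
r_restricted = pearson_r([a for a, _ in window], [b for _, b in window])
print(round(r_full, 3), round(r_restricted, 3))
```

Over the full range the trend dominates and r is high. Inside the narrow window the disturbance dominates and r is close to zero, even though the underlying relationship has not changed.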
Heterogeneous samples
• Sub-samples may artificially increase or
decrease the overall r.
• Solution: calculate r separately for the
sub-samples and for the overall sample, and
look for differences.
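A constructed example of how pooling sub-samples can distort r. The two groups below are hypothetical: within each group the relationship is perfectly negative, yet the pooled data show a strong positive correlation because the groups sit at different levels.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical sub-samples (for instance, two different catchments).
x_a, y_a = [1, 2, 3], [3, 2, 1]
x_b, y_b = [11, 12, 13], [13, 12, 11]

r_a = pearson_r(x_a, y_a)
r_b = pearson_r(x_b, y_b)
r_pooled = pearson_r(x_a + x_b, y_a + y_b)
print(round(r_a, 3), round(r_b, 3), round(r_pooled, 3))
```

Computing r per sub-sample as well as overall, as suggested above, exposes the discrepancy immediately.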
Effect of Outliers
• Outliers can disproportionately increase or decrease r.
• Options
– compute r with & without outliers
– get more data for outlying values
– recode outliers as having more conservative scores
– transformation
– recode variable into lower level of measurement
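The first option, computing r with and without the outlier, can be sketched as follows. The values are made up for illustration and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Made-up sample with a moderate relationship.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

r_without = pearson_r(x, y)
r_with = pearson_r(x + [20], y + [20])   # one extreme point added
print(round(r_without, 3), round(r_with, 3))
```

A single extreme point moves r from 0.5 to about 0.98 here. Reporting both values makes the influence of the outlier visible.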
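Closed Data
Closed data (variables that are parts of a fixed total, such as percentages
summing to 100) and some discrete data can show a high correlation that does
not reflect a real relationship.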
Log Transformed Data
When data are transformed to a log scale, the log-transformed values can show a
high correlation, so the correlation obtained depends on the scale used.
Checklist
1. Graphs & Scatterplots
– Outliers?
– Linear?
– Does each variable have a reasonable range?
– Are there subsamples to consider?
2. Choose appropriate measure of Association
3. Conduct inferential test
4. Interpret/Discuss
Association and Causation
ASSOCIATION
• If two attributes, say A and B, are found to co-exist more often
than expected by chance, they are correlated. We can then
say that there is an association between attributes A and B.
• Correlation indicates the degree of association between two
variables.
CAUSATION
If one of these attributes, say A, is the suspected cause and the
other, say B, is the outcome, then we have reason to suspect
that A has caused B.
Association and Causation
• Association does not mean causation.
• If an association is consistent, there may be
causation.
• If a relationship is causal, the findings should be
consistent with other data.
• Causation implies correlation, but correlation
does not necessarily imply causation.
Reporting
• State the research hypothesis
• Describe & interpret correlation
– direction of relationship
– size/strength of relationship
– Significance of relationship
• Acknowledge limitations e.g.,
– Heterogeneity (sub-samples)
– Range restriction
– Causality?
Partial Correlation
River discharge depends on many factors, such as rainfall, soil
properties, evapotranspiration, groundwater storage, etc. These
independent factors are also correlated with each other.
Three (or more) Variables
• Three variables means three pairwise relationships
• Each can affect the other two
• Partial and semi-partial correlation remove the contribution of the third variable
Partial Correlation
• Sometimes it is desirable to know the relationship between two
variables with the effects of a third variable held constant.
Partial correlation does this.
• It helps us find the ‘pure’ correlation between two variables while
holding the others constant.
• ‘Holding constant’ in this situation is known as partialling out; the
technique for partialling out the effects of one or more
variables from two others, in order to find the relationship
between them, is called partial correlation.
A partial correlation is a correlation between two variables from
which the linear relations, or effects, of another variable(s) have
been removed.
Partial Correlation
Correlation = 0.72
Partial Correlation
Correlation = 0.73
Higher-Order Partial Correlation
The second-order partial correlation is the correlation between two
variables with the effects of two other variables being removed.
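A second-order partial correlation can be built recursively from first-order partials. The form below is the standard recursion, offered as a sketch; the symbols X, Y, Z, and W (a second control variable) are naming assumptions, since the slide does not label the variables.

```latex
r_{XY \cdot ZW} =
\frac{r_{XY \cdot Z} - r_{XW \cdot Z}\, r_{YW \cdot Z}}
     {\sqrt{\left(1 - r_{XW \cdot Z}^{2}\right)\left(1 - r_{YW \cdot Z}^{2}\right)}}
```

Each term on the right is a first-order partial correlation with Z already removed, so the same pattern extends to higher orders.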
Semipartial Correlation
With partial correlation, we find the correlation between X and Y
holding Z constant for both X and Y. Sometimes, however, we want
to hold Z constant for just X or just Y. In that case, we compute a
semipartial correlation.
Comparison between the partial and semipartial correlation:
Partial:
Semi-partial:
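The standard first-order formulas are sketched below for reference. They are assumed to correspond to the comparison intended here. The semipartial is written with Z removed from Y only, which is one of the two possible conventions.

```latex
\text{Partial:}\quad
r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}
                      {\sqrt{\left(1 - r_{XZ}^{2}\right)\left(1 - r_{YZ}^{2}\right)}}
\qquad
\text{Semi-partial:}\quad
r_{X(Y \cdot Z)} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}
                        {\sqrt{1 - r_{YZ}^{2}}}
```

The numerators are identical. The partial has the extra factor (1 - r_XZ²) under the square root in the denominator.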
Partial Correlation
The result doesn't make much
intuitive sense, but it does remind us
that the absolute value of the partial
is larger than the semipartial.
Semipartial Correlation
• The partial and semipartial correlation formulas are the
same in the numerator and almost the same in the
denominator.
• The partial contains something extra, that is, something
missing from the semipartial correlation in the
denominator.
• This means that the partial correlation is going to be
larger in absolute value than the semipartial.
• This will be true except when the controlling or partialling
variable is uncorrelated with the variable to be controlled.
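The comparison can be checked numerically with the standard first-order formulas. The correlation values below are hypothetical. The semipartial is taken with Z removed from Y only, which is an assumed convention.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Semipartial correlation with Z removed from Y only (assumed convention)."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

# Hypothetical pairwise correlations.
pr = partial_r(0.6, 0.5, 0.4)
sr = semipartial_r(0.6, 0.5, 0.4)

# Special case: Z uncorrelated with X, so the extra denominator factor is 1.
pr_equal = partial_r(0.6, 0.0, 0.4)
sr_equal = semipartial_r(0.6, 0.0, 0.4)
print(round(pr, 3), round(sr, 3), round(pr_equal, 3), round(sr_equal, 3))
```

The partial exceeds the semipartial in absolute value in the first case, and the two coincide when the extra factor in the partial's denominator equals 1.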
Advantages of Correlation studies
• Show the amount (strength) of relationship present
• Can be used to make predictions about the variables
under study.
• Can be used in many places, including natural settings,
libraries, etc.
• Correlational data are easier to collect
Disadvantages of correlation studies
• Can’t assume that a cause-effect relationship exists
• Little or no control (experimental manipulation) of the
variables is possible
• Relationships may be accidental or due to a third,
unmeasured factor common to the 2 variables that are
measured
11/23/2015 Shamsuddin Shahid, FKA, UTM
You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)

More Related Content

What's hot

Shahid Lecture-2- MKAG1273
Shahid Lecture-2- MKAG1273Shahid Lecture-2- MKAG1273
Shahid Lecture-2- MKAG1273nchakori
 
Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...Steffan Stringer
 
Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio...
 Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio... Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio...
Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio...hydrologyproject001
 
Quality management institute
Quality management instituteQuality management institute
Quality management instituteselinasimpson1301
 
Quality assurance management
Quality assurance managementQuality assurance management
Quality assurance managementselinasimpson0301
 

What's hot (6)

Shahid Lecture-2- MKAG1273
Shahid Lecture-2- MKAG1273Shahid Lecture-2- MKAG1273
Shahid Lecture-2- MKAG1273
 
Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...
 
Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio...
 Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio... Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio...
Download-manuals-surface water-waterlevel-37howtodohydrologicaldatavalidatio...
 
Air quality management
Air quality managementAir quality management
Air quality management
 
Quality management institute
Quality management instituteQuality management institute
Quality management institute
 
Quality assurance management
Quality assurance managementQuality assurance management
Quality assurance management
 

Shahid Lecture-5- MKAG1273

  • 1. MAL1303: Statistical Hydrology. Correlation. Dr. Shamsuddin Shahid, Department of Hydraulics and Hydrology, Faculty of Civil Engineering, Universiti Teknologi Malaysia. Room No. M46-332; E-mail: sshahid@utm.my; Mobile: 0182051586. 11/23/2015 Shamsuddin Shahid, FKA, UTM
  • 2. Research Questions: Are two variables related? Example questions in hydrology: – “Is there any relation between rainfall and river discharge?” – “Is there any relation between low river flow and river water quality?” – “Is there any relation between elevation and rainfall?” – “Is there any relation between rainfall intensity and landslides?” Test the relationship: Correlation
  • 3. Correlation. Definition: Correlation is a statistical method used to examine the extent to which two variables have a simple linear relationship. Questions: • What does it mean to say that two variables are associated with one another? • How can we mathematically formalize the concept of association? Answer: Correlation
  • 4. Correlation. Correlation describes the relationship between two variables: – Direction – Strength – Significance. The sign indicates direction. The size indicates strength. Comparison with critical values gives significance.
  • 5. Scatter Plots • Plot each pair of observations (X, Y) • x = predictor variable (independent) • y = criterion variable (dependent) • Check for: – outliers – linearity
  • 6. How do you study the relationship between two variables? Groundwater temperature data are collected at different depths below the earth's surface. A list of these data is difficult to interpret. The relationship between the two variables can be visualized using a scatter diagram, where each depth-temperature pair is represented as a point in a plane.
  • 7. Types of Correlation, Type I: Positive Correlation and Negative Correlation. Positive Correlation: the correlation is said to be positive if the values of the two variables change in the same direction. Negative Correlation: the correlation is said to be negative if the values of the two variables change in opposite directions.
  • 8. Positive & Negative Association. At each depth two measurements are collected: temperature and nitrogen concentration. This gives two scatter plots: (i) Depth vs. Groundwater Temperature; (ii) Depth vs. Nitrogen Concentration in Groundwater. In the first graph, temperature increases with depth as a general tendency. This corresponds to a positive association. In the second graph, nitrogen concentration decreases with depth. This corresponds to a negative association.
  • 9. Types of Correlation, Type II: Simple, Multiple, Partial, Total
  • 10. Types of Correlation, Type II • Simple correlation: only two variables are studied. • Multiple correlation: three or more variables are studied. • Partial correlation: the analysis recognizes more than two variables but considers only two of them, keeping the others constant. • Total correlation: based on all the relevant variables, which is normally not feasible.
  • 11. Types of Correlation, Type III: Linear, Non-linear
  • 12. Types of Correlation, Type III • Linear correlation: correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of variables having a linear relationship forms a straight line. • Non-linear correlation: the correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
  • 13. Correlation Coefficient • The correlation coefficient gives a measure of the linear association of two variables. It defines the degree of relationship. • The correlation coefficient is usually denoted by r and takes values between -1 and 1. When r is positive, it lies between 0 and 1; when r is negative, it lies between 0 and -1.
  • 14. Correlation Coefficient • Nitrogen concentration data were collected at two different locations, giving two scatter plots. Both show negative correlation between depth and nitrogen concentration. The correlation coefficient r is more negative for the first plot than for the second. • If the points of the scatter plot lie very close to a straight line, the correlation is close to 1 in absolute value. A near-zero correlation corresponds to a diagram where the data are widely scattered around the line.
  • 16. Correlation Coefficient - Summary • A positive coefficient means that the data are clustered around a line with a positive slope. That is, as one variable increases, the other one also increases. • A negative coefficient means that the data are clustered around a line with a negative slope. That is, as one variable increases, the other one decreases. • The closer r is to 1, the stronger the positive linear association between the variables. • The closer r is to -1, the stronger the negative linear association between the variables. • When r is equal to or near 1 or -1, there is a strong linear association between the variables. • When r is equal to or near 0, there is little or no linear association between the variables.
  • 17. Pearson Correlation • Pearson correlation is used to describe the relationship between two variables that are both interval or ratio variables. • Pearson correlation compares how consistently each Y value is paired with each X value in a linear fashion.
  • 18. Covariance • Covariance is a measure of how much two variables change together. • It is the variance shared by 2 variables. • Covariance reflects the direction of the relationship: positive covariance indicates a positive (+) relationship; negative covariance indicates a negative (-) relationship.
  • 19. Computational Formula. Sum of Squares (SS) measures the amount of variation or variability of a single variable. Sum of Products (SP) provides a parallel procedure for measuring the amount of covariation or covariability between two variables.
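The formulas themselves are not present in the transcript text. For reference, the standard definitional and computational forms of SS and SP are sketched below; the symbols $\bar{X}$, $\bar{Y}$ and $n$ (sample means and number of pairs) are notation assumed here, not taken from the slide.

```latex
SS_X = \sum (X - \bar{X})^2 = \sum X^2 - \frac{\left(\sum X\right)^2}{n}
\qquad
SS_Y = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}

SP = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}
```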
  • 20. Calculation of Pearson’s Correlation Coefficient • Pearson’s correlation coefficient is a ratio comparing the covariability of X and Y with the variability of X and Y separately. • SP measures the covariability of X and Y. • The variability of X and Y is measured by calculating the SS for the X and Y scores separately.
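The ratio described above is commonly written r = SP / sqrt(SS_X x SS_Y). A minimal Python sketch of that calculation follows. The function names and the sample data are hypothetical illustrations; they are not the depth and nitrate values from the lecture, which are not included in the transcript.

```python
import math

def sum_of_squares(values):
    """SS via the computational formula: sum(X^2) - (sum X)^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

def sum_of_products(x, y):
    """SP via the computational formula: sum(XY) - (sum X)(sum Y) / n."""
    n = len(x)
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

def pearson_r(x, y):
    """Pearson's r as the ratio SP / sqrt(SS_X * SS_Y)."""
    return sum_of_products(x, y) / math.sqrt(sum_of_squares(x) * sum_of_squares(y))

# Hypothetical illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss_x = sum_of_squares(x)      # 10.0
ss_y = sum_of_squares(y)      # 6.0
sp = sum_of_products(x, y)    # 6.0
r = pearson_r(x, y)
print(round(r, 4))            # 0.7746
```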
  • 21. Calculation of Pearson’s Correlation Coefficient. Let X represent depth in feet and Y represent nitrate concentration in mg/l. The association between groundwater depth and nitrate concentration can be found from the paired measurements.
  • 22. Hypothesis Testing • H0: there is no correlation between depth and nitrate concentration, i.e. the population correlation is 0. • H1: there is a real non-zero correlation in the population. • The population correlation is traditionally represented by ρ; with this symbol we can write H0: ρ = 0, H1: ρ ≠ 0. • For Pearson’s correlation, the degrees of freedom are df = n-2, where n is the sample size. We lose 2 degrees of freedom because two means must be estimated, one for each variable. • If the absolute value of the calculated r equals or exceeds the critical value (given in the table), the obtained r is significant.
  • 23. Hypothesis Testing. In the present case, r = 0.875 and df = n-2 = 5-2 = 3. The critical value for α = 0.05, df = 3 is 0.878. Because 0.875 < 0.878, we fail to reject H0: ρ = 0. The sample does not provide significant evidence of a correlation in the population.
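The comparison against a tabulated critical r can be cross-checked through the equivalent t statistic, t = r x sqrt(df / (1 - r^2)). The sketch below assumes the two-tailed Student t critical value of 3.182 for df = 3 at α = 0.05, a standard table value that does not appear in the slides; the function names are hypothetical.

```python
import math

def r_to_t(r, n):
    """Convert a sample correlation r (n pairs) to a t statistic with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r))

def critical_r(t_crit, df):
    """Critical |r| implied by a critical t value for the given df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Assumption: two-tailed t critical value for alpha = 0.05, df = 3 (standard t table).
T_CRIT_DF3 = 3.182

r, n = 0.875, 5                       # values from the worked example
t_stat = r_to_t(r, n)
r_crit = critical_r(T_CRIT_DF3, n - 2)
significant = abs(r) >= r_crit

print(round(t_stat, 2), round(r_crit, 3), significant)
```

With these inputs the implied critical r rounds to 0.878, matching the slide, and the observed r = 0.875 falls just short of it.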
  • 24. Significance of Correlation: critical values of r at p = .05
        df (N-2)    Critical value
        5           .67
        10          .50
        15          .41
        20          .36
        25          .32
        30          .30
        50          .23
        200         .11
        500         .07
        1000        .05
  • 25. Correlation: r & r² • As a matter of routine, it is the squared correlation that should be interpreted. The correlation coefficient can mislead by suggesting more covariation than actually exists, and this problem gets worse as the correlation approaches zero. • Note that as the correlation r decreases by tenths, r² decreases by much more. A correlation of .50 shows only 25 percent of variance in common; a correlation of .20 shows 4 percent in common; and a correlation of .10 shows 1 percent in common (or 99 percent not in common). • Thus, squaring is a healthy corrective to the tendency to treat low correlations, such as .20 and .30, as indicating meaningful or practical covariation.
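The arithmetic behind these percentages is simply 100 x r². A small sketch follows; the helper name is hypothetical.

```python
def shared_variance_percent(r):
    """Percentage of variance two variables have in common: 100 * r^2."""
    return 100 * r * r

# The three correlations quoted in the slide.
table = {r: round(shared_variance_percent(r), 2) for r in (0.50, 0.20, 0.10)}
for r, pct in table.items():
    print(f"r = {r:.2f} -> {pct:.0f}% of variance in common")
```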
  • 26. Assumptions • Scale of measurement is interval • Linear relationships • Homoscedasticity • Similar normal underlying distributions • No outliers
  • 27. Homoscedasticity
  • 28. Advantages and Disadvantages of Pearson’s Coefficient. Advantages • It summarizes in one value both the degree and the direction of correlation. Limitations • It always assumes a linear relationship. • Interpreting the value of r is difficult. • The value of the correlation coefficient is affected by extreme values.
  • 29. Parametric and Non-parametric Correlation. Parametric correlation: used when the distribution of the data is normal. Example: Pearson correlation. Non-parametric correlation: used when the distribution of the data is not normal. Examples: Spearman’s rank correlation, Kendall’s τ (tau) correlation.
  • 30. The Spearman Correlation • Spearman’s correlation is designed to measure the relationship between variables measured on an ordinal scale. • A perfectly positive relationship means that every time X increases, Y also increases; i.e., the smallest value of X is paired with the smallest value of Y, and so on. • The original scores are first converted to ranks; the Spearman correlation coefficient then measures the relationship between the ranks. The degree of relationship for the ranks provides a measure of the degree of consistency for the original scores. Calculation of Spearman’s Correlation Coefficient • Be sure you have ordinal data for the X and Y scores. • The smallest value gets rank 1, the second smallest rank 2, and so on. • Rank X and Y separately. • Use the same formula on the ranked data as you used for Pearson’s r.
  • 31. Rank Correlation • Spearman Rank-Correlation Coefficient: rs = 1 - (6 x Σdi²) / [n x (n² - 1)], where: n = number of items being ranked; xi = rank of item i with respect to one variable; yi = rank of item i with respect to a second variable; di = xi - yi
  • 32. Test for Significant Rank Correlation • We may want to use sample results to make an inference about the population rank correlation ρs. • To do so, we must test the hypotheses: H0: ρs = 0, Ha: ρs ≠ 0
  • 33. Spearman Rank Correlation. Monthly Rainfall (mm), Sample 1: {79, 71, 108, 54, 67, 90}. Monthly Discharge (cusec), Sample 2: {122, 100, 121, 43, 54, 80}. Null Hypothesis: there exists no association (or correlation) between the samples. If rs > critical value, there is a significant correlation.
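As a sketch of the rank-difference calculation for the rainfall and discharge samples above, the following Python uses the common form rs = 1 - 6 Σd² / [n(n² - 1)]. The function names are hypothetical, and the simple ranking helper assumes there are no tied values, which holds for these two samples.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rainfall = [79, 71, 108, 54, 67, 90]      # monthly rainfall (mm), from the slide
discharge = [122, 100, 121, 43, 54, 80]   # monthly discharge (cusec), from the slide

rain_ranks = ranks(rainfall)      # [4, 3, 6, 1, 2, 5]
flow_ranks = ranks(discharge)     # [6, 4, 5, 1, 2, 3]
rs = spearman_rs(rainfall, discharge)
print(round(rs, 3))               # 0.714
```

The squared rank differences sum to 10, so rs = 1 - 60/210. The result still has to be compared with a critical value for n = 6 from a Spearman table before drawing a conclusion.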
  • 35. Merits of Spearman’s Rank Correlation • This method is simpler to understand and easier to apply than Karl Pearson’s correlation method. • This method is useful where we can give ranks but not the actual data (qualitative terms). • This method is the one to use where the initial data are in the form of ranks.
  • 36. Limitations of Spearman’s Correlation • It cannot be used to find correlation in a grouped frequency distribution. • This method is not well suited to cases where N exceeds 30.
  • 37. Kendall-tau Rank Correlation. Kendall's rank correlation provides a distribution-free test of independence and a measure of the strength of dependence between two variables. Spearman's rank correlation is satisfactory for testing a null hypothesis of independence between two variables, but Kendall's rank correlation is more powerful.
  • 38. Steps for Kendall-tau Rank Correlation 1. Arrange the data in increasing order of magnitude of the first variable and label the objects with the resulting rank: 1 for the smallest up to N for the largest. 2. Rearrange the data in order of increasing magnitude of the second variable and record the rearranged order of the variable-1 ranks. 3. For each data point, scan down the remaining ranks, counting the number of ranks that are larger. 4. Repeat step 3, this time counting the number of ranks that are smaller. 5. Subtract “smaller” from “larger” and sum the total (S).
  • 39. Steps for Kendall-tau Rank Correlation 6. Kendall’s τ is given by: τ = (2 x S) / [N x (N-1)] 7. Compute the z-statistic as z = τ x √{[9 x N x (N-1)] / [2 x (2N + 5)]} 8. The null hypothesis is rejected (p = 0.05) if z lies outside the range -1.96 < z < 1.96.
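The eight steps can be sketched in Python as below. The sketch assumes untied data and uses the usual normal approximation z = τ x sqrt(9N(N-1) / (2(2N+5))); the function names are hypothetical, and the rainfall and discharge samples from the Spearman example are reused purely for illustration, since the groundwater table from the Kendall problem is not in the transcript.

```python
import math

def kendall_s(x, y):
    """S = (number of larger later values) - (number of smaller later values),
    counted over y after ordering the pairs by increasing x. Assumes no ties."""
    y_by_x = [b for _, b in sorted(zip(x, y))]
    s = 0
    for i in range(len(y_by_x)):
        for j in range(i + 1, len(y_by_x)):
            if y_by_x[j] > y_by_x[i]:
                s += 1
            elif y_by_x[j] < y_by_x[i]:
                s -= 1
    return s

def kendall_tau(s, n):
    """tau = 2S / (N(N-1))."""
    return 2 * s / (n * (n - 1))

def kendall_z(tau, n):
    """Normal approximation: z = tau * sqrt(9N(N-1) / (2(2N+5)))."""
    return tau * math.sqrt(9 * n * (n - 1) / (2 * (2 * n + 5)))

rainfall = [79, 71, 108, 54, 67, 90]      # reused illustrative data
discharge = [122, 100, 121, 43, 54, 80]

n = len(rainfall)
s = kendall_s(rainfall, discharge)        # 9
tau = kendall_tau(s, n)                   # 0.6
z = kendall_z(tau, n)
reject_h0 = abs(z) > 1.96
print(s, round(tau, 2), round(z, 2), reject_h0)

# Back-calculated check (S = 41 is an inference, not stated in the transcript).
z_check = kendall_z(kendall_tau(41, 10), 10)
```

Under the same formula, N = 10 with S = 41 gives z ≈ 3.67, the value reported in the worked groundwater example. For a sample as small as N = 6 the normal approximation is rough, so exact tables are preferable in practice.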
  • 40. Kendall-tau Rank Correlation. Problem: Ten groundwater samples are collected from different points to see whether there is any relation between groundwater depth and contamination. Is there any association between depth and contamination? Null Hypothesis: there exists no association; contamination is independent of groundwater depth.
  • 41. Kendall-tau Rank Correlation. Step 1: Rank the data separately. Step 2: Re-arrange the second ranks according to the rank of the first variable.
  • 42. Kendall-tau Rank Correlation. τ = (2 x S) / [N x (N-1)]; z = τ x √{[9 x N x (N-1)] / [2 x (2N + 5)]}
  • 43. Kendall-tau Rank Correlation. Null Hypothesis: there exists no relation between depth and contamination. The null hypothesis is rejected (p = 0.05) if z lies outside the range -1.96 < z < 1.96. z (calculated) = 3.67. Since z (calculated) > z (critical), the null hypothesis is rejected. Decision: there exists a significant correlation between depth and groundwater contamination.
  • 44. Features of Correlation Coefficient. The correlation coefficient has the following properties: • The correlation is not affected when the two variables are interchanged. • The correlation is not changed if the same number is added to all the values of one of the variables. • The correlation is not changed if all the values of one of the variables are multiplied by the same positive number. It changes sign if the number is negative.
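These three properties can be checked numerically. The sketch below uses a small deviation-based Pearson function and hypothetical data; the names and values are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                       # variables interchanged
r_shifted = pearson_r([v + 100 for v in x], y)    # constant added to X
r_scaled = pearson_r([3 * v for v in x], y)       # X multiplied by a positive number
r_flipped = pearson_r([-3 * v for v in x], y)     # X multiplied by a negative number

print(round(r, 4), round(r_swapped, 4), round(r_shifted, 4),
      round(r_scaled, 4), round(r_flipped, 4))
```

All values agree with r except the last, which has the same magnitude and the opposite sign.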
  • 45. Factors affecting correlation • Restricted range • Heterogeneous samples • Outliers • Scale
  • 46. Range restriction • Range restriction occurs when the sample contains a restricted (or truncated) range of scores – e.g., groundwater recharge and rainfall > 5 mm • If the range is restricted, be cautious in generalising beyond the range for which data are available – e.g., groundwater recharge is lower when rainfall is lower, but below a threshold level there is no relation
  • 47. Range restriction
  • 48. Heterogeneous samples • Sub-samples may artificially increase or decrease the overall r. • Solution: calculate r separately for the sub-samples and for the overall sample, and look for differences.
  • 49. Heterogeneous samples
  • 50. Effect of Outliers • Outliers can disproportionately increase or decrease r. • Options – compute r with and without outliers – get more data for outlying values – recode outliers as having more conservative scores – transformation – recode the variable into a lower level of measurement
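The first option, computing r with and without a suspected outlier, can be sketched as follows. The data are hypothetical: six nearly linear points plus one deliberately discordant observation.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: a nearly linear relation...
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
# ...plus one hypothetical outlying observation.
outlier_x, outlier_y = 7, 1.0

r_without = pearson_r(x, y)
r_with = pearson_r(x + [outlier_x], y + [outlier_y])

print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```

A single discordant point pulls r from nearly 1 down to a weak positive value here, which is why the scatter plot should be inspected before r is interpreted.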
  • 51. Effect of Outliers Outliers can disproportionately increase or decrease r
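The first option listed for outliers, computing r with and without them, can be sketched as follows. The eight base points are made up so that their correlation is zero, and the single outlier at (30, 30) is likewise hypothetical.

```python
import numpy as np

# Made-up base sample with no linear relationship (r is zero by construction).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([5.0, 3.0, 6.0, 4.0, 5.0, 3.0, 6.0, 4.0])

r_without = np.corrcoef(x, y)[0, 1]

# Append one hypothetical extreme observation far from the rest of the data.
x_out = np.append(x, 30.0)
y_out = np.append(y, 30.0)
r_with = np.corrcoef(x_out, y_out)[0, 1]

print(f"r without outlier = {r_without:.2f}, r with outlier = {r_with:.2f}")
```

A single point is enough to move r from about zero to above 0.9 in this example, which is why reporting both values is informative.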
  • 52. Closed Data Closed data, and some discrete data, sometimes show a high correlation.
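Assuming "closed data" here means constant-sum (compositional) data such as percentages that add up to 100, the sketch below illustrates how closure by itself can induce correlation. The three components (labelled sand, silt and clay purely for illustration) are generated independently and then rescaled so each row sums to 100.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for reproducibility

# Three independent, hypothetical raw components (columns: sand, silt, clay).
raw = rng.uniform(10.0, 50.0, size=(300, 3))

# Closure: rescale every row so that the three parts sum to 100 %.
closed = 100.0 * raw / raw.sum(axis=1, keepdims=True)

r_raw = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
r_closed = np.corrcoef(closed[:, 0], closed[:, 1])[0, 1]

print(f"raw r = {r_raw:.2f}, closed r = {r_closed:.2f}")
```

The raw components are uncorrelated, but after closure a clear negative correlation appears because an increase in one percentage must be balanced by a decrease in the others.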
  • 53. Log Transformed Data If the data are transformed to a log scale, the relation between the log-transformed values can show a high correlation.
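One situation in which this happens is a power-law relation with multiplicative scatter, which becomes linear after taking logs. The sketch below assumes such a relation between two synthetic variables, here called stage and discharge; the exponent, coefficient and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(11)  # fixed seed for reproducibility

# Synthetic power-law data (assumption): discharge = 2 * stage^3 with lognormal scatter.
stage = rng.uniform(0.5, 5.0, size=200)
discharge = 2.0 * stage**3 * rng.lognormal(mean=0.0, sigma=0.3, size=200)

r_raw = np.corrcoef(stage, discharge)[0, 1]
r_log = np.corrcoef(np.log(stage), np.log(discharge))[0, 1]

print(f"raw scale r = {r_raw:.2f}, log scale r = {r_log:.2f}")
```

The change in r reflects the change of scale as much as the data, so the scale used should be stated whenever a correlation is reported.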
  • 54. Checklist 1. Graphs & Scatterplots – Outliers? – Linear? – Does each variable have a reasonable range? – Are there subsamples to consider? 2. Choose appropriate measure of Association 3. Conduct inferential test 4. Interpret/Discuss
  • 55. Association and Causation ASSOCIATION • If two attributes, say A and B, are found to co-exist more often than expected by chance, they are correlated. We can say that there is an association between attributes A and B. • Correlation indicates the degree of association between two variables. CAUSATION If one of these attributes, say A, is the suspected cause and the other, say B, is the outcome, then we have a reason to suspect that A has caused B.
  • 56. Association and Causation • Association does not mean causation. • If the association is consistent, then there may be causation. • If a relationship is causal, the findings should be consistent with other data. • Causation always implies correlation, but correlation does not necessarily imply causation.
  • 57. Reporting • State the research hypothesis • Describe & interpret correlation – direction of relationship – size/strength of relationship – Significance of relationship • Acknowledge limitations e.g., – Heterogeneity (sub-samples) – Range restriction – Causality?
  • 58. Partial Correlation River discharge depends on many factors, such as rainfall, soil properties, evapotranspiration, groundwater storage, etc. These independent factors are also correlated with each other.
  • 59. Partial Correlation
  • 60. Partial Correlation Three (or more) Variables • Three variables means three relationships • Each can affect the other two • Partial & semi-partial correlation: remove the contribution of the 3rd variable
  • 61. Partial Correlation • Sometimes it is desirable to know the relationship between two variables with the effects of a third variable held constant. We can do this by using partial correlation. • It helps us to find the ‘pure’ correlation between two variables while holding the others constant. • ‘Holding constant’ in this situation is known as partialling out, and the technique for partialling out the effects of one or more variables from two others, in order to find the relationship between them, is called partial correlation.
  • 62. Partial Correlation A partial correlation is a correlation between two variables from which the linear relations, or effects, of another variable(s) have been removed.
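The definition above can be sketched in two equivalent ways: with the standard first-order formula based on the three pairwise correlations, and by correlating the residuals left after regressing each variable on the control variable. The data are synthetic (discharge and groundwater storage both driven by rainfall, with arbitrary coefficients), and the helper names `partial_corr` and `residuals` are chosen for this example only.

```python
import numpy as np


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))


def residuals(v, z):
    """Residuals of v after a least-squares straight-line fit on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)


rng = np.random.default_rng(1)  # fixed seed for reproducibility

# Synthetic data (assumption): rainfall (Z) drives both discharge (X) and storage (Y).
rainfall = rng.normal(100.0, 20.0, size=300)
discharge = 0.4 * rainfall + rng.normal(0.0, 5.0, size=300)
storage = 0.3 * rainfall + rng.normal(0.0, 5.0, size=300)

R = np.corrcoef(np.vstack([discharge, storage, rainfall]))
r_xy, r_xz, r_yz = R[0, 1], R[0, 2], R[1, 2]

r_formula = partial_corr(r_xy, r_xz, r_yz)
r_resid = np.corrcoef(residuals(discharge, rainfall),
                      residuals(storage, rainfall))[0, 1]

print(f"simple r_xy = {r_xy:.2f}")
print(f"partial r_xy.z (formula) = {r_formula:.2f}, (residuals) = {r_resid:.2f}")
```

In this construction the simple correlation between discharge and storage is sizeable only because both depend on rainfall; once rainfall is partialled out, the correlation drops towards zero.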
  • 63. Partial Correlation Correlation = 0.72
  • 64. Partial Correlation Correlation = 0.73
  • 65. Higher-Order Partial Correlation The second-order partial correlation is the correlation between two variables with the effects of two other variables being removed.
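A second-order partial correlation can be obtained by applying the first-order formula recursively, or by correlating residuals from a regression on both control variables. The sketch below does both on synthetic data; the variable names, coefficients and the helpers `partial_corr`, `second_order_partial` and `residualize` are assumptions made for illustration.

```python
import numpy as np


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of A and B, controlling for C."""
    return (r_ab - r_ac * r_bc) / np.sqrt((1.0 - r_ac**2) * (1.0 - r_bc**2))


def second_order_partial(R, i, j, k, l):
    """Partial correlation of variables i and j controlling for k and l,
    computed recursively from the correlation matrix R."""
    r_ij_k = partial_corr(R[i, j], R[i, k], R[j, k])
    r_il_k = partial_corr(R[i, l], R[i, k], R[l, k])
    r_jl_k = partial_corr(R[j, l], R[j, k], R[l, k])
    return partial_corr(r_ij_k, r_il_k, r_jl_k)


def residualize(v, controls):
    """Residuals of v after least-squares regression on the controls plus an intercept."""
    A = np.column_stack([np.ones(len(v))] + list(controls))
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef


rng = np.random.default_rng(5)  # fixed seed for reproducibility
n = 400

# Synthetic data (assumption): rainfall and evapotranspiration both influence
# discharge and groundwater storage.
rainfall = rng.normal(100.0, 20.0, n)
evap = 0.2 * rainfall + rng.normal(30.0, 5.0, n)
discharge = 0.5 * rainfall - 0.4 * evap + rng.normal(0.0, 5.0, n)
storage = 0.3 * rainfall - 0.2 * evap + rng.normal(0.0, 5.0, n)

R = np.corrcoef(np.vstack([discharge, storage, rainfall, evap]))
r_recursive = second_order_partial(R, 0, 1, 2, 3)

controls = [rainfall, evap]
r_resid = np.corrcoef(residualize(discharge, controls),
                      residualize(storage, controls))[0, 1]

print(f"r_xy.zw (recursive) = {r_recursive:.3f}, (residuals) = {r_resid:.3f}")
```

The two routes give the same value up to rounding; the residual approach extends more easily to more than two control variables.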
  • 66. Semipartial Correlation With partial correlation, we find the correlation between X and Y holding Z constant for both X and Y. Sometimes, however, we want to hold Z constant for just X or just Y. In that case, we compute a semipartial correlation. Comparison between the partial and semipartial correlation (formulas shown as equations on the slide): Partial vs. Semi-partial.
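For reference, the two coefficients being compared are usually written as follows in standard textbook notation. This is an assumption about the notation: the slide's own symbols may differ, and the semipartial is written here for the case where Z is removed from X only.

```latex
% Partial correlation of X and Y with Z removed from both variables
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}

% Semipartial correlation of Y with X, with Z removed from X only
r_{y(x \cdot z)} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{1 - r_{xz}^{2}}}
```

The numerators are identical; the partial correlation has the additional factor $\sqrt{1 - r_{yz}^{2}}$ in its denominator.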
  • 67. Partial Correlation The result doesn't make much intuitive sense, but it does remind us that the absolute value of the partial is larger than the semipartial.
  • 68. Semipartial Correlation • The partial and semipartial correlation formulas are the same in the numerator and almost the same in the denominator. • The partial contains something extra, that is, something missing from the semipartial correlation in the denominator. • This means that the partial correlation is going to be larger in absolute value than the semipartial. • This will be true except when the controlling or partialling variable is uncorrelated with the variable to be controlled.
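The comparison in the bullets above can be checked with a few lines of arithmetic. The sketch assumes the standard textbook formulas, with the semipartial defined as Z removed from X only, and uses made-up pairwise correlations; the function names are chosen for this example.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y with Z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))


def semipartial_corr(r_xy, r_xz, r_yz):
    """Semipartial correlation of Y with X, with Z removed from X only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1.0 - r_xz**2)


# Hypothetical pairwise correlations, for illustration only.
r_xy, r_xz, r_yz = 0.6, 0.5, 0.4

pr = partial_corr(r_xy, r_xz, r_yz)
sr = semipartial_corr(r_xy, r_xz, r_yz)
print(f"partial = {pr:.3f}, semipartial = {sr:.3f}")

# When Z is uncorrelated with Y, the extra denominator term equals 1
# and the two coefficients coincide.
pr_equal = partial_corr(r_xy, r_xz, 0.0)
sr_equal = semipartial_corr(r_xy, r_xz, 0.0)
print(f"with r_yz = 0: partial = {pr_equal:.3f}, semipartial = {sr_equal:.3f}")
```

Because the extra denominator term is a square root of a number between 0 and 1, dividing by it can only increase the absolute value, which is the inequality the slide describes.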
  • 69. Advantages of Correlation studies • Show the amount (strength) of relationship present • Can be used to make predictions about the variables under study. • Can be used in many places, including natural settings, libraries, etc. • Easier to collect correlational data
  • 70. Disadvantages of correlation studies • Can’t assume that a cause-effect relationship exists • Little or no control (experimental manipulation) of the variables is possible • Relationships may be accidental or due to a third, unmeasured factor common to the 2 variables that are measured