Up to now, the statistical methods you have learned concern a single variable only, for example:
To estimate the average height of high school students
To compare the average height of high school students between the city and the countryside
However, in practice we are often concerned with the relationship between two variables:
Example: For high school students,
Height and Age – linear relation?
Height and Weight – linear relation?
In this chapter, we are going to study:
linear relationship between two variables
Two types of questions:
Is there a linear relationship?
-- Linear correlation
How can one variable be predicted from another?
-- Linear regression
Example 7.1 To explore the correlation between
systolic pressure and diastolic pressure (mmHg),
665 girls aged 6 to 10 years were measured.
Two random variables X and Y
Sample: 665 girls
The individuals in the sample should be
independent of each other.
Example 7.2 To explore the correlation between
the heights of fathers and sons, 20 male students
were randomly selected from the list of graduates
of a high school. The heights (cm) of the fathers
and sons were measured.
Table 7.1 Heights (cm) of 20 pairs of father and son
7.1 Linear correlation
Scatter diagram: Fig. 7.1 Scatter diagram of systolic and diastolic blood pressures (mmHg) of 665 girls aged 6 to 10 years
7.2 Correlation Coefficient
7.2.1 Population correlation coefficient
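Pearson's product-moment linear correlation coefficient is the mean of the "product of the two standardized variables":
$$\rho = E\left[\frac{X - \mu_X}{\sigma_X}\cdot\frac{Y - \mu_Y}{\sigma_Y}\right] = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
-- also called the simple correlation coefficient
-- $\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$ is the covariance between X and Y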
7.2.2 Sample correlation coefficient
Pearson's product-moment sample correlation coefficient r:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
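As a check on the sample formula, here is a minimal Python sketch (standard library only) that computes r from paired observations. The function names are hypothetical, and the toy numbers in the usage note are illustrative only; they are not the data of Table 7.1. The second function computes the same value as the mean of products of standardized values, mirroring the definition of ρ; it divides the standard deviations by n (not n − 1) so that the two forms agree exactly.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation r from paired observations (x_i, y_i)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def mean_product_of_standardized(x, y):
    """Same r, computed as the mean of products of standardized values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Divisor n (not n - 1) so that the factors of n cancel exactly
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / n
```

For example, `pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` returns 6/√60 ≈ 0.775, and so does `mean_product_of_standardized` on the same data.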
A measure of linear relationship, $-1 \le r \le 1$:
1) Is there a correlation? If the correlation coefficient is 0 or not large enough in absolute value -- no correlation
2) If the correlation coefficient is large enough:
The direction of the correlation? -- positive or negative
The strength of the correlation? -- is the absolute value large enough?
Complete correlation: r = +1 or −1
Example 7.3 Calculate the correlation coefficient between the heights of father and son.
r is the sample correlation coefficient; it varies from sample to sample.
The corresponding population correlation coefficient is denoted by ρ.
Question: is ρ = 0 or not?
Assumption: X and Y follow a bivariate normal distribution.
7.3 Inference on Correlation Coefficient
7.3.1 Hypothesis test
$H_0: \rho = 0$, $H_1: \rho \neq 0$, $\alpha = 0.05$
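(1) Checking a special table (Table 8 in Appendix 2): compare |r| with the critical value for ν = n − 2 degrees of freedom; if |r| is at least the critical value, reject $H_0$.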
$H_0$ is rejected
---- there is a positive correlation between the heights of fathers and sons.
Since the P-value is very small, can we say
that the correlation is very strong?
Does a small P-value mean that the correlation
is strong?
Question: If r = 0.90, can you claim that the two variables are correlated with each other?
Table 8 Critical values for r
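The question above has no answer without n. The critical values in Table 8 can be reproduced from the t distribution, because |r| is significant exactly when $t = r\sqrt{n-2}/\sqrt{1-r^2}$ is; solving for r gives $r_{crit} = t_{crit}/\sqrt{t_{crit}^2 + \nu}$ with ν = n − 2. The sketch below assumes the two-sided critical t value is read from a t table (the Python standard library has no t quantile function); `critical_r` is a hypothetical helper name.

```python
import math

def critical_r(t_crit, nu):
    """Critical |r| equivalent to a two-sided critical t value with nu = n - 2 df.

    Inverts t = r * sqrt(nu) / sqrt(1 - r**2), which gives r = t / sqrt(t**2 + nu).
    """
    if t_crit <= 0 or nu < 1:
        raise ValueError("need t_crit > 0 and nu >= 1")
    return t_crit / math.sqrt(t_crit ** 2 + nu)
```

With α = 0.05 (two-sided), n = 3 gives ν = 1 and t = 12.706, so `critical_r(12.706, 1)` ≈ 0.997: r = 0.90 from three pairs is not significant. With n = 20, ν = 18 and t = 2.101, `critical_r(2.101, 18)` ≈ 0.444, so the same r = 0.90 would be.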
(2) t test (assuming a bivariate normal distribution): $H_0: \rho = 0$, $H_1: \rho \neq 0$
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \qquad \nu = n - 2$$
If P-value < α, reject $H_0$ and conclude that
the population correlation coefficient is significantly different from 0,
that is, the population correlation coefficient is probably not 0.
7.3.2 Interval estimation
Assumption: X and Y follow a bivariate normal distribution.
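(1) Hyperbolic tangent and its inverse
Hyperbolic tangent: $\tanh x = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Inverse hyperbolic tangent: $\tanh^{-1} x = \dfrac{1}{2}\ln\dfrac{1 + x}{1 - x}$, for $-1 < x < 1$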
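$z = \tanh^{-1} r$ approximately follows a normal distribution, $N\left(\tanh^{-1}\rho,\ \dfrac{1}{n-3}\right)$.
Confidence interval of $\zeta = \tanh^{-1}\rho$:
$$\left(z - \frac{u_{\alpha/2}}{\sqrt{n-3}},\ z + \frac{u_{\alpha/2}}{\sqrt{n-3}}\right)$$
Transforming both limits back with tanh gives the confidence interval of ρ:
$$\left(\tanh\left(z - \frac{u_{\alpha/2}}{\sqrt{n-3}}\right),\ \tanh\left(z + \frac{u_{\alpha/2}}{\sqrt{n-3}}\right)\right)$$
where $u_{\alpha/2}$ is the two-sided critical value of the standard normal distribution (1.96 for a 95% interval).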
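The three steps of the interval estimate (transform r to z, build a normal interval on the z scale, transform back with tanh) map directly onto `math.atanh` and `math.tanh`. A minimal sketch follows; `fisher_ci` is a hypothetical name, and `u` defaults to 1.96 on the assumption that a 95% interval is wanted.

```python
import math

def fisher_ci(r, n, u=1.96):
    """Approximate confidence interval for rho via z = atanh(r).

    u is the standard normal critical value u_{alpha/2}; 1.96 gives about 95%.
    """
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)                   # transform r to the z scale
    half_width = u / math.sqrt(n - 3)   # z is approx. N(atanh(rho), 1/(n - 3))
    lo_z, hi_z = z - half_width, z + half_width
    return math.tanh(lo_z), math.tanh(hi_z)  # back-transform the limits
```

For example, r = 0.5 with n = 28 gives roughly (0.156, 0.736). The interval is not symmetric around r, because tanh compresses values near ±1; building the interval on the z scale is what keeps both limits inside (−1, 1).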
Example 7.5 After obtaining the sample correlation coefficient r, find a 95% confidence interval for the population correlation coefficient ρ.
7.4 Rank Correlation
Rank correlation is useful for measurement data when X or Y
---- does not follow a normal distribution;
or the distribution is unknown;
or the data are not precisely measured;
as well as when X or Y is an ordinal variable.
7.4.1 Spearman's rank correlation coefficient
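Spearman's rank correlation coefficient:
n pairs of observations, $(x_1, y_1), \ldots, (x_n, y_n)$
sort $(x_1, x_2, \ldots, x_n)$ and obtain the rank $p_i$ of $x_i$
sort $(y_1, y_2, \ldots, y_n)$ and obtain the rank $q_i$ of $y_i$
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = p_i - q_i$$
(exact when there are no ties; with ties, compute Pearson's r on the ranks)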
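The ranking procedure and the $d_i^2$ formula can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names; tied values receive the average of the ranks they occupy, and the shortcut formula is exact only when there are no ties (otherwise Pearson's r on the ranks should be used).

```python
def ranks(values):
    """Rank values 1..n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman's r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), d_i = p_i - q_i.

    Exact only when there are no ties in x or in y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    p, q = ranks(x), ranks(y)
    d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Because only ranks enter the formula, any strictly increasing relation gives r_s = 1: `spearman_rs([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])` returns 1.0 even though the relation is not linear.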
Example 7.6 An etiological study of liver cancer collected data on the liver-cancer-specific death rate and the relative content of aflatoxin in certain foods for 10 counties. Substitute the ranks into the formula for Spearman's rank correlation coefficient.
7.4.2 Hypothesis test for $\rho_s$
(1) Checking a special table (Table 9 Critical values for Spearman's rank correlation coefficient): for Example 7.6, P < 0.02, so the correlation is significant.