Data Processing and Statistical Treatment: Spreads and Correlation

Equal Spreads but Different Averages
Equal Averages but Different Spreads

Objectives
The student will be able to:
• identify the standard deviation of a data set.
• identify the variance of a data set.
• identify different correlations and correlation
techniques.

The average squared deviation from the mean of a set
of data is called the variance of the distribution. It is
sometimes called mean square since we divide the sum
of the squares by the total number of cases in the
distribution. Thus, the formula for variance of a sample
is given as
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$

The Standard Deviation (SD) is
considered as the most useful index of
variability or dispersion. This measure
indicates how closely the scores are
clustered around the mean.
$$SD = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$$

a) Find the mean of the data.
b) Subtract the mean from each value.
c) Square each deviation from the mean.
d) Find the sum of the squares.
e) Divide the total by the number of items.
f) Take the square root of the variance.
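
Steps (a) to (f) can be sketched in Python as follows. This is only an illustrative sketch: the function name `variance_and_sd` and the sample values are assumptions made for the example, and the code divides by n, as the formulas in this section do.

```python
import math

def variance_and_sd(values):
    """Return (variance, standard deviation), dividing by n as in steps (a)-(f)."""
    n = len(values)
    mean = sum(values) / n                      # (a) find the mean
    deviations = [x - mean for x in values]     # (b) subtract the mean from each value
    squares = [d ** 2 for d in deviations]      # (c) square each deviation
    total = sum(squares)                        # (d) sum the squares
    variance = total / n                        # (e) divide by the number of items
    sd = math.sqrt(variance)                    # (f) square root of the variance
    return variance, sd

# Hypothetical data chosen so the arithmetic is easy to follow by hand.
variance, sd = variance_and_sd([2, 4, 4, 4, 5, 5, 7, 9])
print(variance, sd)  # 4.0 2.0
```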

Find the variance and
standard deviation
The math test scores of five students are:
92, 88, 80, 68 and 52.
1) Find the mean: (92+88+80+68+52)/5 = 76
2) Find the deviation from the mean:
92-76=16
88-76=12
80-76=4
68-76= -8
52-76= -24

3) Square the deviation from the mean:
(16)^2 = 256
(12)^2 = 144
(4)^2 = 16
(-8)^2 = 64
(-24)^2 = 576
4) Find the sum of the squares of the
deviation from the mean:
256+144+16+64+576= 1056
5) Divide by the number of data
items to find the variance:
1056/5 = 211.2

6) Find the square root of the
variance: √211.2 ≈ 14.53
Thus the standard deviation of
the test scores is 14.53.
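
The result can be checked with Python's standard `statistics` module. In this sketch, `pvariance` and `pstdev` divide by n, matching the formula used in this section. `variance` and `stdev` divide by n - 1 instead, which is the usual choice when the scores are treated as a sample from a larger group. That second pair is shown only for comparison and is not part of the worked example.

```python
import statistics

scores = [92, 88, 80, 68, 52]

# Divide by n (the formula used in this section).
pop_variance = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)
print(pop_variance)          # 211.2
print(round(pop_sd, 2))      # 14.53

# Divide by n - 1 (shown for comparison only).
sample_variance = float(statistics.variance(scores))
sample_sd = statistics.stdev(scores)
print(sample_variance)       # 264.0
print(round(sample_sd, 2))   # 16.25
```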

1. A teacher sets an exam for her pupils. She wants to summarise
the results the pupils attained as a mean and standard deviation.
Which standard deviation (population or sample) should be used?
2. A researcher has recruited males aged 45 to 65 years old for an
exercise training study to investigate risk markers for heart
disease (e.g., cholesterol). Which standard deviation would most
likely be used?
3. One of the questions on a national census survey asks
for respondents’ age. Which standard deviation would be used to
describe the variation in all ages received from the census?

Correlation is a measure of relationship between two or more paired
variables or two or more sets of data. It is sometimes called co-
variation because analysis is focused mainly on how two variables co-
vary or co-differ with each other.
* As a historical note, Pearson r is named
after Karl Pearson, the mathematician who
developed the said correlational formula.
Bivariate data is data in which two variables are measured on an
individual.
The response variable is the variable whose value can be explained
or determined based upon the value of the predictor variable.

The linear correlation coefficient or
Pearson product moment correlation
coefficient is a measure of the strength of
linear relation between two quantitative
variables. We use the Greek letter ρ (rho) to
represent the population correlation
coefficient and r to represent the sample
correlation coefficient. We shall only present
the formula for the sample correlation
coefficient.
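
As a hedged illustration, the sample correlation coefficient is commonly written in textbooks as r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below implements that textbook form. The function name `pearson_r` and the data are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient for paired values xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data in which y is exactly twice x.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(r)  # 1.0
```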

Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always
between -1 and 1, inclusive. That is, -1 ≤ r ≤ 1.
2. If r = +1, there is a perfect positive linear
relation between the two variables.
3. If r = -1, there is a perfect negative linear
relation between the two variables.

4. The closer r is to +1, the stronger the
evidence of positive association between the
two variables.
5. The closer r is to -1, the stronger the
evidence of negative association between the
two variables.

6. If r is close to 0, there is evidence of no
linear relation between the two variables.
Because the linear correlation coefficient is a
measure of strength of linear relation, r
close to 0 does not imply no relation, just no
linear relation.
7. It is a unitless measure of association.
So, the unit of measure for x and y plays no
role in the interpretation of r.
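
Properties 6 and 7 can be illustrated with a short sketch. The helper `pearson_r` and all data below are assumptions made for the example. In the first case y depends perfectly on x through y = x², yet r is 0 because the relation is not linear. In the second case, converting x from centimetres to metres leaves r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (textbook deviation form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Property 6: a perfect but non-linear relation (y = x squared) gives r = 0.
xs = [-2, -1, 0, 1, 2]
r_curve = pearson_r(xs, [x ** 2 for x in xs])
print(r_curve)  # 0.0

# Property 7: changing the unit of x (cm to m) does not change r.
heights_cm = [150, 160, 170, 180, 190]   # hypothetical heights
weights_kg = [50, 58, 63, 72, 80]        # hypothetical weights
r_cm = pearson_r(heights_cm, weights_kg)
r_m = pearson_r([h / 100 for h in heights_cm], weights_kg)
print(math.isclose(r_cm, r_m))  # True
```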

A linear correlation
coefficient that implies a
strong positive or negative
association that is computed
using observational data
does not imply causation
among the variables.

Spearman Rank-Order Correlation Coefficient
The Spearman rank-order correlation coefficient, or simply Spearman
rho, is a measure of correlation between two sets of ordinal
data. It is the most widely used among the rank
correlational techniques. To compute the Spearman rho
(symbolized by the Greek letter 𝜌, which is read as "rho"), the
following formula is used:
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where D is the difference between the two ranks of each pair and
N is the number of paired ranks.
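
A minimal sketch of the Spearman rho formula, 1 − 6ΣD² / (N(N² − 1)), is given below. It assumes there are no tied values, since this simple form of the formula does not adjust for ties. The names `to_ranks` and `spearman_rho` and the rankings are hypothetical.

```python
def to_ranks(values):
    """Rank values from 1 (largest) to N (smallest); assumes no ties."""
    ordered = sorted(values, reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(ordered)}
    return [rank_of[value] for value in values]

def spearman_rho(ranks_x, ranks_y):
    """Spearman rho from two lists of paired ranks (no ties assumed)."""
    n = len(ranks_x)
    # D is the difference between the two ranks of each pair.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Hypothetical rankings of five contestants by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)
print(rho)  # 0.8

# Raw scores can be converted to ranks first.
print(to_ranks([86, 70, 95]))  # [2, 3, 1]
```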

Spearman Rank-Order Correlation is a non-
parametric test that is used to measure the
degree of association between two variables.
It was developed by Spearman, thus it is
called Spearman Rank Correlation.

Other Correlation Techniques
Kendall’s Tau – it is a measure of correlation between ranks. It can be
applied wherever the Spearman rho is applicable.
Kendall’s Coefficient of Concordance – it is used to determine the
relationship among three or more sets of ranks.

Point-Biserial Coefficient – it is a special type of Pearson
product-moment correlation coefficient widely used in test construction,
test validation and test analysis. It is used when one of the variables is
continuous and the other is a dichotomous variable.
Biserial Correlation Coefficient – it is also used in test construction,
test validation and test analysis, like the point-biserial correlation
coefficient.
Phi Coefficient – the phi coefficient, sometimes called the fourfold
coefficient, is used when both variables are dichotomous.
Tetrachoric Correlation Coefficient – it is a measure of correlation
between data that can be reduced to two dichotomies.

Multiple Regression – it is a technique that enables researchers to
determine a correlation between a criterion variable (dependent) and the
best combination of two or more predictor variables (independent).
Coefficient of Multiple Correlation – the coefficient of multiple
correlation indicates the strength of the correlation between the
combination of the predictor variables and the criterion variable.
Coefficient of Determination – symbolized by r², the coefficient of
determination is the square of the correlation between one predictor
variable and a criterion variable.
Partial Correlation – whenever two or more variables are correlated,
there is a possibility that yet other variables explain any
relationship that is found. Partial correlation measures the relationship
between two variables while holding such other variables constant.

Factor Analysis – it is a technique that allows a researcher to determine
if many variables can be described by a few factors.
Path Analysis – it is used to test the possibility of a causal
connection among three or more variables.
Discriminant Function Analysis – it is a technique used in the
same way as the multiple regression analysis.

The t-test for correlation
The coefficient of correlation only describes the extent or
degree of relationship between two variables. To test whether
this coefficient of correlation is significant at a particular level,
say 5% or 1%, the t-test for correlation is used. To calculate this
t-statistic, the following formula is used:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
where 𝑟 = the correlation coefficient
between two variables X and
Y; and
𝑛 = the number of paired values
of X and Y.
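
The t-statistic above can be computed with a short sketch like the following. The function name `t_for_correlation` and the values r = 0.6 and n = 27 are assumptions made for the example. The resulting t is compared against the tabled critical value of the t distribution with n − 2 degrees of freedom at the chosen significance level.

```python
import math

def t_for_correlation(r, n):
    """t-statistic for testing a correlation coefficient r from n paired values."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: r = 0.6 computed from 27 pairs of values.
t = t_for_correlation(0.6, 27)
degrees_of_freedom = 27 - 2
print(round(t, 2), degrees_of_freedom)  # 3.75 25
```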
