1. Reporter: Evamae S. Pagaura
Professor: Dr. Emmylou A. Borja
Data Processing and
Statistical Treatment
2. Data Processing
1. Data are categorized based on the
objectives or purposes of the study.
2. Data are coded either numerically or
alphabetically.
3. Data are tabulated using a master
tabulation sheet and analyzed using
appropriate statistical tools.
3. Statistical Treatment
Statistical treatment of data is essential
to put the data into usable form. Raw data
collection is only one aspect of any experiment;
the organization of data is equally important so
that appropriate conclusions can be drawn.
4. Descriptive Problems
Problems that call for descriptive statistics.
Ex.: profile questions and those that involve
mere counting and tabulation.
Some commonly used descriptive statistics
include:
frequency counts and percentages, averages
(mean, median, and mode), and spreads
(standard deviation and variance).
5. Inferential Problems
Problems that may require hypotheses and
statistical tests of significance. These
statistical tests are called inferential
statistics.
They are used to make inferences about a
population based on findings from a sample.
They can be categorized into parametric and
non-parametric statistics.
6. Parametric Test
1. Data are of interval or ratio type;
2. Homogeneity of variance (the variances of
the groups being compared are equal); and
3. The population distribution from which
the samples are obtained is normal.
7. Non-Parametric Test
1. Do not always depend on a specific
type of distribution, such as the normal curve.
2. Also called distribution-free
statistics.
3. Applied to both nominal and ordinal
data.
8. The Normal Distribution
1. Considered the most important
probability distribution.
2. Its graph is called the normal curve.
3. The curve is asymptotic to the horizontal axis.
4. Three standard deviations on either side
of the mean will include practically all
of the cases.
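The last point can be checked numerically. A minimal sketch using Python's `statistics.NormalDist` (this example is an illustration, not part of the original slides):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, sd 1

# Proportion of cases falling within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

print(within)  # k = 3 covers about 99.73% of all cases
```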
9. Frequency Counts and
Percentages
-Are statistical tools which are usually used to
answer profile questions and those that involve
mere counting
- Results are presented in a frequency table.
- The frequency table consists of the following:
a. The variable under consideration
b. frequency
c. percentage
10. To determine the percentage per group of data, simply
divide the frequency of each group (fi) by the total
frequency (N)
Percentage = (fi / N) × 100%
Ex: Distribution of the respondents by Year Level
Year Level Frequency Percentage
Freshmen 150 27.27
Sophomore 142 25.82
Junior 133 24.18
Senior 125 22.73
Total 550 100.00
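The percentage column of the table above can be recomputed in a few lines of Python; this sketch applies the Percentage = (fi / N) × 100% formula to the frequency counts:

```python
# Frequency counts from the year-level example above.
freq = {"Freshmen": 150, "Sophomore": 142, "Junior": 133, "Senior": 125}
N = sum(freq.values())  # total frequency N = 550

# Percentage = (fi / N) x 100%, rounded to two decimal places.
percentages = {level: round(fi / N * 100, 2) for level, fi in freq.items()}

print(percentages)  # Freshmen: 27.27, Sophomore: 25.82, Junior: 24.18, Senior: 22.73
```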
11. Averages or Measures of
Central Tendency
Are measures that represent the typical
score in a distribution. The three most
commonly used measures of central tendency
are the mean (arithmetic mean), median,
and mode.
12. The Mean
-The most commonly used measure of central tendency.
- It is determined by adding up all the scores or values in
the distribution and then dividing this sum by the total
number of scores or values.
X̄ = ΣX / n

Where:
ΣX = the sum of all the scores or values in the
distribution; and
n = the total number of scores in the distribution
13. The Median
-Is the midpoint of the distribution, the point below
and above which 50 percent of the scores in the
distribution fall.
-To determine the median of a distribution, arrange
the data either in ascending or descending order.
- In a distribution that contains an odd number of
scores or values, median is the middlemost score.
- If the distribution contains an even number of
scores or values, the median is the point halfway
between the two middlemost scores or values.
14. The Mode
-Is the most frequent score or value in the
distribution.
- It may or may not exist; even if it exists, it
may not be unique.
- A distribution can be categorized as unimodal,
bimodal, or multimodal.
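The three averages can be computed with Python's standard `statistics` module; the score list below is a made-up example, not data from the slides:

```python
import statistics

scores = [70, 75, 75, 80, 85, 90, 95]  # hypothetical scores, already sorted

mean = statistics.mean(scores)      # sum of scores / n = 570 / 7
median = statistics.median(scores)  # middlemost of 7 sorted scores
mode = statistics.mode(scores)      # most frequent score

# With an even number of scores, the median is halfway between
# the two middlemost values.
even_median = statistics.median([70, 75, 80, 85])  # (75 + 80) / 2
```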
15. Spreads or Variability
-Two distributions may have identical means
but have different spread or variability. On the
other hand, two distributions may have
identical spread but have different means.
- Two measures of variability that are very
much related to each other are the standard
deviation and variance.
16. Standard Deviation
-Considered the most useful index of
variability or dispersion.
-It indicates how closely the scores are clustered
around the mean.
-The more spread out the distribution is, the
larger the standard deviation; the closer the
scores or values are to the mean, the smaller
the standard deviation.
17. We can calculate the standard deviation through
this formula:

Sd = √( Σ(Xi − X̄)² / n )

Where:
Σ = “sum of”;
Xi = the individual score or value;
X̄ = the mean score or value; and
n = the total number of scores or values in the
distribution.
18. The Variance
-The standard deviation squared is called the
variance.
-The variance is sometimes called the mean
square, since we divide the sum of the
squares by the total number of cases in the
distribution.

s² = Σ(Xi − X̄)² / n
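Both formulas divide by n, i.e., they give the population variance and standard deviation. A minimal sketch with invented scores:

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n  # 40 / 8 = 5.0

# Variance: sum of squared deviations from the mean, divided by n.
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: the square root of the variance.
sd = math.sqrt(variance)
```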
19. The Correlation
-Is a measure of relationship between two
or more paired variables or two or more
sets of data.
-Also called covariation.
-The correlation coefficient, which represents
the extent or degree of relationship
between two variables, may be positive,
negative, or zero.
20. Pearson Product-Moment
Correlation Coefficient
-Is a measure of relationship between two variables that
are usually of the interval type of data.
Ex.: determining the relationship between
students' achievement in mathematics and their
achievement in physics.
Pearson r formula:

r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}
21. Where:
N = the number of paired values X and Y;
ΣX = the sum of all X values;
ΣY = the sum of all Y values;
(ΣX)² = the square of the sum of all X values;
(ΣY)² = the square of the sum of all Y values;
ΣX² = the sum of the squares of all X values;
ΣY² = the sum of the squares of all Y values; and
ΣXY = the sum of all the products of paired X and Y values
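A direct transcription of the Pearson r formula into Python, using invented paired scores (since Y is exactly a linear function of X here, r should come out as 1):

```python
import math

# Hypothetical paired values: X = math scores, Y = physics scores.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]  # Y is exactly 2X, so the correlation is perfect
N = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)             # sum of squares of X
sum_y2 = sum(y * y for y in Y)             # sum of squares of Y
sum_xy = sum(x * y for x, y in zip(X, Y))  # sum of products XY

numerator = N * sum_xy - sum_x * sum_y
denominator = math.sqrt((N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2))
r = numerator / denominator
```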
22. Spearman Rank-Order
Correlation Coefficient
-Is a measure of correlation between two sets of
ordinal data.
-It is the most widely used of the rank
correlation techniques.
Spearman rho formula:

ρ = 1 − (6ΣD²) / (N(N² − 1))
23. Where:
D = the difference between paired ranks;
ΣD² = the sum of the squared differences
between ranks; and
N = the number of paired ranks
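The Spearman rho formula is easy to apply by hand or in code; the ranks below are a made-up example of two judges ranking the same five contestants:

```python
# Hypothetical ranks given by two judges to the same five contestants.
rank1 = [1, 2, 3, 4, 5]
rank2 = [2, 1, 4, 3, 5]
N = len(rank1)

# D = difference between paired ranks; sum the squared differences.
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))

rho = 1 - (6 * sum_d2) / (N * (N ** 2 - 1))
```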
24. Other Correlational
Techniques
1. Kendall’s Tau – It is a measure of correlation
between ranks. It can be applied wherever
the Spearman rho is applied.
2. Kendall’s Coefficient of Concordance – It
is used to determine the relationship among
three or more sets of ranks
25. 3. Point-Biserial Correlation Coefficient – It is a
special type of Pearson product-moment
correlation coefficient. It is used when one of
the variables is continuous and the other is
dichotomous.
4. Biserial Correlation Coefficient – It is also
used in test construction, test validation, and
test analysis like the Point-Biserial Correlation
Coefficient. However, this is a less reliable
measure of correlation since it is only an
estimate of Pearson r.
26. 5. Phi Coefficient – sometimes called the fourfold
coefficient, it is used when each of the variables is
dichotomous.
6. Tetrachoric Correlation Coefficient – it is a
measure of correlation between data that can be
reduced into two dichotomies. It functions like
the Phi Coefficient.
7. Partial Correlation – It is used to remove the
effect of one variable on the correlation between
two other variables.
27. 8. Multiple Regression – It is a technique that
enables researchers to determine a correlation
between a criterion variable (dependent) and the
best combination of two or more predictor
variables (independent).
9. Coefficient of Multiple Correlation – The
coefficient of multiple correlation indicates the
strength of the correlation between the
combination of the predictor variables and the
criterion variable.
28. 10. Coefficient of Determination –
symbolized by 𝑟2 , the coefficient of
determination is the square of the
correlation between one predictor variable
and a criterion variable. This value
indicates the percentage of the variability
among the criterion values that can be
attributed to differences in the values of the
predictor variable.
29. 11. Discriminant Function Analysis – It is a
technique used in the same way as the multiple
regression analysis.
12. Factor Analysis – It is a technique that
allows a researcher to determine if many
variables can be described by a few factors. It
involves a search for clusters of variables, all of
which are correlated with each other.
30. 13. Path Analysis – It is used to test the possibility of a
causal connection among three or more variables. Fraenkel
and Wallen (1993) enumerated the four basic steps of Path
Analysis:
1. A theory that links several variables is formulated to
explain a particular phenomenon of interest.
2. The variables specified by the theory are then
measured in some way.
3. Correlation coefficients are computed to indicate the
strength of the relationship between each of the pairs of
variables postulated in the theory.
4. Relationships among the correlation coefficients are
analyzed in relation to the theory.
31. The t-test for Correlation
-The correlation coefficient only describes the
extent or degree of relationship between
two variables.
-The t-test determines whether this coefficient
of correlation is significant at a particular
level.
32. To calculate the t-test for correlation, the
following formula is used:

t = r √( (n − 2) / (1 − r²) )
Where:
r = the correlation coefficient between
two variables X and Y; and
n = the number of paired values of
X and Y.
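A sketch of the computation with invented values of r and n (with 27 pairs and r = 0.6 the arithmetic works out exactly):

```python
import math

r = 0.6  # hypothetical correlation coefficient between X and Y
n = 27   # hypothetical number of paired values

# t = r * sqrt((n - 2) / (1 - r^2))
t = r * math.sqrt((n - 2) / (1 - r ** 2))
# Here: (27 - 2) / (1 - 0.36) = 25 / 0.64 = 39.0625, sqrt = 6.25, t = 3.75
```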
33. The Test for Comparison
The t-test for difference between Means
- The t-test is a parametric test used to determine
whether a difference between the means of two groups
is significant.
- It is the ratio between the mean difference between
two groups and the standard error of difference
between means.
- There are two forms of the t-test: the t-test for
independent means and the t-test for dependent means.
34. The t-test for Independent Means
-Is used to compare the mean scores of two
independent or uncorrelated groups or sets of
data.
-The formula for the t-test for independent means is
given by:

t = (X̄1 − X̄2) / √( s1²/n1 + s2²/n2 )
35. Where:
X̄1 = the mean of the first group;
X̄2 = the mean of the second group;
s1² = the variance (standard deviation
squared) of the first group;
s2² = the variance (standard deviation
squared) of the second group;
n1 = the number of cases in the first group; and
n2 = the number of cases in the second group.
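The formula can be applied directly to summary statistics; the group means, variances, and sizes below are invented for illustration:

```python
import math

# Hypothetical summary statistics for two independent groups.
mean1, var1, n1 = 85.0, 16.0, 30  # group 1: mean, variance, size
mean2, var2, n2 = 80.0, 25.0, 30  # group 2: mean, variance, size

# Standard error of the difference between the two means.
se_diff = math.sqrt(var1 / n1 + var2 / n2)

# t is the ratio of the mean difference to that standard error.
t = (mean1 - mean2) / se_diff
```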
36. The t-test for Dependent Means
- Is used to compare the mean scores of the same group
before and after a treatment is given to see if there is
any observed gain, or when the research design involves
two matched groups.
- It is also used when the same subjects receive two
different treatments in a study.
- The formula for the t-test for dependent means is
given by:

t = (X̄1 − X̄2) / √( s1²/n1 + s2²/n2 − 2r·SX̄1·SX̄2 )
37. Where:
X̄1 = the mean of the first group;
X̄2 = the mean of the second group;
s1² = the variance (standard deviation squared) of the first
group;
s2² = the variance (standard deviation squared) of the second
group;
n1 = the number of cases in the first group;
n2 = the number of cases in the second group;
r = the correlation between the first and second groups;
SX̄1 = the standard error of the mean of the first group; and
SX̄2 = the standard error of the mean of the second group.
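A sketch with invented pre-test/post-test summary statistics; the standard errors of the two means are computed as the square roots of s²/n for each group:

```python
import math

# Hypothetical pre-test / post-test summary statistics for one group.
mean1, var1, n1 = 72.0, 36.0, 25  # pre-test: mean, variance, size
mean2, var2, n2 = 78.0, 49.0, 25  # post-test: mean, variance, size
r = 0.5                           # correlation between the paired scores

# Standard errors of the two means: sqrt(s^2 / n).
se1 = math.sqrt(var1 / n1)  # sqrt(36/25) = 1.2
se2 = math.sqrt(var2 / n2)  # sqrt(49/25) = 1.4

# Denominator subtracts 2 * r * SE1 * SE2 to account for the correlation.
se_diff = math.sqrt(var1 / n1 + var2 / n2 - 2 * r * se1 * se2)
t = (mean1 - mean2) / se_diff
```

A negative t here simply reflects that the post-test mean is higher than the pre-test mean.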
38. Thank You so Much!
Reported by: Mr. Vige Y. Alvarado