Statistical Techniques in Data Analysis

USING STATISTICAL
TECHNIQUES IN
ANALYZING DATA
LESSON 27

INTRODUCTION
There are many instances in your life when you try to
determine whether some characteristics are related to
each other. At a higher level, you also want to
measure the degree of their relationship or
association. You might associate height and weight,
budget and expenses, and other aspects of life that
may be related to one another.

The Scatter Diagram
Plotting the values of the correlated variables graphically means
placing one variable on the x-axis and the other on the y-axis. The
scatter diagram gives you a picture of the relationship between the
variables.

Example of a Scatter Diagram
[Figure: three scatter diagrams, each with "Grades" on the horizontal axis (0 to 60)]

Types of Correlation
1. Simple Correlation
◦ This is a relationship between two variables. Usually, the relationship between an
independent variable and a dependent variable is measured.
◦ A. Linear Correlation
◦ This means that one variable changes at a constant rate with respect to
the change in the second variable. The correlation between the variables
may show either a direct or an inverse relationship.

Types of Correlation
2. Curvilinear Correlation
◦ This means that one variable does not change at a fixed rate. The rate
may increase or decrease with respect to the change in the
other variable.

The Coefficient of Correlation
To obtain a quantitative value for the extent of the
relationship between two sets of items, it is necessary to
calculate the correlation coefficient.
The value of the correlation coefficient ranges from -1 to +1.
Zero represents no linear relationship.

The Pearson Product Moment
Correlation Coefficient (Pearson r)
It was developed by Karl Pearson.
It measures the linear relationship between two variables.
Therefore, to check linearity, it is important to construct a
scatter diagram before computing the Pearson r.

Pearson r Formula:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Example 1:
The scores of ten randomly selected senior high school
students on the mathematical portion of the National
Admission Test (NAT) and the mathematical ability

Find the coefficient of correlation of the following:
STUDENT NAME   X     Y     X²    Y²    XY
A              5     6
B              7     15
C              9     16
D              10    12
E              11    21
F              12    22
G              15    8
TOTAL          ΣX = ____   ΣY = ____   ΣX² = ____   ΣY² = ____   ΣXY = ____
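
As a way to check the completed table, the column totals and the Pearson r can be computed directly. The sketch below is only an illustration: it assumes the seven rows A to G shown in the table are the full data set, and the variable names are chosen here for readability.

```python
from math import sqrt

# X and Y columns for students A to G, copied from the table above
x = [5, 7, 9, 10, 11, 12, 15]
y = [6, 15, 16, 12, 21, 22, 8]
n = len(x)

# Column totals needed by the Pearson r formula
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (n*Sxy - Sx*Sy) / sqrt([n*Sx2 - Sx^2][n*Sy2 - Sy^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("totals:", sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print("r =", round(r, 4))
```

With these seven rows the result is roughly r = 0.24.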

INTERPRETATION OF PEARSON R
±0.00 to ±0.20 denotes negligible correlation
±0.21 to ±0.40 denotes low correlation
±0.41 to ±0.70 denotes high correlation
±0.71 to ±1.00 denotes very high correlation

SPEARMAN RANK ORDER COEFFICIENT
OF CORRELATION
The statistic used on ranks or positions is the Spearman Rank Correlation
Coefficient, represented here by $r_s$. It measures the relationship between two
variables by ranking the items or individuals under study according to their
position. It represents the extent to which the same individuals or events occupy
the same relative position on two variables.
Formula: $$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
where: $r_s$ = Spearman rank correlation coefficient
◦ D = difference between the two ranks of an
individual in the variables studied
◦ n = number of individuals

Find the correlation coefficient of the following data.
Students   S_E   S_P   R_E   R_P   D    D²
1          48    50
2          35    41
3          48    52
4          36    47
5          53    36
6          48    55
7          32    48
8          30    36
9          56    33
10         42    39
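
The ranking and the $r_s$ formula can be sketched in a few lines. This is a rough illustration rather than a prescribed procedure: it assumes tied scores share the average of the ranks they occupy and that rank 1 goes to the lowest score (ranking from the highest gives the same D²). The helper `average_ranks` is a name introduced here.

```python
def average_ranks(values):
    """Rank values from lowest (1) to highest; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

# S_E and S_P columns copied from the table above
s_e = [48, 35, 48, 36, 53, 48, 32, 30, 56, 42]
s_p = [50, 41, 52, 47, 36, 55, 48, 36, 33, 39]
n = len(s_e)

r_e = average_ranks(s_e)
r_p = average_ranks(s_p)
d = [a - b for a, b in zip(r_e, r_p)]
sum_d2 = sum(v * v for v in d)

r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print("sum of D^2 =", sum_d2)
print("r_s =", round(r_s, 4))
```

Under these assumptions the sum of D² is 173.5 and $r_s$ is about -0.05, which would indicate a negligible relationship.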

QUIZ 2: Find the correlation coefficient of the following data.
Students   S_E   S_P   R_E   R_P   D    D²
1          90    89
2          78    78
3          70    65
4          78    92
5          80    94
6          78    95
7          84    90
8          80    78
9          82    78
10         75    90

SIMPLE LINEAR REGRESSION ANALYSIS
Linear regression is the simplest and most commonly used statistical
technique for prediction studies. It is concerned with finding an
equation that uses the known values of one or more variables,
called the independent or predictor variables, to estimate the
unknown value of a quantitative variable called the dependent or
criterion variable.
It predicts a variable (Y) that depends on a second
variable (X), based on the regression equation of a given set of data.

Three major uses of regression analysis
1. Causal analysis – establishes the possible causation of changes in one variable by
changes in other variables.
2. Forecasting an effect – predicts or estimates the value of a variable given the
values of other variables.
3. Linear trend forecasting – fits a line of best fit to historical time-series data.
The general form of the linear function is $Y = a + bX$
Where: a = the Y-intercept of the line
◦ b = the slope of the line, called the regression coefficient (the rate of change of Y per unit change
in X)

Example
Six randomly selected Grade 11 students took a 50-item mathematics
aptitude test before they began their course in Statistics and Probability.
1. What linear equation best predicts performance in Statistics and Probability
(based on first grading test scores) from performance on the
mathematics aptitude test?
2. If a student scored 45 on the math aptitude test, what score
would we expect the student to obtain in Statistics and Probability?
3. How well does the regression equation fit the data?

Test 1 (X)   Test 2 (Y)   X²    Y²    XY
38           25
35           20
30           17
28           15
25           12
18           15
ΣX = ____    ΣY = ____    ΣX² = ____   ΣY² = ____   ΣXY = ____

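REGRESSION ANALYSIS
In the formula: $y' = a + bx$
◦ Where: $$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}}$$
◦ $a = \bar{y} - b\bar{x}$ or $$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{N\sum x^2 - (\sum x)^2}$$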
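
The three questions in the aptitude-test example can be worked through numerically with these formulas. The following is a minimal sketch under two assumptions: the six (X, Y) pairs are exactly those in the table, and "how well the equation fits" is read as the coefficient of determination r², which is one common choice rather than something the slides specify.

```python
from math import sqrt

# Test 1 (X) and Test 2 (Y) scores of the six students
x = [38, 35, 30, 28, 25, 18]
y = [25, 20, 17, 15, 12, 15]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Slope b and intercept a of y' = a + bx
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * (sum_x / n)

# Question 2: predicted Statistics score for an aptitude score of 45
predicted_45 = a + b * 45

# Question 3 (assumed measure of fit): r squared
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2

print(f"y' = {a:.4f} + {b:.4f}x")
print("prediction at x = 45:", round(predicted_45, 2))
print("r^2 =", round(r_squared, 4))
```

For these data the sketch gives approximately y' = 2.15 + 0.52x, a predicted score near 25.7, and r² of about 0.67.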

Regression Equation:
$$\hat{y} = \alpha + \beta x$$
Where:
$\hat{y}$ = the predicted y value
α = the intercept
β = the slope

The above equation can be solved:
$$\beta = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
$$\alpha = \bar{y} - \beta\bar{x}$$
Where: $\bar{y}$ = mean of y
$\bar{x}$ = mean of x

Consider the table below, which shows the test scores and grades in Statistics and
Probability of Grade 11 students in Mainit NHS. Find the
equation of the regression line, then predict the grades in
Statistics and Probability if the test scores are 60 and 75.

Activity 1: Seek ye First
[Figure: blank x-y plane; x-axis: Test Scores (10 to 90), y-axis: Grades (25 to 100)]

Guide Questions:
◦ What did you represent on the vertical axis? On the
horizontal axis?
◦ What is your process for plotting points on the
x-y plane?
◦ Based on the results, describe the diagram
formed by the plotted points.

Activity 2: Find me out????????
Students   x     y     x²      xy
1          58    87    3364    5046
2          52    86    2704    4472
3          65    89    4225    5785
4          45    86    2025    3870
5          49    86    2401    4214
6          50    85    2500    4250
7          45    83    2025    3735
8          47    76    2209    3572
9          48    79    2304    3792
10         48    81    2304    3888
$\sum x = 507$, $\bar{x} = 50.7$
$\sum y = 838$, $\bar{y} = 83.8$
$\sum x^2 = 26061$
$\sum xy = 42624$
Solving for the Regression Equation:
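Given: $\bar{x} = \frac{\sum x}{n} = \frac{507}{10} = 50.7$, $\bar{y} = \frac{\sum y}{n} = \frac{838}{10} = 83.8$, $\sum x^2 = 26061$, $\sum xy = 42624$, $n = 10$
$$\beta = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(42{,}624) - (507)(838)}{10(26{,}061) - (507)^2} = \frac{1{,}374}{3{,}561} \approx 0.39$$
$$\alpha = \bar{y} - \beta\bar{x} = 83.8 - 0.39(50.7) \approx 64.03$$
Thus, the regression
equation is:
$$\hat{y} = 64.03 + 0.39x$$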
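
The same computation can be reproduced from the raw scores. This sketch assumes the ten (x, y) pairs of Activity 2 and, to match the slide arithmetic, rounds β to two decimals before computing α (without that rounding, α comes out closer to 64.24). The `predict` helper is a name introduced here for illustration.

```python
# Test scores (x) and grades (y) of the ten students in Activity 2
x = [58, 52, 65, 45, 49, 50, 45, 47, 48, 48]
y = [87, 86, 89, 86, 86, 85, 83, 76, 79, 81]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(a * b for a, b in zip(x, y))

# Slope: beta = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Assumption: round beta first, as the worked solution does
beta_rounded = round(beta, 2)
alpha_rounded = round(sum_y / n - beta_rounded * (sum_x / n), 2)

def predict(score):
    """Predicted grade for a given test score, using the rounded coefficients."""
    return round(alpha_rounded + beta_rounded * score, 2)

print(f"y = {alpha_rounded} + {beta_rounded}x")
print(predict(60), predict(75))
```

Used this way, `predict(60)` and `predict(75)` give the predicted grades for test scores of 60 and 75.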

Activity 1: Seek ye First
[Figure: scatter plot of Grades (y) against Test Scores (x) with the regression line $\hat{y} = 64.03 + 0.39x$]

Guide Questions
1. What is the value of α, the intercept? 64.03
2. What is the value of β, the slope of the line? 0.39
3. Write the regression equation. $\hat{y} = 64.03 + 0.39x$
4. State the relationship between the grades in statistics (y) and the scores in the test
(x). Why? Explain mathematically.
◦ Grades increase as test scores increase (a direct relationship), because the slope is
positive (β > 0).
5. Give your interpretation of the relationship between the
x and y variables based on the results.
◦ For every 1-point increase in the test score, there is a
corresponding increase of 0.39 in the grade.
6. Predict your grades if you got a score of 60 and a
score of 75. 87.43, 93.28

CHI-SQUARE ($\chi^2$)
The chi-square test is the most commonly used method of comparing
proportions. It is particularly useful in tests evaluating a relationship
between nominal or ordinal data. Typical situations or settings are
cases where persons, events, or objects are grouped in two or more
nominal categories such as “Yes-No” responses, “Favor-Against-Undecided”,
or class “A, B, C or D”.

CHI-SQUARE ($\chi^2$)
Chi-square analysis compares the observed frequencies of the responses with
the expected frequencies. It is a measure of the actual divergence of the observed
from the expected frequencies. It is given by the formula:
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
Where: $F_o$ = observed number of cases
$F_e$ = expected number of cases
$$F_e = \frac{(\text{row total})(\text{column total})}{N\ (\text{grand total})}$$

Illustration
Consider the nomination of three (3) presidential candidates of a political party: A,
B, and C. The chairman wonders whether or not they will be equally popular among
the members of the party. To test the hypothesis of equal preference, a random
sample of 315 members was selected and each was asked which of the three candidates
they prefer.
The following are the results of the survey:
Candidates   Frequency
A            98
B            115
C            102

Calculating the $\chi^2$ value
Candidate   F_o   F_e
A           98    105
B           115   105
C           102   105
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e} = \frac{(98 - 105)^2}{105} + \frac{(115 - 105)^2}{105} + \frac{(102 - 105)^2}{105} = 1.505$$

For chi-square significance, use the table
value with df = k - 1 = 3 - 1 = 2 at the 0.05 level of significance.
Critical value = 5.991
Decision rule: Reject $H_o$ if the computed $\chi^2$ > 5.991; otherwise do not reject $H_o$.
Conclusion: Since 1.505 < 5.991, do not reject $H_o$.
There is not sufficient evidence to reject the null
hypothesis that the frequencies in the population are equal.
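
The candidate illustration can be recomputed in a few lines. This is a small sketch, not a general-purpose routine: it assumes equal expected frequencies (315 / 3 = 105) and takes the critical value 5.991 from the chi-square table rather than computing it.

```python
# Observed preferences for the three candidates
observed = {"A": 98, "B": 115, "C": 102}

total = sum(observed.values())
expected = total / len(observed)  # equal preference under H0

# Chi-square statistic: sum of (Fo - Fe)^2 / Fe
chi_square = sum((fo - expected) ** 2 / expected for fo in observed.values())

critical_value = 5.991  # table value taken from the text
reject_h0 = chi_square > critical_value

print("chi-square =", round(chi_square, 3))
print("reject H0?", reject_h0)
```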

Chi-Square as a Test of Independence: Two Variables
Chi-square can also be used to test the significance of the relationship
between two variables when data are expressed in terms of
frequencies of joint occurrence.
$$F_e = \frac{(\text{row total})(\text{column total})}{N\ (\text{grand total})}$$

Test of Relationship
Chi-Square Test for Independence.
◦ This is used when data are expressed in terms of frequencies or
percentages (nominal variables).
◦ Formula:
◦ $$\chi^2 = \sum \frac{(O - E)^2}{E}, \qquad df = (r - 1)(c - 1)$$
◦ Where: $$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Example
Suppose one wants to know if there is a relationship between gender
and school choice. A sample of 100 female and 100 male freshman
students were asked individually for their school choice. Test the null
hypothesis of no significant relationship between the students' gender
and school choice at the 5% level of significance.

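Observed frequencies (cells labeled C1 to C4):
SCHOOL CHOICE   FEMALE     MALE       TOTAL
PUBLIC          42 (C1)    65 (C3)    107
PRIVATE         58 (C2)    35 (C4)    93
TOTAL           100        100        200

Observed frequencies with expected frequencies in parentheses (53.5 and 46.5, rounded to 54 and 47):
SCHOOL CHOICE   FEMALE     MALE       TOTAL
PUBLIC          42 (54)    65 (54)    107
PRIVATE         58 (47)    35 (47)    93
TOTAL           100        100        200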

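Calculating the $\chi^2$ value
$$\chi^2 = \frac{(42 - 54)^2}{54} + \frac{(58 - 47)^2}{47} + \frac{(65 - 54)^2}{54} + \frac{(35 - 47)^2}{47} \approx 10.55$$
(With the unrounded expected frequencies 53.5 and 46.5, the value is about 10.63.)
Degrees of freedom = (row - 1)(column - 1) = (2 - 1)(2 - 1) = 1
Critical value = 3.841
Since the computed value is greater than the tabular value
3.841, reject the null hypothesis.
Decision: There is a significant relationship between the
students' gender and school choice.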
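
The test of independence for the gender and school-choice table can also be sketched in code. Note one deliberate difference from the hand computation: this sketch keeps the expected frequencies unrounded (53.5 and 46.5), so the statistic comes out near 10.63 rather than the value obtained with expected counts rounded to 54 and 47. The critical value 3.841 is the table value quoted in the text.

```python
# Rows: PUBLIC, PRIVATE; columns: FEMALE, MALE
observed = [
    [42, 65],
    [58, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected frequency: (row total)(column total) / grand total
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # table value for df = 1 at the 5% level
reject_h0 = chi_square > critical_value

print("chi-square =", round(chi_square, 3), "df =", df)
print("reject H0?", reject_h0)
```

Either way the statistic is well above 3.841, so the decision is the same.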

One Sample z-Test
This test is used when we have a random sample and we
want to test whether its mean is significantly different from a population
mean, that is, when we compare a single sample mean ($\bar{X}$) with a known
or hypothesized population mean ($\mu$). This test can be used
only if the background assumptions are satisfied such as
Sample observations

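ONE SAMPLE Z-TEST formula
$$z = \frac{\bar{X} - \mu_o}{s / \sqrt{n}}$$
where: $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
◦ $\bar{X}$ = sample mean
◦ $\mu_o$ = population mean
◦ $s$ = standard deviation (use the population standard deviation σ when it is known; otherwise estimate it from the sample with the formula above)
◦ $n$ = number of observations in the sample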

Example:
A company that makes cookies claims that its product
has a mean life span of 7 days with a standard
deviation of 2 days. A random sample of 50 cookies
is tested and found to have a mean life span of
only 4 days. Test the claim at the 5% level of
significance.

Computational Procedure
1. Define the null and alternative hypotheses.
◦ $H_o: \mu = 7$ and $H_A: \mu \neq 7$
2. State alpha.
◦ $\alpha = 0.05$
3. State the decision rule.
◦ One-tailed test: reject $H_o$ if $|z| > z_{\alpha}$
◦ Two-tailed test: reject $H_o$ if $|z| > z_{\alpha/2}$

4. Calculate the test statistic.
$$z = \frac{\bar{X} - \mu_o}{\sigma / \sqrt{n}} = \frac{4 - 7}{2 / \sqrt{50}} = -10.6066$$
5. State the results (use the z table to get the critical value).
$z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
$|-10.6066| = 10.6066 > 1.96$, Decision: Reject $H_o$
6. Conclusion: Therefore, the mean life span of the company's cookies
is not equal to 7 days.
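
The cookie example can be checked with the Python standard library. This is a minimal sketch assuming a two-tailed test at α = 0.05; `statistics.NormalDist().inv_cdf` stands in for the z table.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 4   # mean life span of the 50 sampled cookies (days)
mu_0 = 7          # claimed population mean (days)
sigma = 2         # population standard deviation (days)
n = 50
alpha = 0.05

# z = (X-bar - mu_0) / (sigma / sqrt(n))
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-tailed critical value z_(alpha/2), replacing the z-table lookup
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > z_critical
print("z =", round(z, 4))
print("critical value =", round(z_critical, 2))
print("reject H0?", reject_h0)
```

Comparing the absolute value of z with the critical value handles negative test statistics such as this one.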

Example 1
A researcher wishes to see if the mean number of days
that a basic, low-price, small automobile sits on a dealer’s
lot is 29. A sample of 30 automobile dealers has a mean
of 30.1 days for basic, low-price, small automobiles. At
α = 0.05, test the claim that the mean time is greater than 29
days. The standard deviation of the population is 3.8
days.

Example 2
The Medical Rehabilitation Education Foundation reports that
the average cost of rehabilitation for stroke victims is $24,672. To
see if the average cost of rehabilitation is different at a particular
hospital, a researcher selects a random sample of 35 stroke
victims at the hospital and finds that the average cost of their
rehabilitation is $26,343. The standard deviation of the
population is $3251. At α = 0.01, can it be concluded that the
average cost of stroke rehabilitation at a particular hospital is
different from $24,672?

ONE SAMPLE T-TEST
The one-sample t-test is used when we want to know whether the
difference between a sample mean and the population mean is large
enough to be statistically significant, that is, unlikely to have occurred
by chance.
This test can be used only if the background assumptions are satisfied:
the hypothesized population mean is known, the population standard deviation is
unknown and is estimated from the sample, and the data follow an approximately normal distribution.

ONE SAMPLE T-TEST formula
$$t = \frac{\bar{X} - \mu_o}{s / \sqrt{n}}$$
where: $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
◦ $\bar{X}$ = sample mean
◦ $\mu_o$ = population mean
◦ $s$ = sample standard deviation
◦ $n$ = sample size

Example
A random sample of 10 Grade 11 students has grades in English, where
marks range from 1 (worst) to 6 (excellent). The grade point average
(GPA) of all Grade 11 students over the last six years is 4.5. Is the GPA of
the 10 Grade 11 students different from the population's GPA? Use the 0.05
level of significance.
Student        1   2   3     4   5   6   7   8   9   10
Grade Points   5   6   4.5   5   5   6   5   5   5   5.5

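1. Define the null and alternative hypotheses.
◦ $H_o: \mu = 4.5$ and $H_A: \mu \neq 4.5$
2. State alpha.
◦ $\alpha = 0.05$
3. Compute the degrees of freedom: df = n - 1 = 10 - 1 = 9
4. State the decision rule.
◦ One-tailed test: reject $H_o$ if $|t| > t_{\alpha,\,n-1}$
◦ Two-tailed test: reject $H_o$ if $|t| > t_{\alpha/2,\,n-1}$
5. Calculate the test statistic and state the results (use the t table to get the critical value).
$$t = \frac{\bar{X} - \mu_o}{s / \sqrt{n}} = \frac{5.2 - 4.5}{0.4831 / \sqrt{10}} = 4.583$$
$t_{\alpha/2,\,n-1} = t_{0.025,\,9} = 2.262$
4.583 > 2.262, Decision: Reject $H_o$
6. Conclusion: Therefore, the grade point average of the 10
students is different from the population's GPA.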
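
The sample mean, the sample standard deviation, and the t statistic in this example can be reproduced as follows. This is an illustrative sketch: the standard library has no t-distribution quantile function, so the critical value 2.262 (two-tailed, df = 9, α = 0.05) is entered by hand from the t table.

```python
from math import sqrt
from statistics import mean, stdev

grades = [5, 6, 4.5, 5, 5, 6, 5, 5, 5, 5.5]
mu_0 = 4.5  # population GPA

n = len(grades)
x_bar = mean(grades)
s = stdev(grades)  # sample standard deviation (divides by n - 1)

# t = (X-bar - mu_0) / (s / sqrt(n))
t = (x_bar - mu_0) / (s / sqrt(n))
df = n - 1

t_critical = 2.262  # table value for df = 9, two-tailed alpha = 0.05
reject_h0 = abs(t) > t_critical

print("mean =", x_bar, "s =", round(s, 4))
print("t =", round(t, 3), "df =", df)
print("reject H0?", reject_h0)
```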

Example 1
The average depth of the Hudson Bay is 305 feet. Climatologists
were interested in seeing if the effects of warming and ice melt
were affecting the water level. Twenty-five measurements over a
period of weeks yielded a sample mean of 306.2 feet. The
population variance is known to be 3.57. Can it be concluded at
the 0.05 level of significance that the average depth has
increased? Is there evidence of what caused this to happen?

Example 2
A physician claims that joggers’ maximal volume oxygen
uptake is greater than the average of all adults. A sample
of 15 joggers has a mean of 40.6 milliliters per kilogram
(ml/kg) and a standard deviation of 6 ml/kg. If the average
of all adults is 36.7 ml/kg, is there enough evidence to
support the physician’s claim at α = 0.05?

Example 3
The average local cell phone call length was reported to be 2.27
minutes. A random sample of 20 phone calls showed an average
of 2.98 minutes in length with a standard deviation of 0.98
minute. At α = 0.05, can it be concluded that the average differs
from the population average?

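Independent Sample z-test: Equal Variance Assumed
The z-test is used for testing two means when the population variances are
known; the t-test is used if the variances are unknown.
If equal variances are assumed: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Independent Sample z-test: Equal Variance Not Assumed
If equal variances are not assumed: $\sigma_1^2 \neq \sigma_2^2$
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$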

The basic format for hypothesis testing
Step 1 State the hypotheses and identify the claim.
Step 2 Find the critical value(s).
Step 3 Compute the test value.
Step 4 Make the decision.
Step 5 Summarize the results.

Example
Employees at public universities work 11.3 hours per week on
average, with a standard deviation of 9.5 hours. At private
universities, the average working time for employees is 9.7
hours, with a standard deviation of 8.9 hours. The sample size for
each is 500. Is there a significant difference between the average
hours of the public and private universities? Perform a
hypothesis test using the 5% level of significance to find out.

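1. Define the null and alternative hypotheses.
◦ $H_o: \mu_{Public} = \mu_{Private}$ and $H_a: \mu_{Public} \neq \mu_{Private}$
2. State alpha: $\alpha = 0.05$
3. State the decision rule.
◦ One-tailed test: reject $H_o$ if $|z| > z_{\alpha}$
◦ Two-tailed test: reject $H_o$ if $|z| > z_{\alpha/2}$
4. Calculate the test statistic.
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(11.3 - 9.7) - 0}{\sqrt{\frac{9.5^2}{500} + \frac{8.9^2}{500}}} = \frac{1.6}{0.5822} \approx 2.748$$
5. State the results (use the z table to get the critical value).
$z_{\alpha/2} = z_{0.025} = 1.96$
2.748 > 1.96, Decision: Reject $H_o$
6. Conclusion: Therefore, there is a significant difference between the
average hours of the public and private universities.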
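
A short sketch of the two-sample z statistic for the university example follows. It evaluates the formula with the variances (the standard deviations 9.5 and 8.9 squared), as the denominator $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ requires, which gives z of about 2.748. It also assumes a two-tailed test at α = 0.05, with `statistics.NormalDist` replacing the z table.

```python
from math import sqrt
from statistics import NormalDist

# Public universities (group 1) and private universities (group 2)
mean_1, sd_1, n_1 = 11.3, 9.5, 500
mean_2, sd_2, n_2 = 9.7, 8.9, 500
hypothesized_diff = 0  # mu_1 - mu_2 under H0
alpha = 0.05

# Standard error uses the variances, i.e. the squared standard deviations
standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
z = ((mean_1 - mean_2) - hypothesized_diff) / standard_error

z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_critical

print("standard error =", round(standard_error, 4))
print("z =", round(z, 3))
print("reject H0?", reject_h0)
```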

Example 2
A survey found that the average hotel room rate in New Orleans is
$88.42 and the average room rate in Phoenix is $80.61. Assume that the
data were obtained from two samples of 50 hotels each and that the
standard deviations of the populations are $5.62 and $4.83 respectively.
At α = 0.05, can it be concluded that there is a significant
difference in the rates?

Independent Sample T-Test: Equal Variance Assumed
The independent-measures hypothesis test allows researchers to
evaluate or compare the mean difference between two populations
using the data from two separate samples. Generally, $\sigma^2$ is unknown
and is estimated from the data by the sample variance $s^2$. Hence, the t-test is used.
If equal variances are assumed: $s_1^2 = s_2^2$
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Independent Sample T-Test: Equal Variance Not Assumed
If equal variances are not assumed: $s_1^2 \neq s_2^2$
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Remember the degrees of freedom for the equal-variance test: $df = n_1 + n_2 - 2$

Example
Suppose we put people on two diets: the fruit diet and the bread
diet. Participants are randomly assigned to either 7 days of
eating exclusively fruits or 7 days of eating exclusively bread. At
the end of the week, we measure the weight gain of each
participant. Does the bread diet cause more weight gain than
the fruit diet? Test the claim using the 10% level of significance.
X₁: Fruit Diet   3   4   4   4   5   6
X₂: Bread Diet   1   2   2   2   3   4
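
The test statistic for the diet example can be sketched as below. The sketch assumes the sample variances are used in place of the unknown population variances and follows the document's ordering (X₁ = fruit, X₂ = bread), so a positive t means the fruit group gained more. It stops at the statistic and the degrees of freedom; the critical value still comes from the t table.

```python
from math import sqrt
from statistics import mean, variance

fruit = [3, 4, 4, 4, 5, 6]   # X1: weight gain on the fruit diet
bread = [1, 2, 2, 2, 3, 4]   # X2: weight gain on the bread diet

n_1, n_2 = len(fruit), len(bread)
mean_1, mean_2 = mean(fruit), mean(bread)
var_1, var_2 = variance(fruit), variance(bread)  # sample variances (n - 1)

# t = ((X1-bar - X2-bar) - 0) / sqrt(s1^2/n1 + s2^2/n2)
t = (mean_1 - mean_2) / sqrt(var_1 / n_1 + var_2 / n_2)
df = n_1 + n_2 - 2

print("means:", round(mean_1, 3), round(mean_2, 3))
print("t =", round(t, 3), "df =", df)
```

Because the bread group's mean gain is lower than the fruit group's, these data do not support the claim that the bread diet causes more weight gain.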

ONE WAY ANALYSIS OF VARIANCE
One-way analysis of variance is used when you want to compare the
means of more than two groups. This test can be used only if the
background assumptions are satisfied: the samples are independent
random samples, the populations are normal, and the population variances are
equal.
$$F = \frac{MS_B}{MS_W}$$
$$SS_B = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2 \quad \text{and} \quad SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$
Where: $$MS_B = \frac{SS_B}{k - 1}, \qquad MS_W = \frac{SS_W}{N - k}$$

Summary Table for One-Way ANOVA
Source    Sum of Squares          Degrees of Freedom   Variance Estimate         F ratio
Between   SS_B                    k - 1                MS_B = SS_B / (k - 1)     MS_B / MS_W
Within    SS_W                    N - k                MS_W = SS_W / (N - k)
Total     SS_t = SS_B + SS_W      N - 1
Where: $SS_B = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2$ and $SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$

Example
A teacher is concerned about the level of knowledge possessed by PUP
students regarding Philippine history. Students completed a senior high
school level standardized history test. The academic major of each student
was also recorded. Data in terms of percent correct responses are recorded
below for 24 students. Is there a significant difference between the levels of
knowledge possessed by PUP students regarding Philippine history
when grouped according to their academic major? Compute the
appropriate test for the data provided below and use the 0.05 level of
significance.

EDUCATION   BUSINESS MANAGEMENT   BEHAVIORAL SOCIAL SCIENCE   ENGINEERING
63          72                    42                          81
79          49                    52                          57
78          64                    30                          87
56          68                    83                          64
67          39                    22                          29
47          78                    71                          30

◦ 1. State the hypotheses.
◦ $H_o: \mu_{Education} = \mu_{Business} = \mu_{Behavioral} = \mu_{Engineering}$
◦ $H_a$: At least two of the means of Education, Business, Behavioral,
◦ and Engineering are not equal.
◦ 2. State alpha: $\alpha = 0.05$
◦ 3. Degrees of freedom: $df_1 = k - 1 = 4 - 1 = 3$ (between groups)
◦ $df_2 = N - k = 24 - 4 = 20$ (within groups)
◦ 4. State the decision rule.
◦ One-tailed test: reject $H_o$ if $F > F_{\alpha}$
◦ Two-tailed test: reject $H_o$ if $F > F_{\alpha/2}$

        EDUCATION   BUSINESS MANAGEMENT   BEHAVIORAL SOCIAL SCIENCE   ENGINEERING   Total
        63          72                    42                          81
        79          49                    52                          57
        78          64                    30                          87
        56          68                    83                          64
        67          39                    22                          29
        47          78                    71                          30
Mean    65.00       61.67                 50.00                       58.00         58.67

Computing the Sum of Squares
1. $$SS_B = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2 = 6(65 - 58.67)^2 + 6(61.67 - 58.67)^2 + 6(50 - 58.67)^2 + 6(58 - 58.67)^2 = 748$$
2. $$SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = 778 + 1093.33 + 2782 + 3032 = 7685.33$$
$SS_{education} = (63 - 65)^2 + (79 - 65)^2 + (78 - 65)^2 + (56 - 65)^2 + (67 - 65)^2 + (47 - 65)^2 = 778$
$SS_{business} = (72 - 61.67)^2 + (49 - 61.67)^2 + (64 - 61.67)^2 + (68 - 61.67)^2 + (39 - 61.67)^2 + (78 - 61.67)^2 = 1093.33$
$SS_{social\ science} = (42 - 50)^2 + (52 - 50)^2 + (30 - 50)^2 + (83 - 50)^2 + (22 - 50)^2 + (71 - 50)^2 = 2782$
$SS_{engineering} = (81 - 58)^2 + (57 - 58)^2 + (87 - 58)^2 + (64 - 58)^2 + (29 - 58)^2 + (30 - 58)^2 = 3032$

Summary Table for One-Way ANOVA
Source    Sum of Squares      Degrees of Freedom   Variance Estimate        F ratio
Between   748                 3                    748 / 3 = 249.33         MS_B / MS_W = 249.33 / 384.27 = 0.6489
Within    7685.33             20                   7685.33 / 20 = 384.27
Total     SS_t = 8433.33      N - 1 = 23
Where: $SS_B = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2$ and $SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$

5. State the results (use the F table to get the critical value).
Critical F-value at α = 0.05 with df = (3, 20): 3.10
Computed F-value = 0.6489
0.6489 < 3.10, Decision: Do not reject $H_o$
6. Conclusion: Therefore, there is no significant
difference between the levels of knowledge possessed
by PUP students regarding Philippine history when
grouped according to their academic major.
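
The sums of squares and the F ratio in this example can be reproduced from the raw scores. The sketch below is one straightforward way to organize the computation. It uses unrounded group means, so intermediate values may differ slightly from hand calculations that round the means to two decimals, and it leaves the comparison with the critical value to the F table.

```python
from statistics import mean

# Percent-correct scores by academic major
groups = {
    "Education": [63, 79, 78, 56, 67, 47],
    "Business Management": [72, 49, 64, 68, 39, 78],
    "Behavioral Social Science": [42, 52, 30, 83, 22, 71],
    "Engineering": [81, 57, 87, 64, 29, 30],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)
k = len(groups)        # number of groups
N = len(all_scores)    # total number of observations

# Between-groups sum of squares: sum of n_i * (group mean - grand mean)^2
ss_between = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2 for scores in groups.values()
)

# Within-groups sum of squares: squared deviations from each group's own mean
ss_within = sum(
    (score - mean(scores)) ** 2 for scores in groups.values() for score in scores
)

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print("SS_B =", round(ss_between, 2), "SS_W =", round(ss_within, 2))
print("MS_B =", round(ms_between, 2), "MS_W =", round(ms_within, 2))
print("F =", round(f_ratio, 4), "df =", (df_between, df_within))
```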
