LESSON 27: USING STATISTICAL TECHNIQUES IN ANALYZING DATA
INTRODUCTION
There are many instances in your life when you try to determine whether some characteristics are related to each other. On a higher level, you also want to measure the degree of their relationship or association. For example, you may associate height with weight, budget with expenses, and other aspects of life that may be related to one another.
The Scatter Diagram
A scatter diagram is a graph of the values of the two variables being correlated: one variable is placed on the x-axis and the other on the y-axis. The scatter diagram gives you a picture of the relationship between the variables.
[Figure: Example of a Scatter Diagram, four panels, each with Grades (0 to 60) on the horizontal axis]
Types of Correlation
1. Simple Correlation
◦ This is a relationship between two variables; usually, the relationship between an independent variable and a dependent variable is measured.
◦ A. Linear Correlation
◦ A change in one variable occurs at a constant rate with respect to the change in the second variable. The correlation may show either a direct (positive) or an inverse (negative) relationship.
2. Curvilinear Correlation
◦ A change in one variable does not occur at a fixed rate; the rate may increase or decrease with respect to the change in the other variable.
The Coefficient of Correlation
To obtain a quantitative value for the extent of the relationship between two sets of items, it is necessary to calculate the correlation coefficient.
The correlation coefficient ranges from -1 to +1: +1 indicates a perfect direct (positive) relationship, -1 a perfect inverse (negative) relationship, and zero no relationship.
The Pearson Product Moment
Correlation Coefficient (Pearson r)
It was developed by Karl Pearson.
It measures the linear relationship between two variables.
Because it measures only linear relationships, a scatter diagram should be constructed before computing Pearson r to check whether the relationship is linear.
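Pearson r Formula:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$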
Example 1:
The scores of ten randomly selected senior high school
students on the mathematical portion of the National
Admission Test (NAT) and the mathematical ability
Find the coefficient of correlation of the following data.
Student   X    Y    X²    Y²    XY
A         5    6
B         7    15
C         9    16
D         10   12
E         11   21
F         12   22
G         15   8
TOTAL   ΣX = ____   ΣY = ____   ΣX² = ____   ΣY² = ____   ΣXY = ____
INTERPRETATION OF PEARSON r
0.00 to ±0.20 denotes negligible correlation
±0.21 to ±0.40 denotes low correlation
±0.41 to ±0.70 denotes high correlation
±0.71 to ±1.00 denotes very high correlation
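The TOTAL row of Example 1 and the resulting r can be checked with a short script. This is a minimal sketch added for illustration, not part of the original lesson: `pearson_r` and `describe_r` are hypothetical helper names, and `describe_r` assumes 0.20, 0.40 and 0.70 are the upper bounds of the interpretation scale, applied to the absolute value of r. For the seven students listed, r comes out at about 0.24, a low positive correlation.

```python
import math


def pearson_r(x, y):
    """Pearson r from the raw-score formula; also returns the column totals."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sx2 = sum(xi * xi for xi in x)
    sy2 = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return numerator / denominator, (sx, sy, sx2, sy2, sxy)


def describe_r(r):
    """Label |r| with the lesson's scale (upper bounds 0.20, 0.40, 0.70)."""
    size = abs(r)
    if size <= 0.20:
        return "negligible correlation"
    if size <= 0.40:
        return "low correlation"
    if size <= 0.70:
        return "high correlation"
    return "very high correlation"


# Example 1 data, students A to G
x = [5, 7, 9, 10, 11, 12, 15]
y = [6, 15, 16, 12, 21, 22, 8]
r, totals = pearson_r(x, y)
print(totals)                        # (69, 100, 745, 1650, 1014)
print(round(r, 3), describe_r(r))    # 0.236 low correlation
```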
SPEARMAN RANK ORDER COEFFICIENT
OF CORRELATION
The statistic used for ranks or positions is the Spearman rank correlation coefficient, denoted here by r_s. It measures the relationship between two variables by ranking the items or individuals under study according to their position. It represents the extent to which the same individuals or events occupy the same relative position on two variables.
Formula:
$$ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$
where: r_s = Spearman rank correlation coefficient
◦ D = difference between the two ranks of an individual in the variables studied
◦ n = number of individuals
Find the coefficient of correlation for the following data.
Student   S_E   S_P   R_E   R_P   D   D²
1         48    50
2         35    41
3         48    52
4         36    47
5         53    36
6         48    55
7         32    48
8         30    36
9         56    33
10        42    39
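The rank, D and D² columns can be filled in mechanically. This sketch assumes that the highest score gets rank 1 and that tied scores share the average of the ranks they occupy (for example, the three scores of 48 share rank 4); ranking from lowest to highest instead gives the same D² values. With ties, the shortcut formula is an approximation. The helper names `average_ranks` and `spearman_rs` are hypothetical.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; ties get the mean of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1          # first 1-based position of v
        count = ordered.count(v)              # number of tied copies of v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rs(x, y):
    """r_s = 1 - 6*sum(D^2) / (n(n^2 - 1)), using average ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    n = len(x)
    rs = 1 - 6 * sum_d2 / (n * (n * n - 1))
    return rs, rank_x, rank_y, sum_d2


s_e = [48, 35, 48, 36, 53, 48, 32, 30, 56, 42]
s_p = [50, 41, 52, 47, 36, 55, 48, 36, 33, 39]
rs, r_e, r_p, sum_d2 = spearman_rs(s_e, s_p)
print(r_e)                    # [4.0, 8.0, 4.0, 7.0, 2.0, 4.0, 9.0, 10.0, 1.0, 6.0]
print(sum_d2, round(rs, 4))   # 173.5 -0.0515
```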
QUIZ 2: Find the coefficient of correlation for the following data.
Student   S_E   S_P   R_E   R_P   D   D²
1         90    89
2         78    78
3         70    65
4         78    92
5         80    94
6         78    95
7         84    90
8         80    78
9         82    78
10        75    90
SIMPLE LINEAR REGRESSION ANALYSIS
Linear regression is the simplest and most commonly used statistical method for prediction studies. It is concerned with finding an equation that uses the known values of one or more variables, called the independent or predictor variables, to estimate the unknown value of a quantitative variable, called the dependent or criterion variable.
In simple linear regression, the value of a variable Y that depends on a second variable X is predicted from the regression equation fitted to a given set of data.
Three major uses of regression analysis
1. Causal analysis: examines the possible causation of changes in one variable by changes in another variable.
2. Forecasting an effect: predicts or estimates the value of a variable given the values of other variables.
3. Linear trend forecasting: fits a line of best fit to historical time-series data.
The general form of the linear function is Y = a + bx,
where: a = the Y-intercept of the line
◦ b = the slope of the line, called the regression coefficient (the rate of change of Y per unit change in X)
Example
Six randomly selected Grade 11 students took a 50-item mathematics aptitude test before they began their course in Statistics and Probability.
1. What linear equation best predicts performance in Statistics and Probability (based on first grading test scores) from performance on the mathematics aptitude test?
2. If a student made a score of 45 on the mathematics aptitude test, what score would we expect the student to obtain in Statistics and Probability?
3. How well does the regression equation fit the data?
Test 1 (X)   Test 2 (Y)   X²   Y²   XY
38           25
35           20
30           17
28           15
25           12
18           15
ΣX = ____   ΣY = ____   ΣX² = ____   ΣY² = ____   ΣXY = ____
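A minimal sketch for questions 1 to 3 of this example, using the least-squares formulas for b and a given in this lesson. For question 3 it reports the coefficient of determination r², which is one common measure of fit; treating "how well" as r² is an assumption. The function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares slope b and intercept a for y' = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    a = sy / n - b * sx / n          # a = mean(y) - b * mean(x)
    return a, b


def r_squared(x, y):
    """Coefficient of determination: the square of Pearson r."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return r * r


test1 = [38, 35, 30, 28, 25, 18]   # X: mathematics aptitude test
test2 = [25, 20, 17, 15, 12, 15]   # Y: Statistics and Probability test
a, b = fit_line(test1, test2)
print(round(a, 4), round(b, 4))             # 2.1536 0.5234
print(round(a + b * 45, 2))                 # 25.71
print(round(r_squared(test1, test2), 3))    # 0.666
```

Under these assumptions the equation is about y' = 2.15 + 0.52x, a score of 45 predicts about 25.7, and roughly two thirds of the variation in Test 2 is accounted for by Test 1.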
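REGRESSION ANALYSIS
In the formula y' = a + bx,
where:
$$ b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} $$
$$ a = \bar{y} - b\bar{x} \quad \text{or} \quad a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{N\sum x^2 - (\sum x)^2} $$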
Regression Equation:
$$ \hat{y} = \alpha + \beta x $$
Where:
ŷ = the predicted y value
α = the intercept
β = the slope
Regression Equation:
The coefficients of the above equation can be solved as follows:
$$ \beta = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad \alpha = \bar{y} - \beta\bar{x} $$
Where: ȳ = mean of y
x̄ = mean of x
Consider the table below, which gives the test scores and the grades in Statistics and Probability of Grade 11 students in Mainit NHS. Find the equation of the regression line, then predict the grades in Statistics and Probability for test scores of 60 and 75.
Activity 1: Seek ye First
[Figure: blank grid for a scatter diagram of Grades (y-axis, 25 to 100) against Test Scores (x-axis, 10 to 90)]
Guide Questions:
◦ What did you represent on the vertical axis? On the horizontal axis?
◦ What is your process in plotting points on the x-y plane?
◦ Based on the results, describe the diagram formed by the plotted points.
Activity 2: Find me out
Student   x    y    x²     xy
1         58   87   3364   5046
2         52   86   2704   4472
3         65   89   4225   5785
4         45   86   2025   3870
5         49   86   2401   4214
6         50   85   2500   4250
7         45   83   2025   3735
8         47   76   2209   3572
9         48   79   2304   3792
10        48   81   2304   3888
Σx = 507   x̄ = 50.7   Σy = 838   ȳ = 83.8   Σx² = 26061   Σxy = 42624
Solving for the Regression Equation:
n = 10, Σx = 507, x̄ = 50.7, Σy = 838, ȳ = 83.8, Σx² = 26061, Σxy = 42624
$$ \beta = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(42624) - (507)(838)}{10(26061) - (507)^2} = \frac{1374}{3561} \approx 0.39 $$
$$ \alpha = \bar{y} - \beta\bar{x} = 83.8 - 0.39(50.7) \approx 64.03 $$
Thus, the regression equation is:
$$ \hat{y} = 64.03 + 0.39x $$
Activity 1: Seek ye First
[Figure: scatter diagram of Grades (y-axis) against Test Scores (x-axis) with the regression line ŷ = 64.03 + 0.39x]
Guide Questions
1. What is the value of α, the intercept? ____ (Answer: 64.03)
2. What is the value of β, the slope of the line? ____ (Answer: 0.39)
3. Write the regression equation. ____ (Answer: ŷ = 64.03 + 0.39x)
4. State the relationship between the grades in statistics (y) and the scores in the test (x). ____ Why? Explain mathematically.
◦ Grades and scores have a direct (positive) relationship: grades increase as scores increase, because the slope β = 0.39 is positive.
5. Give your interpretation of the relationship between the x and y variables based on the results.
◦ For every 1-point increase in the score, the predicted grade increases by 0.39.
6. Predict your grades if you got a score of 60 and a score of 75. ____, ____ (Answers: 87.43, 93.28)
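The Activity 2 totals, coefficients and predictions can be reproduced with a short script. This sketch (the helper name `regression_coefficients` is hypothetical) computes β and α at full precision, then repeats the lesson's route of rounding β to 0.39 before computing α, which is where 64.03 comes from. Carrying full precision instead gives α ≈ 64.24 and β ≈ 0.3858, and predictions of about 87.39 and 93.18 for scores of 60 and 75.

```python
def regression_coefficients(x, y):
    """Slope beta and intercept alpha using the n*Sxy formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    alpha = sy / n - beta * sx / n            # alpha = mean(y) - beta * mean(x)
    return alpha, beta, (sx, sy, sxx, sxy)


scores = [58, 52, 65, 45, 49, 50, 45, 47, 48, 48]   # x: test scores
grades = [87, 86, 89, 86, 86, 85, 83, 76, 79, 81]   # y: grades
alpha, beta, totals = regression_coefficients(scores, grades)
print(totals)                              # (507, 838, 26061, 42624)
print(round(alpha, 2), round(beta, 4))     # 64.24 0.3858

# Textbook route: round beta to two decimals first, then compute alpha.
beta_r = round(beta, 2)
alpha_r = round(sum(grades) / 10 - beta_r * sum(scores) / 10, 2)
for score in (60, 75):
    print(score, round(alpha_r + beta_r * score, 2))   # prints 60 87.43, then 75 93.28
```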
CHI-SQUARE (χ²)
The chi-square test is the most commonly used method of comparing proportions. It is particularly useful in tests evaluating a relationship between nominal or ordinal data. Typical situations are cases where persons, events or objects are grouped in two or more nominal categories, such as “Yes-No” responses, “Favor-Against-Undecided”, or class “A, B, C or D”.
CHI-SQUARE (χ²)
Chi-square analysis compares the observed frequencies of the responses with the expected frequencies. It measures the actual divergence of the observed frequencies from the expected frequencies. It is given by the formula:
$$ \chi^2 = \sum \frac{(F_o - F_e)^2}{F_e} $$
Where: F_o = observed number of cases
F_e = expected number of cases; for a two-way table,
$$ F_e = \frac{(\text{row total})(\text{column total})}{N \ (\text{grand total})} $$
Illustration
Consider the nomination of three (3) presidential candidates of a political party: A, B and C. The chairman wonders whether or not they will be equally popular among the members of the party. To test the hypothesis of equal preference, a random sample of 315 members was selected, and each was asked which of the three candidates they prefer.
The following are the results of the survey:
Candidate   Frequency
A           98
B           115
C           102
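Calculating the χ² value
Under the hypothesis of equal preference, each candidate is expected to receive 315 ÷ 3 = 105 votes.
Candidate   F_o   F_e
A           98    105
B           115   105
C           102   105
$$ \chi^2 = \sum \frac{(F_o - F_e)^2}{F_e} = \frac{(98-105)^2}{105} + \frac{(115-105)^2}{105} + \frac{(102-105)^2}{105} = \frac{49 + 100 + 9}{105} \approx 1.505 $$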
For chi-square significance, use the table value.
Critical value (df = 3 - 1 = 2, α = 0.05) = 5.991
Decision rule: Reject H₀ if the computed χ² > 5.991; otherwise, do not reject H₀.
Conclusion: Since 1.505 < 5.991, do not reject H₀. There is not sufficient evidence to reject the null hypothesis that the frequencies in the population are equal.
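The goodness-of-fit computation for the candidate survey can be sketched as follows. Expected counts come from the hypothesis of equal preference (315 / 3 = 105 each), and the critical value 5.991 is copied from a chi-square table (df = 2, α = 0.05) rather than computed. The helper name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Fo - Fe)^2 / Fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [98, 115, 102]                          # candidates A, B, C
n = sum(observed)                                  # 315 respondents
expected = [n / len(observed)] * len(observed)     # equal preference: 105 each
chi2 = chi_square_statistic(observed, expected)
critical = 5.991                                   # table value, df = 2, alpha = 0.05
print(round(chi2, 3), chi2 > critical)             # 1.505 False -> do not reject H0
```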
Chi-Square as a Test of Independence: Two Variables
Chi-square can also be used to test the significance of the relationship between two variables when the data are expressed as frequencies of joint occurrence. The expected frequency of each cell is
$$ F_e = \frac{(\text{row total})(\text{column total})}{N \ (\text{grand total})} $$
Test of Relationship
Chi-Square Test for Independence
◦ This is used when the data are expressed as frequencies (counts) of nominal variables.
◦ Formula:
$$ \chi^2 = \sum \frac{(O - E)^2}{E}, \qquad df = (r - 1)(c - 1) $$
◦ Where:
$$ E = \frac{(\text{row total})(\text{column total})}{\text{grand total}} $$
and r and c are the numbers of rows and columns.
Example
Suppose one wants to know whether there is a relationship between gender and school choice. A sample of 100 female and 100 male freshman students were asked individually for their school choice. Test the null hypothesis of no significant relationship between the students' gender and school choice at the 5% level of significance.
                 GENDER
SCHOOL CHOICE    FEMALE     MALE       TOTAL
PUBLIC           42 (C1)    65 (C3)    107
PRIVATE          58 (C2)    35 (C4)    93
TOTAL            100        100        200
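Observed (expected) frequencies, with each expected frequency computed as (row total)(column total)/(grand total); for example, (107)(100)/200 = 53.5:
                 GENDER
SCHOOL CHOICE    FEMALE       MALE         TOTAL
PUBLIC           42 (53.5)    65 (53.5)    107
PRIVATE          58 (46.5)    35 (46.5)    93
TOTAL            100          100          200
Calculating the χ² value
$$ \chi^2 = \frac{(42-53.5)^2}{53.5} + \frac{(58-46.5)^2}{46.5} + \frac{(65-53.5)^2}{53.5} + \frac{(35-46.5)^2}{46.5} \approx 10.63 $$
Degrees of freedom = (rows - 1)(columns - 1) = (2 - 1)(2 - 1) = 1
Critical value (α = 0.05) = 3.841
Since the computed value 10.63 is greater than the tabular value 3.841, reject H₀.
Decision: There is a significant relationship between the students' gender and school choice.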
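The test of independence for the school-choice table can be sketched as below. Expected frequencies are left unrounded here: (107)(100)/200 = 53.5 and (93)(100)/200 = 46.5, which gives χ² ≈ 10.63; rounding them to whole numbers before summing shifts the second decimal place. No Yates continuity correction is applied, and 3.841 is the chi-square table value for df = 1 at α = 0.05. The helper name is hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand   # E = (row)(column)/grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# rows: public, private; columns: female, male
school_choice = [[42, 65],
                 [58, 35]]
chi2, df = chi_square_independence(school_choice)
print(round(chi2, 2), df, chi2 > 3.841)   # 10.63 1 True
```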
One Sample z-Test
This test is used when we have a random sample and want to test whether its mean is significantly different from a population mean; that is, we compare a single sample mean (X̄) with a known or hypothesized population mean (μ). This test can be used only if the background assumptions are satisfied, such as sample observations
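ONE SAMPLE Z-TEST formula
$$ z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} $$
where: X̄ = sample mean
◦ μ₀ = population mean
◦ σ = population standard deviation (when σ is unknown and the sample is large, it is often replaced by the sample standard deviation s = √[Σ(x - x̄)²/(n - 1)])
◦ n = number of samples (sample size)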
Example:
A company that makes cookies claims that its product has a mean life span of 7 days with a standard deviation of 2 days. A random sample of 50 cookies is tested and found to have a mean life span of only 4 days. Test the claim at the 5% level of significance.
Computational Procedure
1. Define the Null and Alternative Hypotheses.
◦ H₀: μ = 7 and H_A: μ ≠ 7
◦ 2. State Alpha
◦ α = 0.05
◦ 3. State Decision Rule
◦ One-tailed test: Reject H₀ if z > z_α (right-tailed) or z < -z_α (left-tailed)
◦ Two-tailed test: Reject H₀ if |z| > z_{α/2}
4. Calculate Test Statistic
$$ z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{4 - 7}{2 / \sqrt{50}} \approx -10.6066 $$
5. State Results (use the z table to get the critical value)
z_{α/2} = z_{0.05/2} = z_{0.025} = 1.96
|-10.6066| = 10.6066 > 1.96, Decision: Reject H₀
6. Conclusion: Therefore, the mean life span of the company's cookies is not equal to 7 days.
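The z-test steps above can be sketched in a few lines. This assumes the population standard deviation is known and uses Python's standard-library `statistics.NormalDist` for the critical value instead of a printed z table; the helper `one_sample_z` and its `tails` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n, alpha=0.05, tails="two"):
    """Return (z, critical value, reject H0?) for a one-sample z-test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if tails == "two":
        critical = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2)
        reject = abs(z) > critical
    elif tails == "right":                                # H_A: mu > mu0
        critical = NormalDist().inv_cdf(1 - alpha)        # z_alpha
        reject = z > critical
    else:                                                 # "left", H_A: mu < mu0
        critical = NormalDist().inv_cdf(1 - alpha)
        reject = z < -critical
    return z, critical, reject


# Cookie example: H0: mu = 7, two-tailed, alpha = 0.05
z, critical, reject = one_sample_z(sample_mean=4, mu0=7, sigma=2, n=50)
print(round(z, 4), round(critical, 2), reject)   # -10.6066 1.96 True
```

The same helper covers right-tailed claims such as Example 1 below, for instance `one_sample_z(30.1, 29, 3.8, 30, tails="right")`.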
Example 1
A researcher wishes to see if the mean number of days that a basic, low-price, small automobile sits on a dealer’s lot is 29. A sample of 30 automobile dealers has a mean of 30.1 days for basic, low-price, small automobiles. At α = 0.05, test the claim that the mean time is greater than 29 days. The standard deviation of the population is 3.8 days.
Example 2
The Medical Rehabilitation Education Foundation reports that the average cost of rehabilitation for stroke victims is $24,672. To see if the average cost of rehabilitation is different at a particular hospital, a researcher selects a random sample of 35 stroke victims at the hospital and finds that the average cost of their rehabilitation is $26,343. The standard deviation of the population is $3251. At α = 0.01, can it be concluded that the average cost of stroke rehabilitation at this hospital is different from $24,672?
ONE SAMPLE T-TEST
The one-sample t-test is used when we want to know whether the difference between a sample mean and the population mean is large enough to be statistically significant, that is, unlikely to have occurred by chance.
This test can be used only if the background assumptions are satisfied: the hypothesized population mean must be known, the population standard deviation is unknown and is estimated by the sample standard deviation, and the population should be approximately normally distributed.
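ONE SAMPLE T-TEST formula
$$ t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} $$
where:
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
◦ X̄ = sample mean
◦ μ₀ = population mean
◦ s = sample standard deviation
◦ n = sample size (the test has n - 1 degrees of freedom)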
Example
A random sample of 10 Grade 11 students has grades in English, where marks range from 1 (worst) to 6 (excellent). The grade point average (GPA) of all Grade 11 students over the last six years is 4.5. Is the GPA of the 10 sampled students different from the population GPA? Use the 0.05 level of significance.
Student 1 2 3 4 5 6 7 8 9 10
Grade Points 5 6 4.5 5 5 6 5 5 5 5.5
Computational Procedure
1. Define the Null and Alternative Hypotheses.
◦ H₀: μ = 4.5 and H_A: μ ≠ 4.5
◦ 2. State Alpha
◦ α = 0.05
◦ 3. Degrees of freedom: df = n - 1 = 10 - 1 = 9
4. State Decision Rule
◦ One-tailed test: Reject H₀ if t > t_α (right-tailed) or t < -t_α (left-tailed), with df = n - 1
◦ Two-tailed test: Reject H₀ if |t| > t_{α/2}, with df = n - 1
5. Calculate Test Statistic
$$ t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{5.2 - 4.5}{0.4831 / \sqrt{10}} \approx 4.583 $$
6. State Results (use the t table to get the critical value)
t_{α/2, n-1} = t_{0.025, 9} = 2.262
4.583 > 2.262, Decision: Reject H₀
7. Conclusion: Therefore, the grade point average of the 10 students is different from the population GPA.
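The GPA example can be checked with Python's standard `statistics` module, whose `stdev` uses the n - 1 divisor, matching s above. The standard library has no t distribution, so the critical value 2.262 is copied from a t table (two-tailed, α = 0.05, df = 9); the helper name `one_sample_t` is hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with s the sample SD; also returns s and df."""
    n = len(data)
    s = stdev(data)                          # sample standard deviation (n - 1 divisor)
    t = (mean(data) - mu0) / (s / math.sqrt(n))
    return t, s, n - 1


grade_points = [5, 6, 4.5, 5, 5, 6, 5, 5, 5, 5.5]
t, s, df = one_sample_t(grade_points, mu0=4.5)
critical = 2.262                             # t table, two-tailed alpha = 0.05, df = 9
print(round(t, 3), round(s, 3), df, abs(t) > critical)   # 4.583 0.483 9 True
```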
Example 1
The average depth of the Hudson Bay is 305 feet. Climatologists
were interested in seeing if the effects of warming and ice melt
were affecting the water level. Twenty-five measurements over a
period of weeks yielded a sample mean of 306.2 feet. The
population variance is known to be 3.57. Can it be concluded at
the 0.05 level of significance that the average depth has
increased? Is there evidence of what caused this to happen?
Example 2
A physician claims that joggers’ maximal volume oxygen
uptake is greater than the average of all adults. A sample
of 15 joggers has a mean of 40.6 milliliters per kilogram
(ml/kg) and a standard deviation of 6 ml/kg. If the average
of all adults is 36.7 ml/kg, is there enough evidence to
support the physician’s claim at α = 0.05?
Example 3
The average local cell phone call length was reported to be 2.27
minutes. A random sample of 20 phone calls showed an average
of 2.98 minutes in length with a standard deviation of 0.98
minute. At α = 0.05, can it be concluded that the average differs
from the population average?
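Independent Sample z-test: Equal Variance Assumed
The z-test is used for testing two means when the population variances are known; the t-test is used when they are unknown.
If equal variances are assumed (σ₁² = σ₂² = σ²):
$$ z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$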
Independent Sample z-test: Equal Variance Not Assumed
If equal variances are not assumed (σ₁² ≠ σ₂²):
$$ z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$
The basic format for hypothesis testing
Step 1 State the hypotheses and identify the claim.
Step 2 Find the critical value(s).
Step 3 Compute the test value.
Step 4 Make the decision.
Step 5 Summarize the results.
Example
Employees at public universities work 11.3 hours per week on average, with a standard deviation of 9.5 hours. At private universities, the average working time for employees is 9.7 hours, with a standard deviation of 8.9 hours. The sample size for each is 500. Is there a significant difference between the average hours worked at public and private universities? Perform a hypothesis test at the 5% level of significance to find out.
Computational Procedure
1. Define the Null and Alternative Hypotheses.
◦ H₀: μ_Public = μ_Private and H_a: μ_Public ≠ μ_Private
◦ 2. State Alpha: α = 0.05
◦ 3. State Decision Rule
◦ One-tailed test: Reject H₀ if z > z_α (right-tailed) or z < -z_α (left-tailed)
◦ Two-tailed test: Reject H₀ if |z| > z_{α/2}
4. Calculate Test Statistic
$$ z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(11.3 - 9.7) - 0}{\sqrt{\frac{9.5^2}{500} + \frac{8.9^2}{500}}} \approx 2.748 $$
5. State Results (use the z table to get the critical value)
z_{α/2} = z_{0.025} = 1.96
2.748 > 1.96, Decision: Reject H₀
6. Conclusion: Therefore, there is a significant difference between the average hours worked by employees of public and private universities.
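The two-sample z statistic can be sketched as follows, assuming known population standard deviations and the unequal-variance form of the standard error. With the figures stated in the example (σ₁ = 9.5, σ₂ = 8.9 hours, n = 500 per group), the standard error is √(9.5²/500 + 8.9²/500) ≈ 0.582, which gives z ≈ 2.75. The helper name `two_sample_z` is hypothetical; the critical value comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """z = ((x1 - x2) - (mu1 - mu2)) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


z = two_sample_z(11.3, 9.7, 9.5, 8.9, 500, 500)
critical = NormalDist().inv_cdf(0.975)     # two-tailed, alpha = 0.05 -> about 1.96
print(round(z, 3), abs(z) > critical)      # 2.748 True
```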
Example 2
A survey found that the average hotel room rate in New Orleans is
$88.42 and the average room rate in Phoenix is $80.61. Assume that the
data were obtained from two samples of 50 hotels each and that the
standard deviations of the populations are $5.62 and $4.83 respectively.
At α = 0.05, can it be concluded that there is a significant
difference in the rates?
Independent Sample T-Test: Equal Variance Assumed
The independent-measures hypothesis test allows researchers to evaluate the mean difference between two populations using the data from two separate samples. Generally, σ² is unknown and is estimated from the data; hence, the t-test is used.
If equal variances are assumed (σ₁² = σ₂²), the two sample variances are pooled:
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$
$$ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}, \qquad df = n_1 + n_2 - 2 $$
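Independent Sample T-Test: Equal Variance Not Assumed
If equal variances are not assumed (σ₁² ≠ σ₂²):
$$ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
Remember the degrees of freedom: df = n₁ + n₂ - 2 applies to the equal-variance (pooled) test; for this unequal-variance test, the degrees of freedom are usually approximated with the Welch-Satterthwaite formula.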
Example
Suppose we put people on two diets, “the fruit diet” and “the bread diet”. Participants are randomly assigned to either 7 days of eating exclusively fruit or 7 days of eating exclusively bread. At the end of the week, we measure the weight gain of each participant. Does the bread diet cause more weight gain than the fruit diet? Test the claim using the 10% level of significance.
X₁ (Fruit Diet): 3 4 4 4 5 6
X₂ (Bread Diet): 1 2 2 2 3 4
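A minimal pooled-variance sketch for the diet example, assuming equal population variances (df = n₁ + n₂ - 2 = 10). Because the claim is that bread causes more weight gain, bread is taken as sample 1, so the alternative is μ_bread > μ_fruit. The resulting t is negative (the fruit group gained more on average), so it cannot exceed the positive right-tail critical value from a t table at α = 0.10, and H₀ would not be rejected under these assumptions. The helper name `pooled_t` is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Independent-samples t with pooled variance (equal variances assumed)."""
    n1, n2 = len(sample1), len(sample2)
    # variance() uses the n - 1 divisor, i.e. the sample variance s^2
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / standard_error
    return t, n1 + n2 - 2


fruit = [3, 4, 4, 4, 5, 6]
bread = [1, 2, 2, 2, 3, 4]
# Claim under test: bread diet gains more, so sample 1 = bread
t, df = pooled_t(bread, fruit)
print(round(t, 3), df)   # -3.354 10
```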
ONE WAY ANALYSIS OF VARIANCE
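One-way analysis of variance (ANOVA) is used when you want to compare the means of more than two groups. This test can be used only if the background assumptions are satisfied: the samples are independent random samples, the populations are normal, and the population variances are equal.
$$ F = \frac{MS_B}{MS_W} $$
$$ SS_B = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2 \quad \text{and} \quad SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 $$
Where:
$$ MS_B = \frac{SS_B}{k - 1}, \qquad MS_W = \frac{SS_W}{N - k} $$
k = number of groups, n_i = size of group i, N = total number of observations, ȳ_i = mean of group i, ȳ = grand mean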
Summary Table for One-Way ANOVA
Source    Sum of Squares        Degrees of Freedom   Variance Estimate         F ratio
Between   SS_B                  k - 1                MS_B = SS_B / (k - 1)     MS_B / MS_W
Within    SS_W                  N - k                MS_W = SS_W / (N - k)
Total     SS_T = SS_B + SS_W    N - 1
Examples
A teacher is concerned about the level of knowledge of Philippine history possessed by PUP students. Students completed a senior-high-school-level standardized history test, and the academic major of each student was also recorded. The data, in percent of correct responses, are recorded below for 24 students. Is there a significant difference between the levels of knowledge of Philippine history possessed by PUP students when grouped according to their academic major? Compute the appropriate test for the data provided below and use the 0.05 level of significance.
EDUCATION   BUSINESS MANAGEMENT   BEHAVIORAL SOCIAL SCIENCE   ENGINEERING
63          72                    42                          81
79          49                    52                          57
78          64                    30                          87
56          68                    83                          64
67          39                    22                          29
47          78                    71                          30
Computational Procedure
1. Define the Null and Alternative Hypotheses.
◦ H₀: μ_Education = μ_Business = μ_Behavioral = μ_Engineering
◦ H_a: At least two of the means of Education, Business, Behavioral and Engineering are not equal
◦ 2. State Alpha: α = 0.05
◦ 3. Degrees of freedom: df₁ = k - 1 = 4 - 1 = 3 (between groups)
◦ df₂ = N - k = 24 - 4 = 20 (within groups)
◦ 4. State Decision Rule
◦ Reject H₀ if F > F_α(df₁, df₂); the ANOVA F-test is right-tailed.
        EDUCATION   BUSINESS MANAGEMENT   BEHAVIORAL SOCIAL SCIENCE   ENGINEERING   Total
        63          72                    42                          81
        79          49                    52                          57
        78          64                    30                          87
        56          68                    83                          64
        67          39                    22                          29
        47          78                    71                          30
Mean    65.00       61.67                 50.00                       58.00         58.67
Computing the Sum of Squares
1. Between groups:
$$ SS_B = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2 = 6(65 - 58.67)^2 + 6(61.67 - 58.67)^2 + 6(50 - 58.67)^2 + 6(58 - 58.67)^2 \approx 748 $$
2. Within groups:
$$ SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 = 778 + 1093.33 + 2782 + 3032 = 7685.33 $$
$$ SS_{\text{Education}} = (63-65)^2 + (79-65)^2 + (78-65)^2 + (56-65)^2 + (67-65)^2 + (47-65)^2 = 778 $$
$$ SS_{\text{Business}} = (72-61.67)^2 + (49-61.67)^2 + (64-61.67)^2 + (68-61.67)^2 + (39-61.67)^2 + (78-61.67)^2 = 1093.33 $$
$$ SS_{\text{Social Science}} = (42-50)^2 + (52-50)^2 + (30-50)^2 + (83-50)^2 + (22-50)^2 + (71-50)^2 = 2782 $$
$$ SS_{\text{Engineering}} = (81-58)^2 + (57-58)^2 + (87-58)^2 + (64-58)^2 + (29-58)^2 + (30-58)^2 = 3032 $$
Summary Table for One-Way ANOVA
Source    Sum of Squares   Degrees of Freedom   Variance Estimate          F ratio
Between   748              3                    748 / 3 = 249.33           249.33 / 384.27 = 0.6489
Within    7685.33          20                   7685.33 / 20 = 384.27
Total     8433.33          23
Computational Procedure
5. State Results (use the F table to get the critical value)
Critical F-value (α = 0.05; df = 3, 20) = 3.10
Computed F-value = 0.6489
0.6489 < 3.10, Decision: Do not reject H₀
6. Conclusion: Therefore, there is no significant difference between the levels of knowledge of Philippine history possessed by PUP students when grouped according to their academic major.
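The sums of squares and F ratio for this example can be reproduced with a short one-way ANOVA sketch using only the standard library. The helper name `one_way_anova` is hypothetical, and the computed F is meant to be compared with a critical value from an F table (3.10 for df = 3 and 20 at α = 0.05), since the standard library has no F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SS_B, SS_W, df_between, df_within and the F ratio."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b, df_w = k - 1, n_total - k
    f_ratio = (ss_between / df_b) / (ss_within / df_w)   # MS_B / MS_W
    return ss_between, ss_within, df_b, df_w, f_ratio


education = [63, 79, 78, 56, 67, 47]
business = [72, 49, 64, 68, 39, 78]
social_science = [42, 52, 30, 83, 22, 71]
engineering = [81, 57, 87, 64, 29, 30]
ss_b, ss_w, df_b, df_w, f = one_way_anova([education, business, social_science, engineering])
print(round(ss_b, 2), round(ss_w, 2), df_b, df_w, round(f, 4))
# 748.0 7685.33 3 20 0.6489
```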

Lesson 27 using statistical techniques in analyzing data

  • 1.
  • 2.
    INTRODUCTION There are manyinstances in your life when you try to determine if some characteristics are related with each other. On a higher level, you also want to measure the degree of their relationship or association. You usually associate height and weight, budget and expenses and other aspects in life which may be related with one another.
  • 3.
    The Scatter Diagram Plottinggraphically the values of the correlated variables means placing one variable on the x-axis and the other on the y-axis The scatter diagram gives you a picture of the relationship between variables.
  • 4.
    Example of aScatter Diagram 0 5 10 15 20 25 30 35 0 10 20 30 40 50 60 Grades
  • 5.
    Example of aScatter Diagram 0 5 10 15 20 25 30 35 0 10 20 30 40 50 60 Grades
  • 6.
    Example of aScatter Diagram 0 5 10 15 20 25 30 0 10 20 30 40 50 60 Grades
  • 7.
    Example of aScatter Diagram 0 5 10 15 20 25 0 10 20 30 40 50 60 Grades
  • 8.
    Types of Correlation 1.Simple Correlation ◦ This is a relationship between two variables. The relationship between an independent variable and a dependent variable is usually measured. ◦ A. Linear Correlation ◦ This means that a change in one variable is at a constant rate with respect to the change in the second variable. The correlation between the variables may either be showing direct or inverse relationship.
  • 9.
    Types of Correlation 2.Curvilinear Correlation ◦This means that a change in one variable is not at a fixed rate. It may be increasing or decreasing with respect to the change in the other variable.
  • 10.
    The Coefficient ofCorrelation To obtain the quantitative value of the extent of the relationship between two sets of items, it is necessary to calculate the correlation coefficient. The values of the coefficient correlation ranges between +1 to -1. Zero represents no relationship.
  • 11.
    The Pearson ProductMoment Correlation Coefficient (Pearson r) It is derived by Karl Pearson. It measures the linear relationship between two variables. Therefore, to be able to determine linearity, it is important that a scatter diagram be constructed prior to the computation of the Pearson r.
  • 12.
    Pearson r Formula: 𝑟= 𝑛 𝑥𝑦− 𝑥 𝑦 [𝑛 𝑥2−( 𝑥) 2 ][𝑛 𝑦2−( 𝑦) 2 ]
  • 13.
    Example 1: The scoresof ten randomly selected senior high school students on the mathematical portion of the National Admission Test (NAT) and the mathematical ability
  • 14.
    Find the coefficientof correlation of the following STUDENT NAME X Y 𝑿 𝟐 𝒀 𝟐 XY A 5 6 B 7 15 C 9 16 D 10 12 E 11 21 F 12 22 G 15 8 TOTAL 𝑋=____ 𝑌 =___ 𝑋2=___ 𝑌2=____ 𝑋𝑌 =___
  • 15.
    INTERPRETATION OF PEARSONR 0.00 ± 0.20 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 𝑛𝑒𝑔𝑙𝑖𝑔𝑖𝑏𝑙𝑒 𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 0.21 ± 0.40 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 𝑙𝑜𝑤 𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 0.41 ± 0.70 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 ℎ𝑖𝑔ℎ 𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 0.71 ± 1.00 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 𝑣𝑒𝑟𝑦 ℎ𝑖𝑔ℎ 𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛
  • 16.
    SPEARMAN RANK ORDERCOEFFICIENT OF CORRELATION The statistics being used on ranks or position is the Spearman Rank Correlation Coefficient represented here by 𝒓 𝒔. It is a measure of relationship between two variables by ranking the items or individuals under study according to their position. It represents the extent to which the same individuals or events occupy the same relative position on two variables. Formula: 𝒓 𝒔 = 𝟏 − 𝟔 𝑫 𝟐 𝒏(𝒏 𝟐−𝟏) where: 𝒓 𝒔 = Spearman rank correlation coefficient ◦ D = difference between the two ranks of an individual in the variables studied. ◦ n = number of individuals
  • 17.
    Find the coefficientcorrelation of the following data. Students 𝑺 𝑬 𝑺 𝑷 𝑹 𝑬 𝑹 𝑷 𝑫 𝑫 𝟐 1 48 50 2 35 41 3 48 52 4 36 47 5 53 36 6 48 55 7 32 48 8 30 36 9 56 33 10 42 39
  • 18.
    QUIZ 2: Findthe coefficient correlation of the following data. Students 𝑺 𝑬 𝑺 𝑷 𝑹 𝑬 𝑹 𝑷 𝑫 𝑫 𝟐 1 90 89 2 78 78 3 70 65 4 78 92 5 80 94 6 78 95 7 84 90 8 80 78 9 82 78 10 75 90
  • 19.
    SIMPLE LINEAR REGRESSIONANALYSIS Linear regression is the simplest and commonly used statistical measure for prediction studies. It is concerned with finding an equation that uses the known values of one or more variables, called the independent or predictor variables, to estimate the unknown value of quantitative variable called the dependent or criterion. It is a prediction when a variable (Y) is dependent on a second variable (X) based on the regression equation of a given set of data.
  • 20.
    Three major usesof regression analysis 1. Causal analysis –establishes the possible causation of changes in one variable by changes in other variable. 2.Forecasting an Effect –predicts or estimate the value of a variable given the values of other variable. 3. Linear Trend Forecasting –imposes a line best fit to time series historical model. The general form of the linear function is 𝑌 = 𝑎 + 𝑏𝑥 Where: a = is called the Y-intercept of the line ◦ b= is the slope of the line called regression (the rate of change of Y per unit change in X)
  • 21.
    Example 6 randomly selectedGrade 11 students took a 50-item mathematics aptitude test before they began their course in Statistics and Probability subjects. 1. What linear equation best predicts performance(based on first grading test scores) in Statistics and Probability based on performance in the mathematics aptitude? 2. If a student made a score of 45 on the math aptitude test, what score would we expect the student to obtain in Statistics and Probability. 3. How well does the regression equation fit the data?
  • 22.
    Test 1 (X)Test 2 (Y) 𝑿 𝟐 𝒀 𝟐 𝑿𝒀 38 25 35 20 30 17 28 15 25 12 18 15 𝑋 =___ 𝑌 =____ 𝑋2 = 𝑦2 = 𝑥𝑦 =
  • 23.
    REGRESSION ANALYSIS 𝐼𝑛 𝑡ℎ𝑒𝑓𝑜𝑟𝑚𝑢𝑙𝑎: 𝑦′ = 𝑎 + 𝑏𝑥 ◦Where: 𝑏 = 𝑥𝑦− 𝑥 𝑦 𝑁 𝑥 2 − ( 𝑥)2 𝑁 ◦ ◦ 𝑎 = 𝑦′ − 𝑏𝑥 or 𝑎 = ( 𝑦)( 𝑥2)−( 𝑥)( 𝑥𝑦) 𝑁 𝑥 2 −( 𝑥)2 ◦
  • 24.
    Regression Equation: xy  Where: y = the predicted y value α = the intercept β = the slope
  • 25.
    Regression Equation: The aboveequation can be solved:            22 xxn yxxyn  xy   Where: 𝑦 =mean in y 𝑥 =mean in x𝑦 = 𝛼 + 𝛽 𝑥
  • 26.
    Consider the tablebelow. The test scores in statistics and probability of Grade 11 students in Mainit NHS. Find the equation of the regression line then predict the grades in statistics and probability if the test scores are 60 and 75. RUBRICS 1 RUBRICS 2 Procedures
  • 27.
    Activity 1: Seekye First 25 100 75 50 y X 302010 40 50 60 8070 90 Grades Test Scores
  • 28.
    Guide Questions: ◦What didyou represent to vertical axis? Horizontal axis? ◦What is your process in plotting points on the x-y plane ◦Base on the results, describe the diagram formed by the points plotted. Procedures 2
  • 29.
    Activity 2: Findme out???????? Students x y x2 xy 1 58 87 2 52 86 3 65 89 4 45 86 5 49 86 6 50 85 7 45 83 8 47 76 9 48 79 10 48 81 3364 2704 4214 3870 5785 4472 5046 4225 2025 2401 2500 2025 2209 2304 2304 4250 3735 3572 3792 3888 ______ _______   x x _______2 x ______  xy ______ _______   y y507 50.7 838 83.8 26061 42624
  • 30.
    Solving for theRegression Equation: Students x y x2 xy 7.50 507   x x 8.83 838   y y 260612 x 42624  xy 10n            22 xxn yxxyn          2 507061,2610 838507624,4210    561,3 374,1  39.0 xy     7.5039.08.83  03.64 Thus, the regression equation is: xy 39.003.64 
  • 31.
    Activity 1: Seekye First 25 100 75 50 y X 302010 40 50 60 8070 90 Grades Test Scores xy 39.003.64  Guide Questions
  • 32.
    Guide Questions 1. Whatis the value of α as the intercept?____ 2. What is the value of β as the slope of the line?______ 3. Write the regression equation.___________ 4. State the relationship between the grades in statistics y and scores in the test x._______ Why? Explain mathematically. ◦ The Grades is directly proportional to Scores. It is because the slope β > 0 or the slope is positive. 64.03 0.39 xy 39.003.64 
  • 33.
    5. Give yourinterpretation about the relationships between x and y variables base on the results. ◦In every increase of the score by 1, there is a corresponding increase of grade by 0.39 6. Predict your grades if you got a score of 60, a score of 75. __________, ____________ 87.43 93.28
  • 35.
    CHI-SQUARE(𝑥2 ) The Chi squareis the most commonly used method of comparing proportions. It is particularly useful in tests evaluating a relationship between nominal or ordinal data. Typical situations or settings are cases where persons, events or objects are grouped in two or more nominal categories such as “Yes-No” responses, “Favor-Against- Undecided” or class “A, B, C or D”.
  • 36.
    CHI-SQUARE(𝑥2 ) Chi-square analysis comparesthe observed frequencies of the responses with the expected frequencies. It is a measure of actual divergence of the observed and expected frequencies. It is given by the formula: 𝑋 = Σ(𝐹𝑜 − 𝐹𝑒)2 𝐹𝑒 Where: 𝐹𝑜 = 𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑎𝑠𝑒𝑠 𝐹𝑒 = 𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑎𝑠𝑒𝑠 𝐹𝑒 = (𝑟𝑜𝑤 𝑡𝑜𝑡𝑎𝑙)(𝑐𝑜𝑙𝑢𝑚𝑛 𝑡𝑜𝑡𝑎𝑙) 𝑁(𝑔𝑟𝑎𝑛𝑑 𝑡𝑜𝑡𝑎𝑙)
  • 37.
    Illustration Consider the nominationof three (3) presidential candidates of a political party. A, B and C. The chairman wonders whether or not they will be equally popular among the members of the party. From this the hypothesis of equal preference, a random sample of 315 were selected and interviewed which one of the three candidates they prefer. The following are the results of the survey: Candidates Frequency A 98 B 115 C 102
  • 38.
    Calculating the 𝑋2 𝑣𝑎𝑙𝑢𝑒 Candidate𝑭 𝑶 𝑭 𝒆 A 98 105 B 115 105 C 102 105 𝑋2 = Σ(𝐹𝑜 − 𝐹𝑒)2 𝐹𝑒 𝑋 = (98−105)2 105 + (115−105)2 105 + (102−105)2 105 =1.505
  • 39.
    For chi squaresignificance, use the table value Critical value = 5.991 Decision rule: Reject 𝐻 𝑜 𝑖𝑓 𝑋 𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑑 > 5.991, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 𝑑𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻 𝑜 Conclusion: Since 1.505<5.991, do not reject 𝐻 𝑜. There is no sufficient evidence or reason to reject the null hypothesis that the frequencies in the population are equal.
  • 40.
    Chi-Square as a Test of Independence: Two Variables
    Chi-square can also be used to test the significance of the relationship between two variables when the data are expressed as frequencies of joint occurrence. The expected frequency of each cell is
    $$F_e = \frac{(\text{row total})(\text{column total})}{N}$$
    where N is the grand total.
  • 41.
    Test of Relationship: Chi-Square Test for Independence
    ◦ This is used when the data are expressed as frequencies (counts) of nominal variables.
    ◦ Formula:
    $$\chi^2 = \sum \frac{(O - E)^2}{E}, \qquad df = (r - 1)(c - 1)$$
    ◦ where $E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$, r = number of rows and c = number of columns.
  • 42.
    Example
    Suppose one wants to know whether there is a relationship between gender and school choice. A sample of 100 female and 100 male freshman students were asked individually for their school choice. Test the null hypothesis of no significant relationship between the students' gender and school choice at the 5% level of significance.
  • 43.
    Observed frequencies (cell labels in parentheses)
    SCHOOL CHOICE   FEMALE     MALE       TOTAL
    PUBLIC          42 (C1)    65 (C3)    107
    PRIVATE         58 (C2)    35 (C4)    93
    TOTAL           100        100        200
  • 44.
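    Observed frequencies with expected frequencies in parentheses
    SCHOOL CHOICE   FEMALE     MALE       TOTAL
    PUBLIC          42 (54)    65 (54)    107
    PRIVATE         58 (47)    35 (47)    93
    TOTAL           100        100        200
    Expected frequencies: public cells $107 \times 100 / 200 = 53.5$, rounded to 54; private cells $93 \times 100 / 200 = 46.5$, rounded to 47.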
  • 45.
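    Calculating the $\chi^2$ value
    $$\chi^2 = \frac{(42-54)^2}{54} + \frac{(58-47)^2}{47} + \frac{(65-54)^2}{54} + \frac{(35-47)^2}{47} \approx 10.55$$
    (Using the unrounded expected frequencies 53.5 and 46.5 gives $\chi^2 \approx 10.63$.)
    Degrees of freedom = (rows − 1)(columns − 1) = (2 − 1)(2 − 1) = 1
    Critical value = 3.841
    Since the computed value, 10.55, is greater than the tabular value, 3.841, reject $H_0$.
    Decision: There is a significant relationship between the students' gender and school choice.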
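Below is a Python sketch of the test of independence for the gender and school-choice table. It assumes Python 3.8+ for `statistics.NormalDist`. The sketch derives the expected frequencies from the row and column totals and keeps them unrounded (53.5 and 46.5). As a result, the statistic comes out near 10.63 rather than the value obtained from the rounded 54 and 47; the decision is the same. For df = 1, it uses the fact that the chi-square critical value is the square of the two-tailed z critical value.

```python
from statistics import NormalDist

# Observed counts: rows = school choice, columns = gender (female, male).
observed = [
    [42, 65],  # public
    [58, 35],  # private
]

row_totals = [sum(row) for row in observed]              # [107, 93]
col_totals = [sum(col) for col in zip(*observed)]        # [100, 100]
grand_total = sum(row_totals)                            # 200

# E = (row total)(column total) / grand total, kept unrounded.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over all four cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)        # (2-1)(2-1) = 1

# For df = 1 the chi-square critical value equals the square of the
# two-tailed z critical value: about 1.96^2 = 3.841 at alpha = 0.05.
critical = NormalDist().inv_cdf(1 - 0.05 / 2) ** 2

decision = "reject H0" if chi_square > critical else "do not reject H0"
print(expected)
print(f"chi-square = {chi_square:.2f}, df = {df}, critical = {critical:.3f}, {decision}")
```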
  • 46.
    One Sample z-Test
    This test is used when we have a random sample and want to test whether its mean is significantly different from a population mean. In other words, we compare a single sample mean ($\bar{X}$) with a known or hypothesized population mean ($\mu$). This test can be used only if the background assumptions are satisfied, such as sample observations
  • 48.
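    ONE SAMPLE Z-TEST formula
    $$z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
    where:
    ◦ $\bar{X}$ = sample mean
    ◦ $\mu_0$ = hypothesized population mean
    ◦ $\sigma$ = population standard deviation (when it is unknown and n is large, the sample standard deviation $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ is often used in its place)
    ◦ n = number of observations in the sample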
  • 49.
    Example: A company that makes cookies claims that its product has a mean life span of 7 days with a standard deviation of 2 days. A random sample of 50 cookies is tested and found to have a mean life span of only 4 days. Test the claim at the 5% level of significance.
  • 50.
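    Computational Procedure
    1. Define the null and alternative hypotheses.
    ◦ $H_0: \mu = 7$ and $H_a: \mu \neq 7$
    2. State alpha.
    ◦ $\alpha = 0.05$
    3. State the decision rule.
    ◦ One-tailed test: reject $H_0$ if $z > z_{\alpha}$ (right-tailed; use $z < -z_{\alpha}$ for a left-tailed test)
    ◦ Two-tailed test: reject $H_0$ if $|z| > z_{\alpha/2}$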
  • 51.
    Computational Procedure
    4. Calculate the test statistic.
    $$z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{4 - 7}{2 / \sqrt{50}} = -10.6066$$
    5. State the results (use the z table to get the critical value).
    ◦ $z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
    ◦ $|-10.6066| = 10.6066 > 1.96$. Decision: Reject $H_0$.
    6. Conclusion: The mean life span of the company's cookies is not equal to 7 days.
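Here is a minimal Python sketch of the one-sample z-test steps. It assumes Python 3.8+, where `statistics.NormalDist` supplies the critical value in place of a z table. The helper names `one_sample_z` and `z_critical` are illustrative and do not come from the slides.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """z = (sample mean - mu0) / (sigma / sqrt(n)), with sigma the population SD."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def z_critical(alpha, two_tailed=True):
    """Critical z from the standard normal distribution."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


# Cookie example: mu0 = 7 days, sigma = 2 days, n = 50, sample mean = 4 days.
z = one_sample_z(4, 7, 2, 50)
crit = z_critical(0.05)                 # about 1.96 for a two-tailed test
decision = "reject H0" if abs(z) > crit else "do not reject H0"
print(f"z = {z:.4f}, critical = ±{crit:.2f}, {decision}")
```

For a one-tailed claim such as Example 1 below, call `z_critical(alpha, two_tailed=False)` and compare z against it in the direction of the alternative hypothesis.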
  • 52.
    Example 1: A researcher wishes to see whether the mean number of days that a basic, low-price, small automobile sits on a dealer's lot is 29. A sample of 30 automobile dealers has a mean of 30.1 days for basic, low-price, small automobiles. At $\alpha = 0.05$, test the claim that the mean time is greater than 29 days. The standard deviation of the population is 3.8 days.
  • 53.
    Example 2: The Medical Rehabilitation Education Foundation reports that the average cost of rehabilitation for stroke victims is $24,672. To see whether the average cost of rehabilitation is different at a particular hospital, a researcher selects a random sample of 35 stroke victims at the hospital and finds that the average cost of their rehabilitation is $26,343. The standard deviation of the population is $3,251. At α = 0.01, can it be concluded that the average cost of stroke rehabilitation at this hospital is different from $24,672?
  • 54.
    ONE SAMPLE T-TEST
    The one-sample t-test is used when we want to know whether the difference between a sample mean and the population mean is large enough to be statistically significant, that is, unlikely to have occurred by chance. This test can be used only if its background assumptions are satisfied. The hypothesized population mean is known, but the population standard deviation is unknown and is estimated by the sample standard deviation s. The population should also be approximately normal, so that the test statistic follows a t distribution with n − 1 degrees of freedom.
  • 55.
    ONE SAMPLE T-TEST formula
    $$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
    where:
    ◦ $\bar{X}$ = sample mean
    ◦ $\mu_0$ = hypothesized population mean
    ◦ s = sample standard deviation
    ◦ n = sample size
  • 56.
    Example
    A random sample of 10 grade 11 students has grades in English, where marks range from 1 (worst) to 6 (excellent). The grade point average (GPA) of all grade 11 students over the last six years is 4.5. Is the GPA of the 10 grade 11 students different from the population GPA? Use the 0.05 level of significance.
    Student        1   2   3    4   5   6   7   8   9   10
    Grade points   5   6   4.5  5   5   6   5   5   5   5.5
  • 57.
    Computational Procedure
    1. Define the null and alternative hypotheses.
    ◦ $H_0: \mu = 4.5$ and $H_a: \mu \neq 4.5$
    2. State alpha.
    ◦ $\alpha = 0.05$
    3. Degrees of freedom: df = n − 1 = 10 − 1 = 9
    4. State the decision rule.
    ◦ One-tailed test: reject $H_0$ if $t > t_{\alpha}$ (df = 9)
    ◦ Two-tailed test: reject $H_0$ if $|t| > t_{\alpha/2}$ (df = 9)
  • 58.
    Computational Procedure
    5. Calculate the test statistic, with $\bar{X} = 52/10 = 5.2$ and $s = 0.4831$.
    $$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{5.2 - 4.5}{0.4831 / \sqrt{10}} = 4.583$$
    6. State the results (use the t table to get the critical value).
    ◦ $t_{\alpha/2,\, n-1} = t_{0.025,\, 9} = 2.262$
    ◦ $4.583 > 2.262$. Decision: Reject $H_0$.
    7. Conclusion: The grade point average of the 10 students is significantly different from the population GPA of 4.5.
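The same steps for the GPA example can be sketched in Python. `statistics.stdev` divides by n − 1, which matches s = 0.4831. The standard library has no t distribution, so the critical value 2.262 is copied from a t table (two-tailed, α = 0.05, df = 9).

```python
import math
from statistics import mean, stdev

grades = [5, 6, 4.5, 5, 5, 6, 5, 5, 5, 5.5]
mu0 = 4.5                         # population GPA under H0

n = len(grades)
x_bar = mean(grades)              # 5.2
s = stdev(grades)                 # sample SD (divides by n - 1), about 0.4831
t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1

# Critical value taken from a t table: two-tailed, alpha = 0.05, df = 9.
T_CRIT = 2.262
decision = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
print(f"mean = {x_bar:.2f}, s = {s:.4f}, t = {t:.3f}, df = {df}, {decision}")
```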
  • 59.
    Example 1: The average depth of Hudson Bay is 305 feet. Climatologists were interested in seeing whether the effects of warming and ice melt were affecting the water level. Twenty-five measurements over a period of weeks yielded a sample mean of 306.2 feet. The population variance is known to be 3.57. Can it be concluded at the 0.05 level of significance that the average depth has increased? Is there evidence of what caused this to happen?
  • 60.
    Example 2: A physician claims that joggers' maximal volume oxygen uptake is greater than the average of all adults. A sample of 15 joggers has a mean of 40.6 milliliters per kilogram (ml/kg) and a standard deviation of 6 ml/kg. If the average of all adults is 36.7 ml/kg, is there enough evidence to support the physician's claim at $\alpha = 0.05$?
  • 61.
    Example 3: The average local cell phone call length was reported to be 2.27 minutes. A random sample of 20 phone calls showed an average length of 2.98 minutes with a standard deviation of 0.98 minute. At $\alpha = 0.05$, can it be concluded that the average differs from the population average?
  • 62.
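    Independent Sample z-test: Equal Variance Assumed
    The z-test is used to compare two means when the population variances are known; the t-test is used when they are unknown.
    If equal variances are assumed ($\sigma_1^2 = \sigma_2^2 = \sigma^2$):
    $$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$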
  • 63.
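    Independent Sample z-test: Equal Variance Not Assumed
    The z-test is used to compare two means when the population variances are known; the t-test is used when they are unknown.
    If equal variances are not assumed ($\sigma_1^2 \neq \sigma_2^2$):
    $$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$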
  • 64.
    The basic format for hypothesis testing
    Step 1 State the hypotheses and identify the claim.
    Step 2 Find the critical value(s).
    Step 3 Compute the test value.
    Step 4 Make the decision.
    Step 5 Summarize the results.
  • 65.
    Example
    Employees at public universities work 11.3 hours per week on average, with a standard deviation of 9.5 hours. At private universities, the average working time for employees is 9.7 hours, with a standard deviation of 8.9 hours. The sample size for each is 500. Is there a significant difference between the average hours at public and private universities? Perform a hypothesis test at the 5% level of significance to find out.
  • 66.
    Computational Procedure
    1. Define the null and alternative hypotheses.
    ◦ $H_0: \mu_{public} = \mu_{private}$ and $H_a: \mu_{public} \neq \mu_{private}$
    2. State alpha: $\alpha = 0.05$
    3. State the decision rule.
    ◦ One-tailed test: reject $H_0$ if $z > z_{\alpha}$
    ◦ Two-tailed test: reject $H_0$ if $|z| > z_{\alpha/2}$
  • 67.
    Computational Procedure
    4. Calculate the test statistic.
    $$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(11.3 - 9.7) - 0}{\sqrt{\frac{9.5^2}{500} + \frac{8.9^2}{500}}} = \frac{1.6}{\sqrt{0.33892}} \approx 2.748$$
    5. State the results (use the z table to get the critical value).
    ◦ $z_{\alpha/2} = z_{0.025} = 1.96$
    ◦ $2.748 > 1.96$. Decision: Reject $H_0$.
    6. Conclusion: There is a significant difference between the average working hours of employees at public and private universities.
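Below is a Python sketch of the two-sample z statistic for the public versus private universities example. It assumes the population standard deviations are known, as stated, and Python 3.8+ for `statistics.NormalDist`. The helper `two_sample_z` is hypothetical. With each standard deviation squared as the formula requires, it gives z ≈ 2.75 for the slide data.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """z = ((x1 - x2) - (mu1 - mu2)) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Public vs private universities: means 11.3 and 9.7 hours,
# population SDs 9.5 and 8.9 hours, 500 employees in each sample.
z = two_sample_z(11.3, 9.7, 9.5, 8.9, 500, 500)
crit = NormalDist().inv_cdf(1 - 0.05 / 2)     # two-tailed, alpha = 0.05
decision = "reject H0" if abs(z) > crit else "do not reject H0"
print(f"z = {z:.4f}, critical = ±{crit:.2f}, {decision}")
```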
  • 68.
    Example 2: A survey found that the average hotel room rate in New Orleans is $88.42 and the average room rate in Phoenix is $80.61. Assume that the data were obtained from two samples of 50 hotels each and that the standard deviations of the populations are $5.62 and $4.83, respectively. At α = 0.05, can it be concluded that there is a significant difference in the rates?
  • 70.
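    Independent Sample T-Test: Equal Variance Assumed
    The independent-measures hypothesis test allows researchers to evaluate the mean difference between two populations using the data from two separate samples. Generally, $\sigma^2$ is unknown and must be estimated from the data; hence, the t-test is used.
    If equal variances are assumed ($\sigma_1^2 = \sigma_2^2$), the two sample variances are pooled:
    $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}, \qquad df = n_1 + n_2 - 2$$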
  • 71.
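    Independent Sample T-Test: Equal Variance Not Assumed
    This version is used to compare two means when the population variances are unknown and cannot be assumed equal.
    If equal variances are not assumed ($\sigma_1^2 \neq \sigma_2^2$):
    $$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
    Remember the degrees of freedom: $df = n_1 + n_2 - 2$ applies to the equal-variance (pooled) test. For this unequal-variance (Welch) test, df is approximated by
    $$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$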
  • 72.
    Example
    Suppose we put people on two diets, “the fruit diet” and “the bread diet”. Participants are randomly assigned to either 7 days of eating exclusively fruit or 7 days of eating exclusively bread. At the end of the week, we measure the weight gain of each participant. Does the bread diet cause more weight gain than the fruit diet? Test the claim using the 10% level of significance.
    $X_1$: Fruit diet   3  4  4  4  5  6
    $X_2$: Bread diet   1  2  2  2  3  4
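Here is a Python sketch of both independent-samples t statistics for the diet data. It assumes the usual pooled-variance formula for the equal-variance case. For the unequal-variance case, it uses Welch's formula with the Welch–Satterthwaite df. The function names are illustrative.

In this data set, both samples have the same variance (16/15), so the two versions give the same t ≈ 3.354 with df = 10. The t is positive because the fruit-diet mean (4.33) exceeds the bread-diet mean (2.33). That sign matters when the one-tailed claim is checked against the t table at α = 0.10.

```python
import math
from statistics import mean, variance

fruit = [3, 4, 4, 4, 5, 6]   # X1: weight gain on the fruit diet
bread = [1, 2, 2, 2, 3, 4]   # X2: weight gain on the bread diet


def pooled_t(x1, x2):
    """Equal variances assumed: pooled sample variance, df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / math.sqrt(sp2 / n1 + sp2 / n2)
    return t, n1 + n2 - 2


def welch_t(x1, x2):
    """Equal variances not assumed: Welch's t with Welch-Satterthwaite df."""
    n1, n2 = len(x1), len(x2)
    a, b = variance(x1) / n1, variance(x2) / n2
    t = (mean(x1) - mean(x2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


print(pooled_t(fruit, bread))
print(welch_t(fruit, bread))
```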
  • 73.
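    ONE WAY ANALYSIS OF VARIANCE
    One-way analysis of variance is used when you want to compare the means of more than two groups. This test can be used only if its background assumptions are satisfied: the samples are independent and random, the populations are normal, and the population variances are equal.
    $$F = \frac{MS_B}{MS_W}, \qquad SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2, \qquad SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$
    where $MS_B = \frac{SS_B}{k - 1}$ and $MS_W = \frac{SS_W}{N - k}$. Here k is the number of groups, $n_i$ is the size of group i, N is the total number of observations, $\bar{y}_i$ is the mean of group i, and $\bar{y}$ is the grand mean.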
  • 74.
    Summary Table for One-Way ANOVA
    Source    Sum of Squares            Degrees of Freedom   Variance Estimate (Mean Square)   F ratio
    Between   $SS_B$                    k − 1                $MS_B = SS_B/(k - 1)$             $MS_B/MS_W$
    Within    $SS_W$                    N − k                $MS_W = SS_W/(N - k)$
    Total     $SS_T = SS_B + SS_W$      N − 1
    where $SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$ and $SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$.
  • 75.
    Example
    A teacher is concerned about the level of knowledge of Philippine history possessed by PUP students. Students completed a senior high school level standardized history test, and the academic major of each student was also recorded. The data, in terms of percent correct responses, are recorded below for 24 students. Is there a significant difference in the levels of knowledge of Philippine history possessed by PUP students when grouped according to their academic major? Compute the appropriate test for the data provided below and use the 0.05 level of significance.
  • 76.
    EDUCATION   BUSINESS MANAGEMENT   BEHAVIORAL/SOCIAL SCIENCE   ENGINEERING
    63          72                    42                          81
    79          49                    52                          57
    78          64                    30                          87
    56          68                    83                          64
    67          39                    22                          29
    47          78                    71                          30
  • 77.
    Computational Procedure
    1. Define the null and alternative hypotheses.
    ◦ $H_0: \mu_{Education} = \mu_{Business} = \mu_{Behavioral} = \mu_{Engineering}$
    ◦ $H_a$: At least two of the means of Education, Business, Behavioral and Engineering are not equal.
    2. State alpha: $\alpha = 0.05$
    3. Degrees of freedom: $df_1 = k - 1 = 4 - 1 = 3$ (between groups) and $df_2 = N - k = 24 - 4 = 20$ (within groups)
    4. State the decision rule.
    ◦ Reject $H_0$ if $F > F_{\alpha}(3, 20)$. The ANOVA F test is right-tailed.
  • 78.
    EDUCATION   BUSINESS MANAGEMENT   BEHAVIORAL/SOCIAL SCIENCE   ENGINEERING   TOTAL
    63          72                    42                          81
    79          49                    52                          57
    78          64                    30                          87
    56          68                    83                          64
    67          39                    22                          29
    47          78                    71                          30
    Mean
    65.00       61.67                 50.00                       58.00         58.67 (grand mean)
  • 79.
    Computing the Sum of Squares
    1. Between groups:
    $$SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 = 6(65 - 58.67)^2 + 6(61.67 - 58.67)^2 + 6(50 - 58.67)^2 + 6(58 - 58.67)^2 = 748$$
    2. Within groups:
    $$SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = 778 + 1093.33 + 2782 + 3032 = 7685.33$$
    where
    $$SS_{Education} = (63-65)^2 + (79-65)^2 + (78-65)^2 + (56-65)^2 + (67-65)^2 + (47-65)^2 = 778$$
    $$SS_{Business} = (72-61.67)^2 + (49-61.67)^2 + (64-61.67)^2 + (68-61.67)^2 + (39-61.67)^2 + (78-61.67)^2 = 1093.33$$
    $$SS_{Social\ Science} = (42-50)^2 + (52-50)^2 + (30-50)^2 + (83-50)^2 + (22-50)^2 + (71-50)^2 = 2782$$
    $$SS_{Engineering} = (81-58)^2 + (57-58)^2 + (87-58)^2 + (64-58)^2 + (29-58)^2 + (30-58)^2 = 3032$$
  • 80.
    Summary Table for One-Way ANOVA
    Source    Sum of Squares   Degrees of Freedom   Variance Estimate (Mean Square)   F ratio
    Between   748              3                    748/3 = 249.33                    249.33/384.27 = 0.6489
    Within    7685.33          20                   7685.33/20 = 384.27
    Total     8433.33          23
    where $SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$ and $SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$.
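Below is a Python sketch that reproduces the sums of squares and the F ratio for the PUP data from the raw scores. It assumes the four columns listed in the data table, with six students each. The critical value for df = (3, 20) still has to come from an F table.

```python
from statistics import mean

# Percent-correct scores by academic major (columns of the data table).
groups = {
    "Education": [63, 79, 78, 56, 67, 47],
    "Business Management": [72, 49, 64, 68, 39, 78],
    "Behavioral/Social Science": [42, 52, 30, 83, 22, 71],
    "Engineering": [81, 57, 87, 64, 29, 30],
}

all_scores = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_scores)              # about 58.67
k, n_total = len(groups), len(all_scores)  # 4 groups, 24 students

# SS_B: group size times squared deviation of each group mean from the grand mean.
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
# SS_W: squared deviation of each score from its own group mean.
ss_within = sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in groups.values())

ms_between = ss_between / (k - 1)          # df1 = k - 1 = 3
ms_within = ss_within / (n_total - k)      # df2 = N - k = 20
f_ratio = ms_between / ms_within

print(f"SSB = {ss_between:.2f}, SSW = {ss_within:.2f}, F = {f_ratio:.4f}")
```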
  • 81.
    Computational Procedure
    5. State the results (use the F table to get the critical value).
    ◦ Critical value: $F_{0.05}(3, 20) = 3.10$
    ◦ Computed F-value = 0.6489
    ◦ $0.6489 < 3.10$. Decision: Do not reject $H_0$.
    6. Conclusion: There is no significant difference in the levels of knowledge of Philippine history possessed by PUP students when grouped according to their academic major.