2. INFERENTIAL STATISTICS
◆ Inferential Statistics
• Refers to the statistical procedures used in drawing inferences about the properties of a population from sample data.
◆ Test of Hypothesis
• A statistical tool that determines whether there is a statistically significant difference between two or more groups, or whether there is a statistically significant relationship between two or more variables.
◆ Hypothesis
• A statement or tentative theory which aims to explain facts about the real world.
• Hypotheses are subjected to testing:
‣ If they are found to be statistically true, they are accepted.
‣ If they are found to be statistically false, they are rejected.
3. INFERENTIAL STATISTICS
◆ Two Kinds of Hypothesis
1. Null Hypothesis (H0). A hypothesis that may either be rejected or accepted.
2. Alternative Hypothesis (Ha). It generally represents the hypothetical statement that the researcher wants to prove.
• Summary:
‣ REJECTION of H0 implies ACCEPTANCE of Ha
‣ ACCEPTANCE of H0 implies REJECTION of Ha
• Possible errors when making a decision about the proposed hypothesis: TYPE I AND TYPE II ERRORS

DECISION    | H0 is true (actual) | Ha is true (actual)
Reject H0   | Type I Error        | Correct Decision
Accept H0   | Correct Decision    | Type II Error

• The probability of making a Type I (alpha) error in a test is called the significance level of the test.
4. INFERENTIAL STATISTICS
◆ Steps in Hypothesis Testing
1. Formulate the null hypothesis (H0) that there is no significant difference between the items being compared.
2. Set the level of significance.
3. Determine the test to be used.
4. Determine the tabular value for the test.
5. Compute the z-test or t-test as needed.
◆ z-test
1. Sample mean compared with population mean
FORMULA: z = (X̄ − μ) / (σ / √n)
where: z = z-test
X̄ = sample mean
μ = population mean
σ = population standard deviation
n = number of items within the sample
5. INFERENTIAL STATISTICS
2. Comparing two sample means
FORMULA: z = (X̄1 − X̄2) / (σ √(1/n1 + 1/n2))
where: z = z-test
X̄1 = mean of the first sample
X̄2 = mean of the second sample
n1 = number of items in the first sample
n2 = number of items in the second sample
σ = population standard deviation
3. Comparing two sample proportions
FORMULA: z = (P1 − P2) / √(P1q1/n1 + P2q2/n2)
where: P1 = proportion of the first sample
q1 = 1 − P1
P2 = proportion of the second sample
q2 = 1 − P2
n1 = number of items in the first sample
n2 = number of items in the second sample
6. INFERENTIAL STATISTICS
◆ EXAMPLE 1
Data from a school census show that the mean weight of college students was 45 kilos, with a standard deviation of 3 kilos. A sample of 100 college students was found to have a mean weight of 47 kilos. Are the 100 college students really heavier than the rest, using the .05 significance level?
Step 1: H0: The 100 college students are not really heavier than the rest. (X̄ = 45 kilos)
Ha: The 100 college students are really heavier than the rest. (X̄ > 45 kilos)
Step 2: Set the 0.05 level of significance.
Step 3: The standard deviation given is based on the population; therefore, the z-test is to be used.
Step 4: Based on the table of critical values of z, the tabular value of z for a one-tailed test at the 0.05 level of significance is 1.645.
Step 5: The given values in the problem are:
X̄ = 47 kilos
μ = 45 kilos
σ = 3 kilos
n = 100
7. INFERENTIAL STATISTICS
FORMULA: z = (X̄ − μ) / (σ / √n)
z = (47 − 45) / (3 / √100)
= 2 / (3/10)
= 2 / 0.3
= 6.67
Step 6: The computed value of 6.67 is greater than the tabular value of 1.645. Therefore, the null hypothesis is rejected.
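The Step 5 computation can be checked with a short script. The function below is a minimal sketch of the one-sample z-test from the formula above; the function name is illustrative, not from the slides.

```python
from math import sqrt

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (X̄ − μ) / (σ / √n): how many standard errors the
    sample mean lies above the population mean."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

z = one_sample_z(47, 45, 3, 100)
print(round(z, 2))   # 6.67
print(z > 1.645)     # True -> reject H0 (one-tailed, 0.05 level)
```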
8. INFERENTIAL STATISTICS
◆ EXAMPLE 2
A researcher wishes to find out whether or not there is a significant difference between the monthly allowance of morning and afternoon students in his school. By random sampling, he took a sample of 239 students in the morning session. These students were found to have a mean monthly allowance of ₱142.00. The researcher also took a sample of 209 students in the afternoon session. They were found to have a mean monthly allowance of ₱148.00. The total population of students in that school has a standard deviation of ₱40. Is there a significant difference between the two samples at the 0.01 level of significance?
H0: There is no significant difference between the samples.
Ha: There is a significant difference between the samples.
FORMULA: z = (X̄1 − X̄2) / (σ √(1/n1 + 1/n2))
9. INFERENTIAL STATISTICS
z = (142 − 148) / (40 √(1/239 + 1/209))
= −6 / (40 √(0.0042 + 0.0048))
= −6 / (40 √0.0090)
= −6 / (40 × 0.095)
= −6 / 3.8
= −1.579
The absolute computed value of 1.579 is less than the tabular value of 2.58 at the 0.01 level of significance. Accept the null hypothesis.
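Example 2 can likewise be checked numerically. Note that carrying exact fractions (rather than the rounded 0.0042 and 0.0048 above) gives −1.58, consistent with the slide's −1.579. The function name is illustrative.

```python
from math import sqrt

def two_sample_z(m1, m2, pop_sd, n1, n2):
    """z = (X̄1 − X̄2) / (σ √(1/n1 + 1/n2)), for two samples drawn
    from a population with known standard deviation σ."""
    return (m1 - m2) / (pop_sd * sqrt(1 / n1 + 1 / n2))

z = two_sample_z(142, 148, 40, 239, 209)
print(round(z, 2))    # -1.58 (exact intermediates; the slide rounds to -1.579)
print(abs(z) < 2.58)  # True -> accept H0 at the 0.01 level
```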
10. INFERENTIAL STATISTICS
◆ EXAMPLE 3
A sample survey of a television program in Metro Manila shows that 80 of 200 men in one sample dislike the program, while 75 of 250 respondents in a second sample dislike the same program. We want to decide whether the difference between the two sample proportions, 80/200 = 0.40 and 75/250 = 0.30, is significant or not at the 0.05 level of significance.
H0: There is no significant difference between the two sample proportions.
Ha: There is a significant difference between the two sample proportions.
The given values in the problem are:
P1 = 0.40   q1 = 1 − P1 = 1 − 0.40 = 0.60
P2 = 0.30   q2 = 1 − P2 = 1 − 0.30 = 0.70
n1 = 200   n2 = 250
z = (P1 − P2) / √(P1q1/n1 + P2q2/n2)
z = (0.40 − 0.30) / √((0.40)(0.60)/200 + (0.30)(0.70)/250)
= 0.10 / √(0.24/200 + 0.21/250)
= 0.10 / √(0.0012 + 0.00084)
= 0.10 / √0.00204
= 0.10 / 0.0452
= 2.21
Since the computed value of 2.21 is greater than the tabular value of 1.96 for a two-tailed test at the 0.05 level of significance, the null hypothesis is rejected.
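The two-proportion computation in Example 3 can be finished off and checked in code; a minimal sketch with an illustrative function name:

```python
from math import sqrt

def two_proportion_z(p1, n1, p2, n2):
    """z = (P1 − P2) / √(P1·q1/n1 + P2·q2/n2), with q = 1 − p."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

z = two_proportion_z(0.40, 200, 0.30, 250)
print(round(z, 2))   # 2.21
print(z > 1.96)      # True -> reject H0 at the 0.05 level (two-tailed)
```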
12. INFERENTIAL STATISTICS
◆ t-test
1. Sample mean compared with population mean
FORMULA: t = (X̄ − μ) / (s / √(n − 1))   or   t = (X̄ − μ) √(n − 1) / s
where: t = t-test
X̄ = sample mean
μ = population mean
s = sample standard deviation
n = number of items in the sample
EXAMPLE: A researcher knows that the average height of Filipino women is 1.525 meters. A random sample of 26 women was taken and was found to have a mean height of 1.56 meters, with a standard deviation of .10 meters. Is there reason to believe that the 26 women in the sample are significantly taller than the others at the .05 significance level?
H0: The sample is not significantly taller than the other Filipino women.
Ha: The sample is significantly taller than the others.
13. INFERENTIAL STATISTICS
The given values in the problem are:
X̄ = 1.56 meters
μ = 1.525 meters
s = .10 meters
n = 26
degrees of freedom = n − 1 = 26 − 1 = 25
FORMULA: t = (X̄ − μ) / (s / √(n − 1))
t = (1.56 − 1.525) / (.10 / √(26 − 1))
= 0.035 / (.10 / √25)
= 0.035 / 0.02
= 1.75
The computed value of 1.75 is greater than the tabular value of 1.708; the null hypothesis is rejected.
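The height example can be verified with a sketch of the one-sample t-test, following the slides' convention of dividing s by √(n − 1) rather than √n (names are illustrative):

```python
from math import sqrt

def one_sample_t(sample_mean, pop_mean, s, n):
    """t = (X̄ − μ) / (s / √(n − 1)), the slides' convention for a
    sample standard deviation computed with n in the denominator."""
    return (sample_mean - pop_mean) / (s / sqrt(n - 1))

t = one_sample_t(1.56, 1.525, 0.10, 26)
print(round(t, 2))   # 1.75
print(t > 1.708)     # True -> reject H0 (df = 25, one-tailed, 0.05)
```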
14. INFERENTIAL STATISTICS
2. Comparing two sample means
FORMULA: t = (X̄1 − X̄2) / √( [((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)] × (1/n1 + 1/n2) )
where: t = t-test
X̄1 = mean of the first sample
X̄2 = mean of the second sample
s1 = standard deviation of the first sample
s2 = standard deviation of the second sample
n1 = number of items in the first sample
n2 = number of items in the second sample
15. INFERENTIAL STATISTICS
EXAMPLE: A teacher wishes to test whether or not the Case Method of teaching is more effective than the Traditional Method. She picks two classes of approximately equal intelligence (verified through an administered IQ test). She gathers a sample of 18 students to whom she applies the Case Method and another sample of 14 students to whom she applies the Traditional Method. After the experiment, an objective test revealed that the first sample got a mean score of 28.6 with a standard deviation of 5.9, while the second group got a mean score of 21.7 with a standard deviation of 4.6. Based on the result of the administered test, can we say that the Case Method is more effective than the Traditional Method?
H0: The Case Method is as effective as the Traditional Method.
Ha: The Case Method is more effective than the Traditional Method.
Given: X̄1 = 28.6   X̄2 = 21.7
s1 = 5.9   s2 = 4.6
n1 = 18   n2 = 14
degrees of freedom = n1 + n2 − 2 = 18 + 14 − 2 = 30
16. INFERENTIAL STATISTICS
FORMULA: t = (X̄1 − X̄2) / √( [((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)] × (1/n1 + 1/n2) )
t = (28.6 − 21.7) / √( [((18 − 1)(5.9)² + (14 − 1)(4.6)²) / (18 + 14 − 2)] × (1/18 + 1/14) )
= 6.9 / √( [(17)(34.81) + (13)(21.16)] / 30 × (0.06 + 0.07) )
= 6.9 / √( (591.77 + 275.08) / 30 × 0.13 )
= 6.9 / √(28.895 × 0.13)
= 6.9 / √3.756
= 6.9 / 1.94
= 3.56
The computed t-value of 3.56 is greater than the tabular value of 1.697; therefore, the null hypothesis is rejected.
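The pooled-variance t above can be checked in code. Carrying 1/18 + 1/14 exactly (instead of rounding to 0.06 + 0.07) gives t ≈ 3.60 rather than 3.56; the decision is the same. The function name is illustrative.

```python
from math import sqrt

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t with pooled variance:
    t = (X̄1 − X̄2) / √( [((n1−1)s1² + (n2−1)s2²)/(n1+n2−2)] · (1/n1 + 1/n2) )"""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

t = pooled_t(28.6, 5.9, 18, 21.7, 4.6, 14)
print(round(t, 2))   # 3.6 (the slide's 3.56 uses rounded intermediates)
print(t > 1.697)     # True -> reject H0 (df = 30, one-tailed, 0.05)
```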
17. INFERENTIAL STATISTICS
◆ Analysis of Variance (ANOVA)
FORMULA: F = MSSb / MSSw
• ANOVA is based upon two sources of variation: (1) the between-column variance and (2) the within-column variance.
• These two sources are sometimes called the between-column sum of squares (SSb) and the within-column sum of squares (SSw). The sum of the two makes up the total sum of squares (TSS).
FORMULA: TSS = Σx² − (Σx)²/N
where: x = the value of each entry
N = the total number of items
EXAMPLE: Let us take three groups of 6 students each, where each group is subjected to one of three types of teaching method. The grades of the students are taken at the end of the semester and tabulated according to grouping in a one-way classification model.
19. INFERENTIAL STATISTICS
• The total sum of squares is computed as follows:
TSS = 136,484 − (1,560)²/18
= 136,484 − 2,433,600/18
= 136,484 − 135,200
= 1,284
• The between-column variance, or between-column sum of squares, is 1/r of the sum of the squares of the column sums, minus the correction term (Σx)²/N, where r refers to the number of rows:
SSb = (1/r) Σ(sum of each column)² − (Σx)²/N
SSb = (1/6)(534² + 465² + 561²) − (1,560)²/18
= (1/6)(285,156 + 216,225 + 314,721) − 2,433,600/18
= 816,102/6 − 135,200
= 136,017 − 135,200
= 817
20. INFERENTIAL STATISTICS
• The within-column variance, or within-column sum of squares, is the difference between the total sum of squares and the between-column sum of squares:
SSw = TSS − SSb
= 1,284 − 817
= 467
• We can use any of the following in getting the degrees of freedom:
Total degrees of freedom (df) = N − 1 = 18 − 1 = 17
Total degrees of freedom (df) = rk − 1 = (3 × 6) − 1 = 18 − 1 = 17
Between-column df = number of columns − 1 = 3 − 1 = 2
Within-column df = total df − between-column df = 17 − 2 = 15
21. INFERENTIAL STATISTICS
• To compute the mean sums of squares:
MSSb = SSb / dfb = 817/2 = 408.5
MSSw = SSw / dfw = 467/15 = 31.13
• To compute the F-test:
F = MSSb / MSSw = 408.5/31.13 = 13.12
22. INFERENTIAL STATISTICS
• ANOVA Table on the Three Samples Subjected to Different Teaching Methods
• The tabular value: 3.68 at the 5% level of significance.
• DECISION: The null hypothesis is rejected, considering that the computed value of 13.12 is greater than the tabular value of 3.68.
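As an arithmetic check, a short script can recompute the ANOVA quantities from the column totals 534, 465, and 561 together with Σx² = 136,484. Computing 465² exactly gives 216,225, so the exact figures are SSb = 817, SSw = 467, and F ≈ 13.12; the decision to reject is unchanged.

```python
# One-way ANOVA from the summary figures on the slides:
# column totals of the three groups, sum of all squared entries,
# N items in total, r items per column.
col_totals = [534, 465, 561]
sum_sq = 136_484               # Σx²
N, r = 18, 6
k = len(col_totals)            # number of columns (groups)

grand = sum(col_totals)        # Σx = 1,560
correction = grand**2 / N      # (Σx)²/N = 135,200
tss = sum_sq - correction                              # total sum of squares
ssb = sum(c**2 for c in col_totals) / r - correction   # between columns
ssw = tss - ssb                                        # within columns

df_b = k - 1                   # 2
df_w = (N - 1) - df_b          # 15
mssb = ssb / df_b
mssw = ssw / df_w
F = mssb / mssw
print(round(tss), round(ssb), round(ssw))   # 1284 817 467
print(round(F, 2))                          # 13.12
```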
23. INFERENTIAL STATISTICS
◆ Chi-square Test (χ²)
• Uses of the Chi-square Test
1. For estimating how closely an observed distribution matches an expected distribution, also known as the goodness-of-fit test.
2. For estimating whether two random variables are independent, also called the test of independence.
FORMULA: For the Goodness-of-Fit Test
χ² = Σ (OF − EF)² / EF
FORMULA: For the Test of Independence
χ² = Σ (OF − EF)² / EF
EF = (Row Total × Column Total) / n
24. INFERENTIAL STATISTICS
◆ EXAMPLE
• Chi-square for a Goodness-of-Fit Test
✓ Two six-sided dice (A and B) were each rolled 60 times, where the chance of any particular number coming out was the same: 1 in 6, so each face is expected 10 times. If a die is loaded, certain numbers will have a greater chance of appearing, while others will have a lower chance. The researcher observed the following frequencies on one die (A): 18, 5, 9, 7, 5, and 16.
χ² = (18 − 10)²/10 + (5 − 10)²/10 + (9 − 10)²/10 + (7 − 10)²/10 + (5 − 10)²/10 + (16 − 10)²/10
= 6.4 + 2.5 + 0.1 + 0.9 + 2.5 + 3.6
= 16
CONCLUSION: There is a very low chance that these rolls came from a fair die, considering that the calculated value of 16 is greater than the tabular value of 11.07. This means that there is a statistically significant difference between the observed and the expected frequencies.
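The goodness-of-fit computation for die A can be sketched as:

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: χ² = Σ (OF − EF)² / EF over all categories."""
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

observed = [18, 5, 9, 7, 5, 16]   # rolls of die A over 60 throws
expected = [10] * 6               # fair die: 60 × (1/6) per face
chi2 = chi_square_gof(observed, expected)
print(round(chi2, 2))   # 16.0
print(chi2 > 11.07)     # True -> not a fair die (df = 5, 0.05 level)
```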
25. INFERENTIAL STATISTICS
• Chi-square Test for Independence
• Test the hypothesis that academic performance does not depend on IQ at the 1% significance level.
• Degrees of freedom (df) = (r − 1)(k − 1) = (2 − 1)(3 − 1) = 2
• COMPUTATION: Getting the EF for each cell, EF = (Row Total × Column Total)/n:
✓ Where OF = 31, EF = (32 × 80)/100 = 25.6
✓ Where OF = 1, EF = (32 × 20)/100 = 6.4
✓ Where OF = 45, EF = (49 × 80)/100 = 39.20
✓ Where OF = 4, EF = (49 × 20)/100 = 9.80
✓ Where OF = 4, EF = (19 × 80)/100 = 15.2
✓ Where OF = 15, EF = (19 × 20)/100 = 3.80
26. INFERENTIAL STATISTICS
• Substituting the above values into the chi-square formula, we shall have:
χ² = Σ (OF − EF)² / EF
χ² = (31 − 25.6)²/25.6 + (1 − 6.4)²/6.4 + (45 − 39.2)²/39.2 + (4 − 9.80)²/9.80 + (4 − 15.2)²/15.2 + (15 − 3.80)²/3.80
= 29.16/25.6 + 29.16/6.4 + 33.64/39.2 + 33.64/9.80 + 125.44/15.2 + 125.44/3.80
= 1.139 + 4.556 + 0.858 + 3.433 + 8.253 + 33.011
= 51.25
• Since the computed chi-square value of 51.25 is greater than the tabular value of 9.21, the null hypothesis is rejected. For the 100 students, academic performance depends on IQ.
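The test of independence can be sketched end to end: expected frequencies are derived from the row and column totals, then summed into χ². The observed table below is reconstructed from the OF values and margins on the previous slide (row totals 80 and 20; column totals 32, 49, 19; n = 100).

```python
def chi_square_independence(table):
    """Test of independence: EF = (row total × column total) / n,
    then χ² = Σ (OF − EF)² / EF over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, of in enumerate(row):
            ef = row_totals[i] * col_totals[j] / n
            chi2 += (of - ef)**2 / ef
    return chi2

# rows = performance levels, columns = IQ groups
table = [[31, 45, 4],
         [1, 4, 15]]
chi2 = chi_square_independence(table)
print(round(chi2, 2))   # 51.25
print(chi2 > 9.21)      # True -> reject H0 (df = 2, 0.01 level)
```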
27. INFERENTIAL STATISTICS
◆ Simple Regression Analysis
• Regression analysis is concerned with the problems of estimation and forecasting.
✓ TYPES OF RELATIONSHIP
1. Direct relationship. The slope of the line is positive because Y increases as X increases.
2. Inverse relationship. The slope of the line is negative because Y decreases as X increases.
• The Least Squares Regression Line (LSRL) is a statistical technique that analyzes the relationship between the independent and dependent variables.
✓ EQUATION:
Y = a + bX
✓ NORMAL EQUATIONS:
1. ΣY = aN + bΣX
2. ΣXY = aΣX + bΣX²
28. INFERENTIAL STATISTICS
WHERE: ΣY = sum of the values of Y, the dependent variable
N = the number of pairs of X and Y
ΣX = sum of the values of X, the independent variable
ΣXY = the sum of the column XY, which is derived by multiplying the paired values of X and Y
ΣX² = the sum of the column X², which is derived by squaring the values of X
• Based on the given data of X and Y, we can determine all of the above, which means that the two normal equations now form a system of two linear equations in two unknowns, a and b.
• FORMULAS:
a = (ΣY ΣX² − ΣX ΣXY) / (N ΣX² − (ΣX)²)
b = (N ΣXY − ΣX ΣY) / (N ΣX² − (ΣX)²)
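The formulas for a and b can be sketched directly in code. The data below are made up for illustration: the points lie exactly on Y = 1 + 2X, so the formulas should recover a = 1 and b = 2.

```python
def lsrl(xs, ys):
    """Solve the normal equations for Y = a + bX:
    a = (ΣY·ΣX² − ΣX·ΣXY) / (N·ΣX² − (ΣX)²)
    b = (N·ΣXY − ΣX·ΣY) / (N·ΣX² − (ΣX)²)"""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx**2
    a = (sy * sxx - sx * sxy) / denom
    b = (n * sxy - sx * sy) / denom
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X
a, b = lsrl([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)   # 1.0 2.0
```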
30. INFERENTIAL STATISTICS
◆ Simple Correlation Analysis
• Correlation analysis is concerned with the relationship between the changes of such variables.
• Degrees of correlation or relationship between two variables:
1. Perfect correlation (negative or positive)
2. Some degree of correlation (negative or positive)
3. No correlation
• The computed value expressing the concept of correlation is called the correlation coefficient. The value of the correlation coefficient ranges from −1 to +1.
• Pearson r test. The Pearson Product-Moment Coefficient of Correlation, otherwise known as Pearson r, is the most commonly used correlation coefficient.
• Pearson r, as the most widely used measure of correlation, has two basic assumptions, to wit:
1. A linear relationship exists; and
2. The level of measurement of the data for the two variables is either interval or ratio scale.
31. INFERENTIAL STATISTICS
• The value of r (degree of linear relationship) can be interpreted using the standard table of ranges of values for the Pearson Product-Moment Correlation Coefficient.
• Notably, Pearson r is not a measure of causality. The significance of the obtained correlation coefficient can be determined through the use of the t-test for testing the significance of r.
FORMULA:
t = r √((n − 2) / (1 − r²))
WHERE: t = t-test
r = obtained Pearson r value
n = paired sample size
Degrees of freedom = n − 2
32. INFERENTIAL STATISTICS
FORMULA for Pearson r:
r = (N ΣXY − ΣX ΣY) / √([N ΣX² − (ΣX)²][N ΣY² − (ΣY)²])
Where: r = correlation coefficient
N = total number of paired variables
X = the first variable under study
Y = the second variable under study
EXAMPLE: A researcher wants to find out about the relationship between the performance of a sample of five Peace and Security students in their Political Science and Peace and Security subjects.
34. INFERENTIAL STATISTICS
• Thus, there is a moderate negative relationship between the performance of the sample of five Peace and Security students in their Political Science and Peace and Security subjects.
• The significance of the t-value determines whether to reject H0 and accept Ha or otherwise; thus the researcher can generalize whether there is a direct, an indirect, or no correlation between the variables.
✓ Computed t-value = −1.30
✓ Critical value of t at the 0.05 level of significance = 2.353
✓ If |computed t-value| > critical value of t: REJECT H0
If |computed t-value| < critical value of t: ACCEPT H0
✓ CONCLUSION: Since the absolute computed t-value is less than the critical value of t, the null hypothesis is ACCEPTED.
• Hence, we can say that the performance of the five students of Peace and Security in their Political Science and Peace and Security subjects has a moderate negative correlation, but no significant relationship exists between the said variables.
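Since the students' score table is not reproduced in this extract, the Pearson r formula and the t-test for its significance can be sketched with hypothetical paired scores (the data below are made up; they happen to yield a moderate negative r whose t-value fails to exceed the df = 3 critical value, mirroring the example's outcome):

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = (NΣXY − ΣXΣY) / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²])"""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

def t_for_r(r, n):
    """t = r √((n − 2) / (1 − r²)), with df = n − 2."""
    return r * sqrt((n - 2) / (1 - r**2))

# Hypothetical scores for five students in two subjects
x = [1, 2, 3, 4, 5]
y = [4, 5, 3, 4, 2]
r = pearson_r(x, y)
t = t_for_r(r, len(x))
print(round(r, 3), round(t, 2))   # -0.693 -1.67
print(abs(t) < 2.353)             # True -> accept H0 (df = 3, 0.05 level)
```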
Editor's Notes
NOTE 1: (Definition of Inferential). Inferential statistics demands a higher order of critical judgment and mathematical methods. It aims to give information about large groups of data without dealing with each and every element of these groups. It uses only a small portion of the total set of data in order to draw conclusions or judgments regarding the entire set.
NOTE 2: (Test of Hypothesis). It is a procedure used to substantiate or invalidate a claim which is stated as a null hypothesis.
NOTE 1 (NULL HYPOTHESIS). The hypothesis is ACCEPTED when the differences or relationships found are due to chance variations. This means that the independent variable had no effect on the dependent variable, or that the two means are not statistically different.
NOTE 2 (NULL HYPOTHESIS). The hypothesis is REJECTED when the difference or relationship is too large to have occurred due to chance. It means that there exists a real relationship or difference between the two variables in the population.
NOTE (TYPE I AND TYPE II ERROR). Type I error (alpha error): when we reject the null hypothesis (action) when in fact the null hypothesis H0 is true (actual condition), and therefore the alternative hypothesis Ha is false. Type II error (beta error): when we accept the null hypothesis (action) when in fact the null hypothesis is false (actual condition), and therefore the alternative hypothesis Ha is true.
NOTE ON ITEM NO. 3. Use the z-test if the population standard deviation is given, and the t-test if the standard deviation given is from the samples.
NOTE ON ITEM NO. 4. For the z-test, use the table of critical values of z based on the area of the normal curve. For the t-test, one must first compute the degrees of freedom, then look for the tabular value in the table of the t-distribution. For a single sample, df = number of items − 1 (df = n − 1). For two samples, the formula is df = n1 + n2 − 2.
NOTE 1 (ANOVA): A technique in inferential statistics designed to test whether or not more than two samples are significantly different from each other.
NOTE 1: (CHI-SQUARE TEST): Chi-square is a versatile statistical test named after the chi-square distribution which is derived under the assumption of normality of the population. It is used to compare the observed proportion of observations falling into different categories (observed frequencies) with the proportion that would occur by chance (expected frequencies)
NOTE: With the distribution, it appears that 1s and 6s came out more often than they were expected to, while the other faces came out fewer times than expected. The question is whether these differences occurred by chance. Using the chi-square test, the researcher can estimate the likelihood that the values observed for die A occurred by chance. The idea of the chi-square goodness-of-fit test is to compare the observed and expected values. There were six terms in the above table, so the number of degrees of freedom is five (number of terms minus one).
NOTE1: To make forecasts, one must rely on the relationship between what is already known and what is to be estimated.
NOTE2: Regression analysis determines both the nature and the strength of a relationship between two variables. The known variable is called the independent variable (denoted as X), and the variable to be estimated is the dependent variable (denoted as Y).
NOTE3: LSRA is a statistical tool that analyses the relationship between the independent and dependent variables.
NOTE4: LSRL: The term "Least Squares" means that the most accurate trend line that may be drawn is the one where the sum of the squares of the vertical distances of the points from the line is least, or minimum. All other lines will yield a higher result. This is the same as saying that the sum of the vertical distances of the points above the line should be equal to the sum of the vertical distances of the points below the line. When these sums (above and below) are not equal, then the sum of the squares of the vertical distances of all points from the line is not minimum.
NOTE5: (EQUATION): Therefore, if we know a and b in the equation, we can solve for Y for any given value of X. The method using the LSRL reduces to finding the equation of the trend line, which in turn is found by solving for a and b in the equation. The formulas for a and b are derived from what are referred to as "NORMAL EQUATIONS".
NOTE1: (BASED ON..): From Algebra, we know that under such a system we can solve for the values of the two unknowns (a and b) by employing any of the following methods: 1. Substitution; 2. Elimination; and 3. Determinants.
NOTE2: In both formulas, we need to know ΣY, N, ΣX, ΣXY, and ΣX².
NOTE1: Positive Correlation relates two variables whose values are both increasing while Negative Correlation describes a situation where as one variable increases, the other variable decreases.
NOTE2: -1 signifies perfect negative correlation while +1 indicates perfect positive correlation. These in-between values, except zero, indicate some degree of correlation, whether positive or negative. A correlation coefficient of 0 indicates no correlation at all.
NOTE3: PEARSON r. It is used to describe or measure the closeness of the relationship between the two variables.