2. INTRODUCTION
There are many instances in life when you try to
determine whether certain characteristics are related to
each other. Going further, you may also want to
measure the degree of their relationship or
association. You usually associate height and weight,
budget and expenses, and other aspects of life that
may be related to one another.
3. The Scatter Diagram
Plotting the values of the correlated variables graphically means
placing one variable on the x-axis and the other on the y-axis. The
scatter diagram gives you a picture of the relationship between
the variables.
4. Example of a Scatter Diagram
[Figure: scatter diagram; axis label: Grades]
5. Example of a Scatter Diagram
[Figure: scatter diagram; axis label: Grades]
6. Example of a Scatter Diagram
[Figure: scatter diagram; axis label: Grades]
7. Example of a Scatter Diagram
[Figure: scatter diagram; axis label: Grades]
8. Types of Correlation
1. Simple Correlation
◦ This is a relationship between two variables. The relationship between an
independent variable and a dependent variable is usually measured.
◦ A. Linear Correlation
◦ This means that one variable changes at a constant rate with respect to
the change in the second variable. The correlation between the variables
may show either a direct or an inverse relationship.
9. Types of Correlation
2. Curvilinear Correlation
◦ This means that one variable does not change at a fixed rate. Its rate of
change may increase or decrease with respect to the change in the
other variable.
10. The Coefficient of Correlation
To obtain a quantitative value for the extent of the
relationship between two sets of items, it is necessary to
calculate the correlation coefficient.
The value of the correlation coefficient ranges from -1 to +1.
A value of +1 represents a perfect direct relationship and -1 a
perfect inverse relationship.
Zero represents no linear relationship.
11. The Pearson Product Moment Correlation Coefficient (Pearson r)
It was developed by Karl Pearson.
It measures the linear relationship between two variables.
Therefore, to check for linearity, it is important to construct a
scatter diagram before computing the Pearson r.
13. Example 1:
The scores of ten randomly selected senior high school
students on the mathematical portion of the National
Admission Test (NAT) and the mathematical ability
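14. Find the coefficient of correlation of the following:
STUDENT NAME | X  | Y  | X^2 | Y^2 | XY
A            | 5  | 6  |     |     |
B            | 7  | 15 |     |     |
C            | 9  | 16 |     |     |
D            | 10 | 12 |     |     |
E            | 11 | 21 |     |     |
F            | 12 | 22 |     |     |
G            | 15 | 8  |     |     |
TOTAL        | ΣX = ____ | ΣY = ____ | ΣX^2 = ____ | ΣY^2 = ____ | ΣXY = ____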
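The totals in the table feed directly into the Pearson r. The formula slide is not part of this section, so the sketch below assumes the standard computational form r = (nΣXY - ΣXΣY) / sqrt[(nΣX^2 - (ΣX)^2)(nΣY^2 - (ΣY)^2)]; the function name `pearson_r` is only illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the column totals (assumed standard computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)             # total of the X^2 column
    sum_y2 = sum(y * y for y in ys)             # total of the Y^2 column
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the XY column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# X and Y columns for students A to G, copied from the table
x = [5, 7, 9, 10, 11, 12, 15]
y = [6, 15, 16, 12, 21, 22, 8]
r = pearson_r(x, y)
print(round(r, 3))
```

A value of r close to zero would indicate a weak linear relationship, which is one reason to look at the scatter diagram first.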
16. SPEARMAN RANK ORDER COEFFICIENT OF CORRELATION
The statistic used on ranks or positions is the Spearman Rank Correlation
Coefficient, represented here by $r_s$. It measures the relationship between two
variables by ranking the items or individuals under study according to their
position. It represents the extent to which the same individuals or events occupy
the same relative position on two variables.
Formula: $r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$
where: $r_s$ = Spearman rank correlation coefficient
◦ D = difference between the two ranks of an
individual in the variables studied
◦ n = number of individuals
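The rank formula can be sketched in a few lines of Python. The two score lists are hypothetical, and the ranking helper assumes there are no tied scores (ties would need averaged ranks, which this sketch does not handle).

```python
def ranks(values):
    """Rank 1 goes to the largest value; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical scores of five individuals on two variables
scores_a = [86, 71, 95, 60, 78]
scores_b = [80, 75, 90, 65, 70]
rs = spearman_rs(scores_a, scores_b)
print(round(rs, 2))
```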
19. SIMPLE LINEAR REGRESSION ANALYSIS
Linear regression is the simplest and most commonly used statistical
technique for prediction studies. It is concerned with finding an
equation that uses the known values of one or more variables,
called the independent or predictor variables, to estimate the
unknown value of a quantitative variable, called the dependent or
criterion variable.
It predicts a variable (Y) that depends on a second variable (X),
based on the regression equation of a given set of data.
20. Three major uses of regression analysis
1. Causal analysis: establishes the possible causation of changes in one variable by
changes in other variables.
2. Forecasting an effect: predicts or estimates the value of a variable given the
values of other variables.
3. Linear trend forecasting: imposes a line of best fit on historical time series data.
The general form of the linear function is $y = a + bx$
where: a = the Y-intercept of the line
◦ b = the slope of the line, called the regression coefficient (the rate of change of Y
per unit change in X)
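A minimal sketch of how a and b can be obtained, assuming the usual least-squares formulas b = (nΣxy - ΣxΣy) / (nΣx^2 - (Σx)^2) and a = mean(y) - b * mean(x). The data and the name `fit_line` are hypothetical and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b

# Hypothetical predictor (x) and criterion (y) values
x_values = [1, 2, 3, 4, 5]
y_values = [3, 5, 7, 9, 11]
a, b = fit_line(x_values, y_values)
print(a, b)
```

Because the hypothetical points lie exactly on a line, the fitted intercept and slope reproduce that line.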
21. Example
Six randomly selected Grade 11 students took a 50-item mathematics
aptitude test before they began their course in Statistics and
Probability.
1. What linear equation best predicts performance in Statistics and
Probability (based on first grading test scores) from performance on the
mathematics aptitude test?
2. If a student scored 45 on the math aptitude test, what score
would we expect the student to obtain in Statistics and Probability?
3. How well does the regression equation fit the data?
26. Consider the table below of the test scores in Statistics and
Probability of Grade 11 students in Mainit NHS. Find the
equation of the regression line, then predict the grades in
Statistics and Probability if the test scores are 60 and 75.
27. Activity 1: Seek ye First
[Figure: blank x-y plane for plotting; x-axis: Test Scores (10 to 90), y-axis: Grades (25 to 100)]
28. Guide Questions:
◦ What did you represent on the vertical axis?
On the horizontal axis?
◦ What is your process in plotting points on the
x-y plane?
◦ Based on the results, describe the diagram
formed by the plotted points.
31. Activity 1: Seek ye First
[Figure: x-y plane with the plotted points and the regression line $y = 64.03 + 0.39x$; x-axis: Test Scores (10 to 90), y-axis: Grades (25 to 100)]
32. Guide Questions
1. What is the value of α, the intercept? 64.03
2. What is the value of β, the slope of the line? 0.39
3. Write the regression equation. $y = 64.03 + 0.39x$
4. State the relationship between the grades in statistics (y) and the scores in the
test (x). Why? Explain mathematically.
◦ The grades are directly related to the scores because the slope β > 0, that is,
the slope is positive.
33. 5. Give your interpretation of the relationship between the
x and y variables based on the results.
◦ For every increase of 1 in the score, there is a
corresponding increase of 0.39 in the grade.
6. Predict your grades if you got a score of 60 and a
score of 75. 87.43, 93.28
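The two predictions can be checked by substituting the scores into the regression equation from the activity. The intercept 64.03 and slope 0.39 come from the slides; the function name `predict_grade` is only illustrative.

```python
INTERCEPT = 64.03  # alpha, from the activity
SLOPE = 0.39       # beta, from the activity

def predict_grade(score):
    """Predicted grade from y = 64.03 + 0.39x."""
    return INTERCEPT + SLOPE * score

predictions = {score: round(predict_grade(score), 2) for score in (60, 75)}
print(predictions)
```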
35. CHI-SQUARE ($\chi^2$)
The chi-square test is the most commonly used method of comparing
proportions. It is particularly useful in tests evaluating a relationship
between nominal or ordinal data. Typical situations are
cases where persons, events, or objects are grouped into two or more
nominal categories, such as "Yes-No" responses, "Favor-Against-Undecided",
or class "A, B, C or D".
36. CHI-SQUARE ($\chi^2$)
Chi-square analysis compares the observed frequencies of the responses with
the expected frequencies. It measures the actual divergence between the observed
and expected frequencies. It is given by the formula:
$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$
where: $F_o$ = observed number of cases
$F_e$ = expected number of cases
$F_e = \frac{(\text{row total})(\text{column total})}{N\ (\text{grand total})}$
37. Illustration
Consider the nomination of three (3) presidential candidates of a political party: A,
B, and C. The chairman wonders whether or not they will be equally popular among
the members of the party. To test this hypothesis of equal preference, a random
sample of 315 members was selected and asked which of the three candidates
they prefer.
The following are the results of the survey:
Candidate | Frequency
A         | 98
B         | 115
C         | 102
38. Calculating the $\chi^2$ value
Candidate | $F_o$ | $F_e$
A         | 98    | 105
B         | 115   | 105
C         | 102   | 105
$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$
$\chi^2 = \frac{(98 - 105)^2}{105} + \frac{(115 - 105)^2}{105} + \frac{(102 - 105)^2}{105} = 1.505$
39. For chi-square significance, use the table value
Critical value = 5.991 (df = 2, α = 0.05)
Decision rule: Reject $H_0$ if the computed $\chi^2$ > 5.991;
otherwise, do not reject $H_0$.
Conclusion: Since 1.505 < 5.991, do not reject $H_0$.
There is not sufficient evidence to reject the null
hypothesis that the frequencies in the population are equal.
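The goodness-of-fit computation for the three candidates can be reproduced with a short script. The observed counts and the critical value 5.991 come from the slides; under the hypothesis of equal preference, the expected count is assumed to be the sample size divided by the number of candidates.

```python
# Observed survey counts for candidates A, B and C
observed = {"A": 98, "B": 115, "C": 102}

total = sum(observed.values())    # 315 respondents
expected = total / len(observed)  # equal preference: same expected count each

# chi-square = sum of (Fo - Fe)^2 / Fe over all categories
chi_square = sum((fo - expected) ** 2 / expected for fo in observed.values())

critical_value = 5.991            # table value used in the slides
reject_h0 = chi_square > critical_value
print(round(chi_square, 3), reject_h0)
```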
40. Chi-Square as a Test of Independence: Two Variables
Chi-square can also be used to test the significance of the relationship
between two variables when data are expressed in terms of
frequencies of joint occurrence.
$F_e = \frac{(\text{row total})(\text{column total})}{N\ (\text{grand total})}$
41. Test of Relationship
Chi-Square Test for Independence.
◦ This is used when data are expressed in terms of frequencies or
percentages (nominal variables).
◦ Formula:
◦ $\chi^2 = \sum \frac{(O - E)^2}{E}$, with $df = (r - 1)(c - 1)$
◦ where: $E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$,
O = observed frequency, r = number of rows, c = number of columns
42. Example
Suppose one wants to know if there is a relationship between gender
and school choice. A sample of 100 female and 100 male freshman
students were asked individually for their school choice. Test the null
hypothesis of no significant relationship between the students' gender
and school choice at the 5% level of significance.
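The data table for this example is not included in this section, so the sketch below uses hypothetical counts (two school choices, 100 female and 100 male students) only to show how the expected frequencies, the chi-square value, and the degrees of freedom are obtained. The critical value 3.841 is the standard table value for df = 1 at the 5% level.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = (row total)(column total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = female, male; columns = school choice 1, 2
counts = [[60, 40],
          [45, 55]]
chi2, df = chi_square_independence(counts)
reject_h0 = chi2 > 3.841  # critical value for df = 1, alpha = 0.05
print(round(chi2, 4), df, reject_h0)
```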
46. One Sample z-Test
This test is used when we have a random sample and we
want to test whether it is significantly different from a population
mean, that is, when we compare a single sample mean ($\bar{x}$) to a known
or hypothesized population mean ($\mu$). This test can be used
only if the background assumptions are satisfied, such as:
Sample observations
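48. ONE SAMPLE Z-TEST formula
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ when it has to be estimated from the sample
◦ $\bar{x}$ = sample mean
◦ $\mu_0$ = population mean
◦ $\sigma$ = population standard deviation
◦ n = number of observations in the sample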
49. Example:
A company that makes cookies claims that its product
has a mean life span of 7 days with a standard
deviation of 2 days. A random sample of 50 cookies
is tested and found to have a mean life span of
only 4 days. Test the claim at the 5% level of
significance.
51. Computational Procedure
4. Calculate Test Statistic
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{4 - 7}{2 / \sqrt{50}} = -10.6066$
5. State Results (use the z table to get the critical value)
$z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
$|-10.6066| = 10.6066 > 1.96$, Decision: Reject $H_0$
6. Conclusion: Therefore, the mean life span of the company's
cookies is not equal to 7 days.
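The cookie example can be checked numerically. This is a small sketch of the one-sample z statistic with the values from the example; the function name `one_sample_z` is only illustrative, and the two-tailed decision compares the absolute value of z with 1.96.

```python
from math import sqrt

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

z = one_sample_z(sample_mean=4, pop_mean=7, pop_sd=2, n=50)
critical_value = 1.96               # two-tailed, alpha = 0.05
reject_h0 = abs(z) > critical_value
print(round(z, 4), reject_h0)
```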
52. Example 1
A researcher wishes to see if the mean number of days
that a basic, low-price, small automobile sits on a dealer's
lot is 29. A sample of 30 automobile dealers has a mean
of 30.1 days for basic, low-price, small automobiles. At
α = 0.05, test the claim that the mean time is greater than 29
days. The standard deviation of the population is 3.8
days.
53. Example 2
The Medical Rehabilitation Education Foundation reports that
the average cost of rehabilitation for stroke victims is $24,672. To
see if the average cost of rehabilitation is different at a particular
hospital, a researcher selects a random sample of 35 stroke
victims at the hospital and finds that the average cost of their
rehabilitation is $26,343. The standard deviation of the
population is $3251. At α = 0.01, can it be concluded that the
average cost of stroke rehabilitation at this particular hospital is
different from $24,672?
54. ONE SAMPLE T-TEST
The one-sample t-test is used when we want to know whether the
difference between a sample mean and the population mean is large
enough to be statistically significant, that is, unlikely to have occurred
by chance.
This test can be used only if the background assumptions are satisfied:
the hypothesized population mean is known, the population standard
deviation is unknown and is estimated from the sample, and the sample
comes from an approximately normal population.
55. ONE SAMPLE T-TEST formula
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
where: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
◦ $\bar{x}$ = sample mean
◦ $\mu_0$ = population mean
◦ s = sample standard deviation
◦ n = sample size
56. Example
A random sample of 10 Grade 11 students has grades in English, where
marks range from 1 (worst) to 6 (excellent). The grade point average
(GPA) of all Grade 11 students over the last six years is 4.5. Is the GPA of
the 10 Grade 11 students different from the population's GPA? Use the 0.05
level of significance.
Student      | 1 | 2 | 3   | 4 | 5 | 6 | 7 | 8 | 9 | 10
Grade Points | 5 | 6 | 4.5 | 5 | 5 | 6 | 5 | 5 | 5 | 5.5
58. Computational Procedure
4. Calculate Test Statistic
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{5.2 - 4.5}{0.4831 / \sqrt{10}} = 4.583$
5. State Results (use the t table to get the critical value)
$t_{\alpha/2,\, n-1} = t_{0.05/2,\, 10-1} = t_{0.025,\, 9} = 2.262$
4.583 > 2.262, Decision: Reject $H_0$
6. Conclusion: Therefore, the grade point average of the 10
students is different from the population's GPA.
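The sample mean, the sample standard deviation, and the t statistic for the ten grades can be reproduced with the standard library. This is a sketch using the data from the example; `statistics.stdev` uses the n - 1 divisor, which matches the value 0.4831 used in the computation.

```python
from math import sqrt
from statistics import mean, stdev

grades = [5, 6, 4.5, 5, 5, 6, 5, 5, 5, 5.5]  # grade points of the 10 students
population_mean = 4.5                         # GPA of all Grade 11 students

x_bar = mean(grades)
s = stdev(grades)                             # sample SD (divides by n - 1)
t = (x_bar - population_mean) / (s / sqrt(len(grades)))

critical_value = 2.262                        # t table, alpha/2 = 0.025, df = 9
reject_h0 = abs(t) > critical_value
print(round(x_bar, 2), round(s, 4), round(t, 3), reject_h0)
```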
59. Example 1
The average depth of the Hudson Bay is 305 feet. Climatologists
were interested in seeing if the effects of warming and ice melt
were affecting the water level. Twenty-five measurements over a
period of weeks yielded a sample mean of 306.2 feet. The
population variance is known to be 3.57. Can it be concluded at
the 0.05 level of significance that the average depth has
increased? Is there evidence of what caused this to happen?
60. Example 2
A physician claims that joggers' maximal volume oxygen
uptake is greater than the average of all adults. A sample
of 15 joggers has a mean of 40.6 milliliters per kilogram
(ml/kg) and a standard deviation of 6 ml/kg. If the average
of all adults is 36.7 ml/kg, is there enough evidence to
support the physician's claim at α = 0.05?
61. Example 3
The average local cell phone call length was reported to be 2.27
minutes. A random sample of 20 phone calls showed an average
of 2.98 minutes in length with a standard deviation of 0.98
minute. At α = 0.05, can it be concluded that the average differs
from the population average?
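62. Independent Sample z-test: Equal Variance Assumed
The z-test is used for comparing two means when the population
variances are known; the t-test is used when the variances are unknown.
If equal variances are assumed: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$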
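63. Independent Sample z-test: Equal Variance Not Assumed
The z-test is used for comparing two means when the population
variances are known; the t-test is used when the variances are unknown.
If equal variances are not assumed: $\sigma_1^2 \neq \sigma_2^2$
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$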
64. The basic format for hypothesis testing
Step 1 State the hypotheses and identify the claim.
Step 2 Find the critical value(s).
Step 3 Compute the test value.
Step 4 Make the decision.
Step 5 Summarize the results.
65. Example
Employees at public universities work 11.3 hours per week on
average, with a standard deviation of 9.5 hours. At private
universities, the average working time for employees is 9.7
hours, with a standard deviation of 8.9 hours. The sample size for
each is 500. Is there a significant difference between the average
hours of the public and private universities? Perform a
hypothesis test at the 5% level of significance to find out.
67. Computational Procedure
4. Calculate Test Statistic
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(11.3 - 9.7) - 0}{\sqrt{\frac{9.5^2}{500} + \frac{8.9^2}{500}}} = 2.748$
5. State Results (use the z table to get the critical value)
$z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
2.748 > 1.96, Decision: Reject $H_0$
6. Conclusion: Therefore, there is a significant difference between the
average hours of the public and private universities.
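As a check on the arithmetic, the sketch below applies the two-sample z formula to the figures from the example, squaring each standard deviation to obtain the variance before dividing by the sample size. The function name `two_sample_z` is only illustrative.

```python
from math import sqrt

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """z = ((x1 - x2) - (mu1 - mu2)) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Public vs. private university employees, values from the example
z = two_sample_z(mean1=11.3, mean2=9.7, sd1=9.5, sd2=8.9, n1=500, n2=500)
reject_h0 = abs(z) > 1.96  # two-tailed, alpha = 0.05
print(round(z, 3), reject_h0)
```

With these inputs the statistic comes out at about 2.75, which exceeds the two-tailed critical value of 1.96.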
68. Example 2
A survey found that the average hotel room rate in New Orleans is
$88.42 and the average room rate in Phoenix is $80.61. Assume that the
data were obtained from two samples of 50 hotels each and that the
standard deviations of the populations are $5.62 and $4.83 respectively.
At α = 0.05, can it be concluded that there is a significant
difference in the rates?
70. Independent Sample T-Test: Equal Variance Assumed
The independent-measures hypothesis test allows researchers to
evaluate or compare the mean difference between two populations
using the data from two separate samples. Generally, $\sigma^2$ is unknown
and is estimated from the data. Hence, the t-test is used.
If equal variances are assumed: $s_1^2 = s_2^2$
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
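71. Independent Sample T-Test: Equal Variance Not Assumed
The z-test is used for comparing two means when the population
variances are known; the t-test is used when the variances are unknown.
If equal variances are not assumed: $s_1^2 \neq s_2^2$
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Remember the degrees of freedom when equal variances are assumed: $df = n_1 + n_2 - 2$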
73. ONE WAY ANALYSIS OF VARIANCE
One-way analysis of variance is used when you want to compare the
means of more than two groups. This test can be used only if the
background assumptions are satisfied: the samples are independent
random samples, the populations are normal, and the population
variances are equal.
$F = \frac{MS_B}{MS_W}$
$SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$ and $SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
where: $MS_B = \frac{SS_B}{k - 1}$ and $MS_W = \frac{SS_W}{N - k}$,
k = number of groups, $n_i$ = size of group i, N = total number of observations
74. Summary Table for One-Way ANOVA
Source  | Sum of Squares       | Degrees of Freedom | Variance Estimate            | F ratio
Between | $SS_B$               | k - 1              | $MS_B = \frac{SS_B}{k - 1}$  | $\frac{MS_B}{MS_W}$
Within  | $SS_W$               | N - k              | $MS_W = \frac{SS_W}{N - k}$  |
Total   | $SS_T = SS_B + SS_W$ | N - 1              |                              |
where: $SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$ and $SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
75. Examples
A teacher is concerned about the level of knowledge possessed by PUP
students regarding Philippine history. Students completed a senior high
school level standardized history test. The academic major of each student
was also recorded. Data in terms of percent correct responses are recorded
below for 24 students. Is there a significant difference between the levels of
knowledge possessed by PUP students regarding Philippine history
when grouped according to their academic major? Compute the
appropriate test for the data provided below and use the 0.05 level of
significance.
80. Summary Table for One-Way ANOVA
Source  | Sum of Squares    | Degrees of Freedom | Variance Estimate                 | F ratio
Between | 748               | 3                  | $\frac{748}{3} = 249.33$          | $\frac{MS_B}{MS_W} = \frac{249.33}{384.27} = 0.6489$
Within  | 7685.3333         | 20                 | $\frac{7685.333}{20} = 384.27$    |
Total   | $SS_T$ = 8433.33  | N - 1 = 23         |                                   |
where: $SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$ and $SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
81. Computational Procedure
5. State Results (use the F table to get the critical value)
Critical F-value (df = 3, 20; α = 0.05) = 3.10
Computed F-value = 0.6489
0.6489 < 3.10, Decision: Do not reject $H_0$
6. Conclusion: Therefore, there is no significant
difference between the levels of knowledge possessed
by PUP students regarding Philippine history when
grouped according to their academic major.
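The raw scores for this example are not included in this section, so the sketch below uses small hypothetical groups only to show how the between-group and within-group sums of squares and the F ratio are computed. The function name `one_way_anova` is illustrative.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, F) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # SS_B: squared distance of each group mean from the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # SS_W: squared distance of each observation from its own group mean
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ss_between, ss_within, ms_between / ms_within

# Hypothetical scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, f_ratio = one_way_anova(groups)
print(round(ss_b, 2), round(ss_w, 2), round(f_ratio, 2))
```

The computed F ratio is then compared with the F table value for (k - 1, N - k) degrees of freedom.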