Goodness of-fit

Elementary Statistics
Goodness-of-Fit
1

Goodness-of-Fit and Contingency Tables
Goodness-of-Fit
2
Objectives:
1 Test a distribution for goodness of fit, using chi-square.
2 Test two variables for independence, using chi-square.
3 Test proportions for homogeneity, using chi-square.

3
TI Calculator:
How to enter data:
1. Stat
2. Edit
3. ClrList 𝑳 𝟏 & 𝑳 𝟐
4. Type in your data in 𝑳 𝟏 & 𝑳 𝟐
Recall: Correlation & Regression
The linear correlation coefficient r, is a number that measures how well paired sample data fit a
straight-line pattern when graphed . The value of r² is the proportion of the variation in y that is
explained by the linear relationship between x and y. (The linear correlation coefficient:−1 ≤ r ≤ 1)
Regression Line:
Given a collection of paired sample
data, the regression line (or line of best
fit, or least-squares line) is the straight
line that “best” fits the scatterplot of the
data.
No significant linear correlation →
The best predicted y-value is 𝑦.
Significant linear correlation → The
best predicted y-value is found by
Plugging x-value into the regression
equation.
TI Calculator:
Scatter Plot:
1. Press on Y & clear
2. 2nd y, Enter
3. On, Enter
4. Select X1-list: 𝑳 𝟏
5. Select Y1-list: 𝑳 𝟐
6. Mark: Select
Character
7. Press Zoom & 9 to get
ZoomStat
zx denotes the z score for an individual sample value x
zy is the z score for the corresponding sample value y.
2
2
:
:
, 2
1
T
n
t r d
O
r
r
fS n
r

  

TI Calculator:
Linear Regression - test
1. Stat
2. Tests
3. LinRegTTest
4. Enter 𝑳 𝟏 & 𝑳 𝟐
5. Freq = 1
6. Choose ≠
7. Calculate
    
       
2 22 2
)
1
:
(
,
x y
n xy x y Z Z
r r
n
n x x n y y
Or

 
          
   
   
Test of Hypothesis:
Step 1: H0 :𝜌 = 0, H1: 𝜌 ≠ 0 claim
& Tails
Step 2: TS: 𝑡 = 𝑟
𝑛−2
1−𝑟2 , OR: r
Step 3: CV using α From the
T-table or Correlation (r-table)
Step 4: Make the decision to
a. Reject or not H0
b. The claim is true or false
c. Restate this decision: There is /
is not sufficient evidence to
support the claim that…
There is a linear Correlation If |r|
≥ critical value
There is No Correlation If |r| <
critical value
 
0 1
1 22
0 1
Regression Line:
,
:
int :
,
y b b x
n xy x y
Slope b
n x x
Y ercept b y b x
y x
y x
n n
 



  
 
  
 
 

4
Goodness-of-Fit
Key Concept:
By “goodness-of-fit” we mean that sample data consisting of observed frequency counts
arranged in a single row or column (called a one-way frequency table) agree with some
particular distribution (such as normal or uniform) being considered. We will use a hypothesis
test for the claim that the observed frequency counts agree with the claimed distribution.
Definition: A Multinomial Experiment is an experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.
Goodness-of-Fit Test
A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or
conforms to) some claimed distribution.
Conduct a goodness-of-fit test, which is a hypothesis test to determine whether a single row (or
column) of frequency counts agrees with some specific distribution (such as uniform or normal).

5
Goodness-of-Fit Notation
O represents the observed frequency of an outcome, found from the sample data.
E represents the expected frequency of an outcome, found by assuming that the distribution is as
claimed.
k represents the number of different categories or cells.
n represents the total number of trials (or the total of observed sample values).
p represents the probability that a sample value falls within a particular category.
Requirements:
1. The data have been randomly selected.
2. The sample data consist of frequency counts for each of the different categories.
3. For each category, the expected frequency is at least 5. (The expected frequency for a category is
the frequency that would occur if the data actually have the distribution that is being claimed.
There is no requirement that the observed frequency for each category must be at least 5.)
Null and Alternative Hypotheses
H0: The frequency counts agree with the claimed distribution.
H1: The frequency counts do not agree with the claimed distribution.

6
P-values: P-values are provided by technology, or a range of P-values can be
found from 𝜒2
− 𝑇𝑎𝑏𝑙𝑒.
Critical values:
1. Critical values are found in 𝜒2
− 𝑇𝑎𝑏𝑙𝑒 by using k − 1 degrees of freedom,
where k is the number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
3. Conducting a goodness-of-fit test requires that we identify the observed
frequencies denoted by O, then find the frequencies expected (denoted by
E) with the claimed distribution. There are two different approaches for
finding expected frequencies E:
If the expected frequencies are not all equal:
Calculate E = np for each individual category.
2
2
Test Statistic:
( )
1
O E
E
df k



 

d.f. = number of categories minus 1
O = observed frequency
E = expected frequency
𝑰𝒇 𝒆𝒙𝒑𝒆𝒄𝒕𝒆𝒅 𝒇𝒓𝒆𝒒𝒖𝒆𝒏𝒄𝒊𝒆𝒔 𝒂𝒓𝒆 𝒂𝒍𝒍 𝒆𝒒𝒖𝒂𝒍: 𝑬 =
𝒏
𝒌
𝐸𝑞𝑢𝑎𝑙 𝐹𝑟𝑒𝑞: 𝐸 =
𝑛
𝑘
, otherwise: E = np

Example 1:
7
Finding Expected Frequencies:
Equally Likely A single die is rolled 45 times with the following
results. Assuming that the die is fair and all outcomes are equally
likely, find the expected frequency E for each empty cell.
Outcome 1 2 3 4 5 6
Observed Frequency O 13 6 12 9 3 2
Expected Frequency E
𝐸 =
𝑛
𝑘
If the die is fair and the outcomes are all
equally likely, we expect that each
outcome should occur about 7.5 times.
=
45
6
= 7.5
b. Not Equally Likely Using the same results from part (a), suppose that we claim that instead
of being fair, the die is loaded so that the outcome of 1 occurs 50% of the time and the other
five outcomes occur 10% of the time. The probabilities are listed in the second row below.
Outcome 1 2 3 4 5 6
Probability 0.5 0.1 0.1 0.1 0.1 0.1
Observed Frequency O 13 6 12 9 3 2
Expected Frequency E
n = 45 & E = np
1: E = np = (45)(0.5) = 22.5
Each of the other five outcomes:
E = np = (45)(0.1) = 4.5
7.5 7.5 7.5 7.5 7.5 7.5
22.5 4.5 4.5 4.5 4.5 4.5
𝐸𝑞𝑢𝑎𝑙 𝐹𝑟𝑒𝑞: 𝐸 =
𝑛
𝑘
,
otherwise: E = np

8
Measuring Disagreement with the Claimed
Distribution
We know that sample frequencies typically differ somewhat from the values
we theoretically expect, so we consider the key question:
Are the differences between the actual observed frequencies O and the
theoretically expected frequencies E significant?
To measure the discrepancy between the O and E values, we use the test statistic
d.f. = Number of categories minus 1
O = observed frequency
E = expected frequency
2
2
Test Statistic:
( )
1
O E
E
df k



 


Example 2:
9
To check whether consumers have any preference among
five flavors of a new fruit soda, a sample of 100 people
provided the following data. Is there enough evidence to
support the claim that there is no preference in the
selection of fruit soda flavors? Let α = 0.05.
Cherry Strawberry Orange Lime Grape
Observed 32 28 16 14 10
Expected 20 20 20 20 20
H0: Consumers show no preference
(claim) OR ( p1 = p2 = p3 = p4 = p5 ).
H1: Consumers show a preference.(At
least one of the probabilities is different
from the others.), RTT
df = 5 – 1 = 4 & α = 0.05→
2
2 ( )
:
O E
TS
E


 
2 2 2 2 2
(32 20) (28 20) (16 20) (14 20) (10 20)
20 20 20 20 20
    
    
Decision:
a. Reject H0
b. The claim is False
c. There is NOT sufficient
evidence to support the
claim that consumers show
no preference for the flavors.
18.000
Step 1: H0 , H1, claim & Tails
Step 2: TS Calculate (TS)
Step 3: CV using α
a. Reject or not H0
c. Restate this decision: There is
/ is not sufficient evidence to
TI Calculator:
Goodness of Fit - test
1. Stat
2. Tests
3. 𝝌 𝟐
𝑮𝑶𝑭 − 𝑻𝒆𝒔𝒕
4. Enter 𝑳 𝟏 & 𝑳 𝟐
5. 𝐝𝐟 = 𝐧 − 𝟏
6. Calculate
TI Calculator: Enter data:
1. Stat
2. Edit
3. ClrList 𝑳 𝟏 & 𝑳 𝟐
4. O → 𝑳 𝟏 & E → 𝑳 𝟐
CV: 𝜒2 = 9.488
2
2 ( )
T 1: ,S
O E
df k
E


  

Example 3:
10
A researcher reads that firearm-related deaths for people aged 1 to 18
were distributed as follows: 74% were accidental, 16% were homicides,
and 10% were suicides. In her district, there were 68 accidental deaths,
27 homicides, and 5 suicides during the past year. At α = 0.10, test the
claim that the percentages are equal to the given claimed values.
Accidental Homicides Suicides
Observed 68 27 5
Expected 74 16 10
H0: p1 = 0.74, p2 = 0.16 and p3 = 0.10 (Claim)
H1: At least one of the proportions is not equal
to the given claimed value. (RTT)
2 2 2
(68 74) (27 16) (5 10)
74 16 10
  
   10.549
CV: α = 0.10, df = k – 1 = 2→
Decision:
a. Reject H0
b. The claim is False
c. There is not sufficient evidence to support the
claim that the distribution is 74% accidental,
16% homicides, and 10% suicides.
2
2 ( )
:
O E
TS
E


 
Step 1: H0 , H1, claim & Tails
Step 2: TS Calculate (TS)
Step 3: CV using α
a. Reject or not H0
c. Restate this decision: There is
/ is not sufficient evidence to
χ² = 4.605
2
2
Test Statistic:
( )
1
O E
E
df k



 


Goodness of-fit

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Goodness of-fit

Similar to Goodness of-fit (20)

More from Long Beach City College

More from Long Beach City College (20)

Recently uploaded

Recently uploaded (20)

Goodness of-fit