CHAPTER 17: ACCURACY AND ERROR
Reporter: SHELAMIE M. SANTILLAN, EDUC 243 student
2nd Sem. S.Y. 2016-
When is a test score inaccurate?
Almost always.
All tests and scores are imperfect and are subject to error.
Error – What is it?
• No test measures perfectly, and many tests fail to measure as well as we would like them to.
• Tests make “mistakes”. They are always associated with some degree of error.
• Think about the last test you took.
• Did you obtain exactly the score you thought or knew you deserved?
Examples of errors that lower your obtained score:
• When you couldn’t sleep the night before the test
• When you were sick but took the test anyway
• When the essay test you were taking was so poorly constructed it was hard to tell what was being tested
• When the test had a 45-minute time limit but you were allowed only 38 minutes
• When you took a test that had multiple defensible answers
Examples of errors (or situations) that raise your obtained score:
• The time you just happened to see the answers on your neighbor’s paper
• The time you got lucky guessing
• The time you had 52 minutes for a 45-minute test
• The time the test was so full of unintentional clues that you were able to answer several questions based on the information given in other questions
Then how does one go about discovering one’s true score?
Unfortunately, we don’t have an answer. The true score and the error score are both theoretical or hypothetical values.
Why bother with the true score or error score?
Because they allow us to illustrate some important points about test score reliability and test score accuracy.
Simply keep in mind!
Remember:
Obtained Score = True Score + Error Score
Table 17.1 The Relationship among Obtained Scores, Hypothetical True Scores, and Hypothetical Error Scores for a Ninth-Grade Math Test
Student   Obtained Score   True Score   Error Score
Donna     91               88           +3
Jack      72               79           -7
Phyllis   68               70           -2
Gary      85               80           +5
Marsha    90               86           +4
Milton    75               78           -3
(True scores and error scores are hypothetical values.)
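The relationship Obtained Score = True Score + Error Score can be checked row by row. The sketch below is only an illustration: it copies the six students' values from the chapter's tables, and the names `table_17_1` and `error_score` are made up for this example.

```python
# Rows copied from Table 17.1: (student, obtained score, true score, error score).
table_17_1 = [
    ("Donna", 91, 88, +3),
    ("Jack", 72, 79, -7),
    ("Phyllis", 68, 70, -2),
    ("Gary", 85, 80, +5),
    ("Marsha", 90, 86, +4),
    ("Milton", 75, 78, -3),
]


def error_score(obtained, true):
    """Error score implied by: obtained = true + error."""
    return obtained - true


for student, obtained, true, error in table_17_1:
    # Each tabled error score should equal obtained minus true.
    assert error_score(obtained, true) == error
    print(f"{student:8s} {obtained} = {true} + ({error:+d})")
```

In practice only the obtained score is ever observed; the true and error columns exist here because the table is hypothetical.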
The Standard Error of Measurement (abbreviated Sm)
The standard error of measurement is the standard deviation of the error scores of a test.
We will use the error scores from Table 17.1 (3, -7, -2, 5, 4, -3).
Step 1: Determine the mean.
$M = \frac{\sum X}{N} = \frac{0}{6} = 0$
Step 2: Subtract the mean from each error score to arrive at the deviation scores. Square each deviation score and sum the squared deviations.
X - M  =  x      x^2
+3 - 0 = +3       9
-7 - 0 = -7      49
-2 - 0 = -2       4
+5 - 0 = +5      25
+4 - 0 = +4      16
-3 - 0 = -3       9
$\sum x^2 = 112$
Step 3: Plug the $\sum x^2$ into the formula and solve for the standard deviation.
$\text{Error Score SD} = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{112}{6}} = \sqrt{18.67} \approx 4.32$
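The three steps can be sketched in a few lines of Python. This is a minimal illustration using the six error scores quoted from Table 17.1; it assumes the standard deviation divides by N (not N - 1), which is what reproduces the 4.32 used later in the chapter. The variable names are chosen for this example only.

```python
import math

# Error scores from Table 17.1.
error_scores = [3, -7, -2, 5, 4, -3]

n = len(error_scores)

# Step 1: determine the mean of the error scores.
mean = sum(error_scores) / n

# Step 2: deviation scores, squared and summed.
squared_deviations = [(x - mean) ** 2 for x in error_scores]
sum_of_squares = sum(squared_deviations)

# Step 3: standard deviation, dividing by N (an assumption of this sketch).
sd = math.sqrt(sum_of_squares / n)

print(mean, sum_of_squares, round(sd, 2))  # 0.0 112.0 4.32
```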
Fortunately, a rather simple statistical formula can be used to estimate this standard deviation (Sm) without actually knowing the error scores:
$S_m = SD\sqrt{1 - r}$
where r is the reliability of the test and SD is the test’s standard deviation.
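To see how the estimate behaves, the sketch below wraps the usual classical form, Sm = SD × √(1 − r), in a small function and evaluates it for several reliabilities. The function name and the choice of SD = 10 are assumptions made for illustration, not values tied to the ninth-grade math test.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Estimate Sm from a test's standard deviation and score reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Illustrative SD of 10 (an assumption); Sm shrinks as reliability rises.
for r in (0.50, 0.75, 0.91, 1.00):
    sm = standard_error_of_measurement(10, r)
    print(f"r = {r:.2f} -> Sm = {sm:.2f}")
```

A perfectly reliable test (r = 1.00) would have an Sm of zero, and the less reliable the scores, the closer Sm gets to the full standard deviation.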
USING THE STANDARD ERROR OF MEASUREMENT
In summary, then, we know that error scores:
1. are normally distributed
2. have a mean of zero
3. have a standard deviation called the standard error of measurement
Figure 17.1 The error score distribution (shown with Table 17.1)
This figure tells us that the distribution of error scores is a normal distribution.
Figure 17.2 The error score distribution for the test depicted in Table 17.1 (error scores of the ninth-grade math test)
Fig. 17.3 The error score distribution for the test depicted in Table 17.1, with approximate normal curve percentages
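The approximate normal curve percentages can be reproduced with Python's standard library. This sketch assumes the error scores follow a normal distribution with a mean of zero and a standard deviation equal to the Sm of 4.32; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

SM = 4.32  # standard error of measurement for the ninth-grade math test

# Error scores modeled as normal with mean 0 and standard deviation Sm.
errors = NormalDist(mu=0.0, sigma=SM)


def share_within(k):
    """Proportion of error scores within k standard errors of zero."""
    return errors.cdf(k * SM) - errors.cdf(-k * SM)


for k in (1, 2, 3):
    print(f"within {k} Sm (+/- {k * SM:.2f} points): {share_within(k):.1%}")
```

The results (roughly 68%, 95% and 99.7%) are the same fixed percentages that hold for any normal distribution.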
Let’s use the following number line to represent an individual’s obtained score, which we will simply call X:
Fig. 17.4 The error distribution around an obtained score of 90 for a test with Sm = 4.32
Fig. 17.5 The error distribution around an obtained score of 75 for a test with Sm = 4.32
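The bands pictured in Figs. 17.4 and 17.5 can be worked out directly: one Sm below and one Sm above the obtained score. The sketch below is illustrative only; the function name `score_band` and the multiplier `k` are assumptions of this example.

```python
SM = 4.32  # standard error of measurement from the chapter's example


def score_band(obtained, sm, k=1):
    """Return (low, high): the obtained score minus and plus k standard errors."""
    return (obtained - k * sm, obtained + k * sm)


for obtained in (90, 75):
    low, high = score_band(obtained, SM)
    print(f"obtained {obtained}: 68% band from {low:.2f} to {high:.2f}")
```

For the obtained score of 90 the band runs from 85.68 to 94.32, and for 75 it runs from 70.68 to 79.32. In Table 17.1 these obtained scores belong to Marsha and Milton, whose hypothetical true scores (86 and 78) both fall inside their bands.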
Standard Deviation or Standard Error of Measurement?
Standard Deviation (SD)
• Is the variability of raw scores.
• It tells us how spread out the scores are in a distribution of raw scores.
• Is based on a group of
Standard Error of Measurement (Sm)
• Is the variability of error scores.
• Is based on a group of scores that is hypothetical.
Why all the fuss about error?
For two reasons:
1. We want to make you aware of the fallibility of test scores.
2. We want to sensitize you
Classification of sources of
error
1. Test Takers.
2. The test itself.
3. Test administration.
4. Test scoring.
Test Takers:
Factors that would likely result in an obtained score lower than a student’s true score:
• Fatigue and illness
• Accidentally seeing another
The test itself:
• Trick questions
• Reading level that is too high
• Ambiguous questions
• Items that are too difficult
Test Administration:
• Physical Comfort
• Instructions & Explanations
• Test Administrator Attitudes
Error in Scoring:
• When computer scoring is used, error can occur.
• When tests are hand scored, the likelihood of error increases greatly.
Sources of Error Influencing Various Reliability Coefficients
• Test-Retest
• Alternate Forms
• Internal Consistency
Test-Retest
• Short-interval test-retest coefficients are not likely to be affected greatly by within-student error.
• Any problems that do exist in the test are present at both the first and second administrations, affecting scores the same way each time the test is administered.
Alternate Forms
• Since alternate-forms reliability is determined by administering two different forms or versions of the same test to the same group close together in time, the effects of within-student error are negligible.
• Error within the test, however, has a significant effect on alternate-forms reliability.
• As with the test-retest method, alternate-forms score reliability is not greatly affected by error in administering or scoring the test, as long as similar
Internal Consistency
BAND INTERPRETATION
• Uses the standard error of measurement to more realistically interpret and report groups of test scores.
Formula to compute the reliability of the
difference score is as follows:
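The slide's equation is not reproduced in the text. A form commonly given in classical test theory, offered here as an assumption rather than as the slide's exact notation, is r_diff = ((r11 + r22) / 2 − r12) / (1 − r12), where r11 and r22 are the reliabilities of the two scores and r12 is the correlation between them. The sketch below evaluates it; the function name and the correlation of .80 are hypothetical.

```python
def difference_score_reliability(r11, r22, r12):
    """Reliability of the difference between two scores.

    r11, r22: reliabilities of the two scores.
    r12: correlation between the two scores (must be below 1).
    """
    if r12 >= 1.0:
        raise ValueError("r12 must be below 1")
    return ((r11 + r22) / 2.0 - r12) / (1.0 - r12)


# Two subtests with reliability .91 each; the .80 correlation is hypothetical.
r_diff = difference_score_reliability(0.91, 0.91, 0.80)
print(round(r_diff, 2))  # 0.55
```

Under this form, the difference between two highly correlated scores is much less reliable than either score alone, which is one reason small differences between subtest scores call for caution.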
Step 1: List Data
Let’s assume M = 100, SD = 10, and score reliability = .91 for all subtests.
Here are the subtest scores for John:
Step 2: Determine Sm (standard error of measurement)
Since SD and r are the same for each subtest in this example, the standard error of measurement will be the same for each subtest.
$S_m = SD\sqrt{1 - r} = 10\sqrt{1 - .91} = 10\sqrt{.09} = 3$
Step 3: Add and Subtract Sm
Step 4: Graph the Results
Shade in the bands to represent the range of scores that has a 68% chance of capturing John’s true score.
Step 5: Interpret the Bands
• Interpret the profile of bands by visually inspecting the bars to see which bands overlap and which do not.
• Bands that overlap probably represent differences that occurred by chance; bands that do not overlap probably represent real differences.
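The whole band interpretation procedure can be sketched in code. John's actual subtest scores are not reproduced in the text, so the three scores below are hypothetical placeholders; the SD of 10 and reliability of .91 come from Step 1, and all function names are made up for this example.

```python
import math
from itertools import combinations

SD = 10
RELIABILITY = 0.91

# Hypothetical subtest scores (placeholders, not John's real data).
subtest_scores = {"Subtest A": 103, "Subtest B": 110, "Subtest C": 105}

# Determine Sm, which is the same for every subtest here.
sm = SD * math.sqrt(1 - RELIABILITY)

# Add and subtract Sm to get each 68% band.
bands = {name: (score - sm, score + sm) for name, score in subtest_scores.items()}


def bands_overlap(band_a, band_b):
    """True when two (low, high) bands share at least one point."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


# Interpret the bands: compare every pair of subtests.
for first, second in combinations(bands, 2):
    verdict = "overlap" if bands_overlap(bands[first], bands[second]) else "do not overlap"
    print(f"{first} and {second}: bands {verdict}")
```

With these placeholder scores, Subtest A (about 100 to 106) and Subtest B (about 107 to 113) do not overlap, while Subtest C (about 102 to 108) overlaps both of the others.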
Final Word:
• Technically, there are more accurate statistical procedures for determining real differences between an individual’s test scores than the ones we have been able to present here. These procedures, however, are time-consuming, complex, and overly specific for the typical teacher.
• Within the classroom, band interpretation, properly used, makes for a practical alternative to those more advanced procedures.

Editor's Notes

  1. Think about the last test you took. Did you obtain exactly the score you thought or knew you deserved? Was your score higher than you expected? Was it lower than you expected? What about your obtained scores on all the other tests you have taken? Did they truly reflect your skill, knowledge, or ability, or did they sometimes underestimate your knowledge, ability, or skill? Or did they overestimate? If your obtained test scores did not always reflect your true ability, they were associated with some error.
  2. Your obtained scores may have been lower or higher than they should have been. In short, an obtained score has a true component (actual level of ability, knowledge) and an error component (which may act to lower or raise the obtained score).
  3. We never actually know an individual’s true score or error score.
  4. They are important concepts because they allow us to illustrate some important points about test score reliability and test score accuracy.
  5. The standard deviation of the error score distribution, also known as the standard error of measurement, is 4.32. If we could know what the error scores are for each test we administer, we could compute Sm in this manner. But, of course, we never know these error scores. If you are following so far, your next question should be, “But how in the world do you determine the standard deviation of the error scores if you never know the error scores?”
  6. Error scores are assumed to be random. As such, they cancel each other out. That is, obtained scores are inflated by random error to the same extent as they are deflated by error. Another way of saying this is that the mean of the error scores for a test is zero. The distribution of the error scores is also important, since it approximates a normal distribution closely enough for us to use the normal distribution to represent it.
  7. Returning to our example from the ninth-grade math test in Table 17.1, we recall that we obtained an Sm of 4.32 for the data provided.
  8. Figure 17.2 illustrates the distribution of error scores for these data. What does the distribution in Fig. 17.2 tell us? Before you answer, consider this: The distribution of error scores is a normal distribution. This is important since, as you learned in Chapter 13, the normal distribution has characteristics that enable us to make decisions about scores that fall between, above, or below different points in the distribution. We are able to do so because fixed percentages of scores fall between various score values in a normal distribution.
  9. (Fig. 17.3 should refresh your memory.) We listed along the baseline the standard deviation of the error score distribution. This is more commonly called the standard error of measurement (Sm) of the test. Thus we can see that 68% of the error scores for the test will be no more than 4.32 points higher or 4.32 points lower than the true scores. That is, if there were 100 obtained scores on this test, 68 of these scores would not be “off” their true scores by more than 4.32 points. The Sm, then, tells us about the distribution of obtained scores around true scores. By knowing an individual’s true score we can predict what his or her obtained score is likely to be.
  10. The careful reader may be thinking, “That’s not very useful information. We can never know what a person’s true score is, only their obtained score.” This is correct. As test users, we work only with obtained scores. However, we can follow our logic in reverse. If 68% of obtained scores fall within 1 Sm of their true scores, then 68% of true scores must fall within 1 Sm of their obtained scores. Strictly speaking, this reverse logic is somewhat inaccurate, but it would be true 99% of the time (Gulliksen, 1987). Therefore the Sm is often used to determine how test error is likely to have affected individual obtained scores. That is, X plus or minus 4.32 (±4.32) defines the range or band
  11. Why all the fuss? Remember our original point. All test scores are fallible (tending to err); they contain a margin of error. The Sm is a statistic that estimates this margin for us. We are accustomed to reporting a single test score. In education, we have long had a tendency to overinterpret small differences in test scores since we too often consider obtained scores to be completely accurate. Incorporating the Sm in reporting test scores greatly minimizes the likelihood of overinterpretation and forces us to consider how fallible our test scores are. After considering the Sm from a slightly different angle, we will show how to incorporate it to make comparisons among test scores. This procedure is called band interpretation.
  12. You learned to compute and interpret SD in Chapter 13.
  13. In reality, an individual’s obtained score is the best estimate of an individual’s true score. That is, in spite of the foregoing discussion, we usually use the obtained score as our best guess of a student’s true level of ability. Well, why all the fuss about error then?
  14. Generally, error due to within-student factors is beyond our control.
  15. Physical Comfort: room temperature, humidity, lighting, noise, and seating arrangement are all potential sources of error for the test taker. Instructions and Explanations: Different test administrators provide differing amounts of information to test takers. Some spell words, provide hints, or tell whether it’s better to guess or leave blanks, while others remain fairly distant. Naturally, your score may vary depending on the amount of information you are provided. Test Administrator Attitudes: Administrators differ in the notions they convey about the importance of the test, the extent to which they are emotionally supportive of students, and the way in which they monitor the test. To the extent that these variables affect students differently, test score reliability and accuracy will be impaired.
  16. The computer, a highly reliable machine, is seldom the cause of such errors. But teachers and other test administrators prepare the scoring keys, introducing possibilities for error. And students sometimes fail to use No. 2 pencils or make extraneous marks on answer sheets, introducing another potential source of scoring error. Needless to say, when tests are hand scored, as most classroom tests are, the likelihood of error increases greatly. In fact, because you are human, you can be sure that you will make some scoring errors in grading the tests you give.
  17. With test-retest and alternate-forms reliability, within-student factors affect the method of estimating score reliability, since changes in test performance due to such problems as fatigue, momentary anxiety, illness, or just having an “off” day can be doubled because there are two separate administrations of the test. If the test is sensitive to those problems, it will record different scores from one administration to another, lowering the reliability (or correlation coefficient) between them. Obviously, we would prefer that the test not be affected by those problems. But if it is, we would like to know about it.
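To make the "reliability (or correlation coefficient)" between two administrations concrete, here is a minimal sketch that computes a Pearson correlation by hand. The two score lists are hypothetical, and `pearson_r` is an illustrative helper name, not something taken from the source.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same five students on two administrations.
first_administration = [85, 90, 78, 92, 70]
second_administration = [83, 88, 80, 95, 65]

r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

The more that fatigue, anxiety, or illness shift individual scores between the two sittings, the further this coefficient drops below 1.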
  18. List subtests and scores and the M, SD, and reliability (r) for each subtest. For purposes of illustration, let’s assume that the mean is 100, the standard deviation is 10, and the score reliability is .91 for all the subtests.
  19. Since SD and r are the same for each subtest in this example, the standard error of measurement will be the same for each subtest.
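The sketch below computes that shared standard error of measurement. It assumes the usual formula Sm = SD × sqrt(1 − r), which these slides do not restate, and the function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Sm = SD * sqrt(1 - r); assumes reliability is a coefficient between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Values from the example: SD = 10 and r = .91 for every subtest.
sm = standard_error_of_measurement(sd=10, reliability=0.91)
print(f"Sm = {sm:.2f}")
```

With SD = 10 and r = .91, this works out to about 3 points for every subtest.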
  20. To identify the band or interval of scores that has a 68% chance of capturing John’s true score, add Sm to and subtract Sm from each subtest score. If the test could be given to John 100 times (without John learning from taking the test), 68 out of 100 times John’s true score would be within the following bands:
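The bands themselves are not reproduced in this transcript. As a rough sketch of the procedure only, the code below builds 68% bands from obtained scores. The subtest names and scores are hypothetical, and Sm = 3.0 assumes the usual formula Sm = SD × sqrt(1 − r) with SD = 10 and r = .91.

```python
def score_band(obtained, sm, n_sm=1):
    """Return (lower, upper) limits: obtained score plus or minus n_sm standard errors."""
    return (obtained - n_sm * sm, obtained + n_sm * sm)


SM = 3.0  # assumption: 10 * sqrt(1 - .91), the same for every subtest

# Hypothetical subtest scores for John; not taken from the source.
hypothetical_scores = {"Reading": 103, "Math": 91, "Science": 100}

# One band per subtest: score - Sm to score + Sm (about a 68% band).
bands = {name: score_band(score, SM) for name, score in hypothetical_scores.items()}

for name, (low, high) in bands.items():
    print(f"{name}: {low:.0f} to {high:.0f}")
```

Passing `n_sm=2` to `score_band` would widen each band to two standard errors on either side of the obtained score.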