Graduate School
EDUC 243 – EDUCATIONAL EVALUATION
2nd Semester
School Year 2016-2017
Narrative Report
Chapter 17: Accuracy and Error
1. Error – What is it?
2. The Standard Error of Measurement
3. Standard Deviation or Standard Error of Measurement?
4. Why all the fuss about error?
Sources of error:
1. Test Takers
2. The Test Itself
3. Test Administration
4. Test Scoring
5. Sources of Error Influencing various Reliability Coefficients
1. Test-Retest
2. Alternate Forms
3. Internal Consistency
6. Band Interpretation
Submitted to:
DR. JAMES L. PAGLINAWAN
Submitted by:
SHELAMIE M. SANTILLAN
EDUC 243 Student
When is a test score inaccurate?
Almost always.
All tests and scores are imperfect and are subject to error.
Error – What is it?
• No test measures perfectly, and many tests fail to measure as well as we would like them to.
• Tests make "mistakes." They are always associated with some degree of error.
Think about the last test you took. Did you obtain exactly the score you thought or knew you
deserved? Was your score higher than you expected? Was it lower than you expected? What about
your obtained scores on all the other tests you have taken? Did they truly reflect your skill,
knowledge, or ability, or did they sometimes underestimate your knowledge, ability, or skill? Or did
they overestimate? If your obtained test scores did not always reflect your true ability, they were
associated with some error.
Your obtained scores may have been lower or higher than they should have been. In short, an
obtained score has a true component (actual level of ability, knowledge) and an error component
(which may act to lower or raise the obtained score).
Examples of errors that might lower your obtained score:
• When you couldn't sleep the night before the test.
• When you were sick but took the test anyway.
• When the essay test you were taking was so poorly constructed it was hard to tell what was being tested.
• When the test had a 45-minute time limit but you were allowed only 38 minutes.
• When you took a test that had multiple defensible answers.
Examples of errors (or situations) that might raise your obtained score:
• The time you just happened to see the answers on your neighbor's paper.
• The time you got lucky guessing.
• The time you had 52 minutes for a 45-minute test.
• The time the test was so full of unintentional clues that you were able to answer several questions based on the information given in other questions.
Then how does one go about discovering one’s true score?
Unfortunately, we don’t have an answer. The true score and the error score are both theoretical or
hypothetical values.
We never actually know an individual’s true score or error score.
Why bother with the true score or error score?
Because they allow us to illustrate some important points about test score reliability and test score
accuracy.
Simply keep this in mind!
Remember:
Obtained Score = True Score + Error Score
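To make the decomposition concrete, here is a minimal Python sketch (the true scores, the error range, and the seed are all hypothetical, chosen only for illustration): each obtained score is simply a true score plus a random error that may be positive or negative.

```python
import random

random.seed(1)  # a fixed seed so the illustration is repeatable

# Hypothetical true scores and randomly drawn error scores, for illustration only.
true_scores = [85, 78, 92, 70]
error_scores = [random.randint(-5, 5) for _ in true_scores]  # error may raise or lower a score

for true, error in zip(true_scores, error_scores):
    obtained = true + error          # Obtained Score = True Score + Error Score
    print(f"true={true}  error={error:+d}  obtained={obtained}")
```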
The Standard Error of Measurement
(abbreviated Sm)
• is the standard deviation of the error scores of a test.
• We will use the error scores from Table 17.1 (3, -7, -2, 5, 4, -3).
Step 1: Determine the mean.
Step 2: Subtract the mean from each error score to arrive at the deviation scores. Square each
deviation score and sum the squared deviations.
Step 3: Plug the sum of the squared deviations (Σx²) into the formula SD = √(Σx² / N) and solve for the standard deviation.
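As a sketch of Steps 1 through 3, the same calculation can be written in a few lines of Python using the six error scores from Table 17.1. The only assumption is that, matching the chapter's worked result, the variance is computed over N rather than N − 1.

```python
from math import sqrt

error_scores = [3, -7, -2, 5, 4, -3]                 # error scores from Table 17.1

# Step 1: determine the mean of the error scores.
mean = sum(error_scores) / len(error_scores)         # 0.0

# Step 2: deviation scores, squared and summed.
sum_sq = sum((x - mean) ** 2 for x in error_scores)  # 112.0

# Step 3: divide by N and take the square root to get the standard deviation (Sm).
sm = sqrt(sum_sq / len(error_scores))
print(round(sm, 2))                                  # 4.32
```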
The standard deviation of the error score distribution, also known as the standard error of
measurement, is 4.32. If we could know what the error scores are for each test we administer, we
could compute Sm in this manner. But, of course, we never know these error scores. If you are
following so far, your next question should be, "But how in the world do you determine the standard
deviation of the error scores if you never know the error scores?"
Fortunately, a rather simple statistical formula can be used to estimate this standard deviation
(Sm) without actually knowing the error scores:
Sm = SD√(1 − r)
where r is the reliability of the test and SD is the test's standard deviation.
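A small helper illustrating this estimate is sketched below; the function name is made up, and the example values SD = 10 and r = .91 are borrowed from the band-interpretation example later in this report.

```python
from math import sqrt

def estimate_sm(sd: float, r: float) -> float:
    """Estimate the standard error of measurement: Sm = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - r)

# For example, a test with SD = 10 and reliability r = .91:
print(round(estimate_sm(10, 0.91), 2))   # 3.0
```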
USING THE STANDARD ERROR OF MEASUREMENT
Error scores are assumed to be random. As such, they cancel each other out. That is, obtained scores
are inflated by random error to the same extent as they are deflated by error. Another way of saying
this is that the mean of the error scores for a test is zero. The distribution of the error scores is also
important, since it approximates a normal distribution closely enough for us to use the normal
distribution to represent it.
In summary, then, we know that error scores:
1. are normally distributed
2. have a mean of zero
3. have a standard deviation called the standard error of measurement (Sm). These
characteristics are illustrated in Fig. 17.1
Returning to our example from the ninth-grade math test in Table 17.1, we recall that we obtained an
Sm of 4.32 for the data provided.
Figure 17.2 illustrates the distribution of error scores for these data. What does the distribution
in Fig. 17.2 tell us? Before you answer, consider this: The distribution of error scores is a normal
distribution. This is important since, as you learned in Chapter 13, the normal distribution has
characteristics that enable us to make decisions about scores that fall between, above, or below
different points in the distribution. We are able to do so because fixed percentages of scores fall
between various score values in a normal distribution.
This figure tells us that the distribution of error scores is a normal distribution.
Fig. 17.3 The error score distribution for the test depicted in Table 17.1
(Fig. 17.3 should refresh your memory.) We listed along the baseline the standard deviation of
the error score distribution. This is more commonly called the standard error of measurement (Sm) of
the test. Thus we can see that 68% of the error scores for the test will be no more than 4.32 points
higher or 4.32 points lower than the true scores. That is, if there were 100 obtained scores on this test,
68 of these scores would not be "off" their true scores by more than 4.32 points. The Sm, then, tells us
about the distribution of obtained scores around true scores. By knowing an individual's true score we
can predict what his or her obtained score is likely to be.
The careful reader may be thinking, "That's not very useful information. We can never know
what a person's true score is, only their obtained score." This is correct. As test users, we work only
with obtained scores. However, we can follow our logic in reverse. If 68% of obtained scores fall within
1 Sm of their true scores, then 68% of true scores must fall within 1 Sm of their obtained scores. Strictly
speaking, this reverse logic is somewhat inaccurate, but it would be true about 99% of the time (Gulliksen, 1987).
Therefore, the Sm is often used to determine how test error is likely to have affected individual
obtained scores.
That is, X plus or minus 4.32 (±4.32) defines the range or band.
Let's use the following number line to represent an individual's obtained score, which we will simply
call X:
Fig. 17.4 The error distribution around an obtained score of 90 for a test with Sm= 4.32
Fig. 17.5 The error distribution around an obtained score of 75 for a test with Sm = 4.32
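As a rough sketch of what Figs. 17.4 and 17.5 depict, the bands around the two obtained scores can be computed directly; the helper below is hypothetical and simply adds and subtracts 1, 2, or 3 Sm (the 68%, 95%, and 99% bands, respectively).

```python
SM = 4.32   # standard error of measurement from the example

def band(obtained: float, n_sm: int) -> tuple[float, float]:
    """Return the band obtained ± n_sm standard errors of measurement."""
    return obtained - n_sm * SM, obtained + n_sm * SM

for score in (90, 75):                         # the obtained scores in Figs. 17.4 and 17.5
    for n, pct in ((1, 68), (2, 95), (3, 99)):
        low, high = band(score, n)
        print(f"X = {score}: {pct}% band = {low:.2f} to {high:.2f}")
```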
Why all the fuss? Remember our original point. All test scores are fallible (tending to err); they
contain a margin of error. The Sm is a statistic that estimates that margin for us. We are accustomed to
reporting a single test score.
In education, we have long had a tendency to overinterpret small differences in test scores
since we too often consider obtained scores to be completely accurate. Incorporating the Sm in
reporting test scores greatly minimizes the likelihood of overinterpretation and forces us to consider
how fallible our test scores are. After considering the Sm from a slightly different angle, we will show
how to incorporate it to make comparisons among test scores. This procedure is called band
interpretation.
Standard Deviation or Standard Error of Measurement?
You learned to compute and interpret the SD in Chapter 13. The SD describes the variability of obtained
(raw) scores, whereas the Sm describes the variability of error scores around true scores.
Why all the fuss about error?
For two reasons:
1. We want to make you aware of the fallibility of test scores.
2. We want to sensitize you to the factors that can affect test scores.
In reality, an individual's obtained score is the best estimate of an individual's true score. That is, in
spite of the foregoing discussion, we usually use the obtained score as our best guess of a student's
true level of ability. Well, why all the fuss about error then?
Classification of sources of error
1. Test Takers.
2. The test itself.
3. Test administration.
4. Test scoring.
Test Takers:
Factors within the test taker can make an obtained score differ from a student's true score:
• fatigue and illness (likely to lower the obtained score)
• accidentally seeing another student's answer (likely to raise it)
Generally, error due to within-student factors is beyond our control.
The test itself:
• Trick questions.
• Reading level that is too high.
• Ambiguous questions.
• Items that are too difficult.
• Poorly written items.
Test Administration:
• Physical Comfort
(Room temperature, humidity, lighting, noise, and seating arrangement are all potential sources
of error for the test taker.)
• Instructions and Explanations
(Different test administrators provide differing amounts of information to test takers. Some
spell words, provide hints, or tell whether it's better to guess or leave blanks, while others
remain fairly distant. Naturally, your score may vary depending on the amount of information
you are provided.)
• Test Administrator Attitudes
(Administrators differ in the notions they convey about the importance of the test, the extent
to which they are emotionally supportive of students, and the way in which they monitor the
test. To the extent that these variables affect students differently, test score reliability and
accuracy will be impaired.)
Error in Scoring:
• When computer scoring is used, error can occur.
• When tests are hand scored, the likelihood of error increases greatly.
The computer, a highly reliable machine, is seldom the cause of such errors. But teachers and other
test administrators prepare the scoring keys, introducing possibilities for error. And test takers
sometimes fail to use No. 2 pencils or make extraneous marks on their answer sheets, introducing
another potential source of scoring error. Needless to say, when tests are hand scored, as most
classroom tests are, the likelihood of error increases greatly. In fact, because you are human, you can
be sure that you will make some scoring errors in grading the tests you give.
Sources of Error Influencing Various Reliability Coefficients
Test-Retest
• Short-interval test-retest coefficients are not likely to be affected greatly by within-student
error.
• Any problems that do exist in the test are present in both the first and second administrations,
affecting scores the same way each time the test is administered.
Alternate Forms
• Since alternate-forms reliability is determined by administering two different forms or versions
of the same test to the same group close together in time, the effects of within-student error are
negligible.
• Error within the test, however, has a significant effect on alternate-forms reliability.
• As with the test-retest method, alternate-forms score reliability is not greatly affected by error in
administering or scoring the test, as long as similar procedures are followed.
Internal Consistency
With test-retest and alternate-forms reliability, within-student factors affect the method of
estimating score reliability, since changes in test performance due to such problems as fatigue,
momentary anxiety, illness, or just having an "off" day can be doubled because there are two separate
administrations of the test. If the test is sensitive to those problems, it will record different scores from
one administration to another, lowering the reliability (or correlation coefficient) between them.
Obviously, we would prefer that the test not be affected by those problems. But if it is, we would
like to know about it. Internal consistency estimates, by contrast, require only a single administration
of a single test, so they are least affected by these sources of error.
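To make the three approaches concrete, here is a brief sketch in Python; the score lists are invented purely for illustration. Test-retest and alternate-forms reliability are correlations between two sets of scores from the same group, while an internal consistency estimate such as Cronbach's alpha (one common method, used here as an example) needs only a single administration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two score lists (e.g., test-retest or alternate forms)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cronbach_alpha(items):
    """Internal consistency from a single administration; items is a list of per-item score lists."""
    k = len(items)
    def variance(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / len(v)
    item_var = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]   # each student's total score
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical scores for five students, invented for illustration only.
first  = [82, 75, 91, 68, 88]   # first administration (or Form A)
second = [80, 78, 89, 70, 85]   # second administration (or Form B)
print(round(pearson(first, second), 2))        # test-retest / alternate-forms estimate

items = [[1, 0, 1, 0, 1], [1, 1, 1, 0, 1], [0, 0, 1, 0, 1]]  # 3 items x 5 students
print(round(cronbach_alpha(items), 2))         # internal consistency estimate
```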
A table summarizing the influence of the four sources of error on the test-retest, alternate-forms, and
internal consistency methods of estimating test score reliability appeared here.
BAND INTERPRETATION
• uses the standard error of measurement to more realistically interpret and report groups of test
scores.
The formula to compute the reliability of a difference score is as follows:
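The formula itself appears to have been an image in the original report. A commonly used expression for the reliability of the difference between scores on two subtests A and B (stated here as an assumption, not a quotation of the chapter) is:

r_diff = ((r_AA + r_BB) / 2 − r_AB) / (1 − r_AB)

where r_AA and r_BB are the score reliabilities of the two subtests and r_AB is the correlation between them. With both reliabilities fixed at .91, as assumed in Step 1 below, the difference-score reliability falls as the two subtests become more highly correlated.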
Step 1: List Data (let’s assume)
List the subtests and scores and the M, SD, and reliability (r) for each subtest. For purposes of
illustration, let's assume that the mean is 100, the standard deviation is 10, and the score reliability is
.91 for all the subtests.
M: 100, SD: 10, score reliability: .91 for all subtests.
Step 2: Determine Sm (standard error of measurement)
Since SD and r are the same for each subtest in this example, the standard error of measurement will
be the same for each subtest: Sm = SD√(1 − r) = 10√(1 − .91) = 10(.30) = 3.
Step 3: Add and Subtract Sm
To identify the band or interval of scores that has a 68% chance of capturing John's true score,
add Sm to and subtract Sm from each subtest score. If the test could be given to John 100 times
(without John learning from taking the test), 68 out of 100 times John's true score would be within the
following bands:
Step 4: Graph the Results
Shade in the bands to represent the range of scores that has a 68% chance of capturing John's true
score. (John's subtest scores and the resulting shaded bands were shown in a table and graph here.)
Step 5: Interpret the Bands
• Interpret the profile of bands by visually inspecting the bars to see which bands overlap and
which do not.
• Bands that overlap represent differences that likely occurred by chance; bands that do not
overlap likely reflect real differences. (A brief worked sketch of Steps 1–5 follows.)
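Here is the brief worked sketch referred to above, tying Steps 1 through 5 together in Python. John's actual subtest names and scores appeared only in the original figure, so the names and scores below are hypothetical placeholders; the M = 100, SD = 10, and r = .91 values are the ones assumed in Step 1.

```python
from math import sqrt

SD, R = 10, 0.91
SM = SD * sqrt(1 - R)                         # Step 2: Sm = 10 * sqrt(1 - .91) = 3

# Hypothetical subtest scores standing in for John's profile (his real scores were in the figure).
subtests = {"Reading": 102, "Math": 93, "Language": 108, "Science": 97}

# Step 3: add and subtract Sm to get the 68% bands.
bands = {name: (score - SM, score + SM) for name, score in subtests.items()}

# Step 4: in the report these bands are shaded as bars; here we just print them.
for name, (low, high) in bands.items():
    print(f"{name:<10} {low:.0f} - {high:.0f}")

# Step 5: bands that overlap suggest a chance difference; bands that do not overlap
# suggest a real difference between subtest scores.
def overlaps(a: str, b: str) -> bool:
    return bands[a][0] <= bands[b][1] and bands[b][0] <= bands[a][1]

print(overlaps("Reading", "Math"))            # 99-105 vs. 90-96 do not overlap -> False
```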
Final Word:
• Technically, there are more accurate statistical procedures for determining real differences
between an individual's test scores than the ones we have been able to present here. These
procedures, however, are time-consuming, complex, and overly specific for the typical teacher.
• Within the classroom, band interpretation, properly used, makes for a practical alternative to
those more advanced methods and is superior to simply comparing individual scores.
Summary
Chapter 17 presented the concepts of accuracy and error, the statistic used to represent error
(the standard error of measurement), and the four sources of error in measurement. Its major points
are as follows:
1. No test is perfectly reliable, and therefore no test is perfectly accurate. All tests are subject to
error.
2. Error is any factor that leads an individual to perform better or worse on a test than the
individual’s true level of performance.
3. An obtained score for a test contains two components: one reflects the individual's true level of
performance, and the other reflects the error associated with the test that prevents the
individual from performing at exactly his or her true level of ability.
4. An individual’s true score and error score are hypothetical. We can never be sure exactly what
they are.
5. The standard error of measurement (Sm) is the standard deviation of the error score distribution
of a test.
6. The standard error of measurement for a test can be determined mathematically if we know
the test’s standard deviation and reliability.
7. The error distribution is normally distributed, has a mean of zero, and its standard deviation is
the standard error of measurement.
8. We can be 68% sure a true score lies in the range of scores between one Sm above and one Sm
below that individual's obtained score.
9. We can be 95% sure a true score lies in the range of scores between two Sm above and two
Sm below the obtained score.
10. We can be 99% sure a true score lies in the range of scores between three Sm above and three
Sm below the obtained score.
11. Using the Sm in interpreting scores helps us avoid overinterpreting small differences among
test scores.
12. Accuracy increases and Sm decreases as reliability increases.
13. The Sm refers to the variability of error scores; the SD refers to the variability of raw scores.
14. We can improve test accuracy by controlling the sources of error in testing.
15. Error is classified into four categories:
a. Error within the test takers.
b. Error within the test.
c. Error within the test administration.
d. Error within scoring.
16. Error within the test takers refers to changes in the student over which the test administrator
has no control.
17. Error within the test refers to technical problems in the test. The test developer has considerable
control over this source of error.
18. Error within the test administration refers to physical, verbal, and attitudinal variables that vary
from administration to administration and affect test scores. The administrator can minimize this
source of error by standardizing administration procedures.
19. Error in scoring typically refers to clerical errors that influence test scores. The test administrator
can control this source of error by using objective items and clerical checks or by using
computer scoring.
20. Different estimates of score reliability are differentially affected by the various sources of error.
21. Test-retest score reliability is most susceptible to error within test takers.
22. Alternate-forms score reliability is most affected by error within the two forms of the test. As a
result, it usually yields lower score reliability estimates than test-retest or internal consistency
estimates.
23. Since only one administration and one test are necessary for internal consistency estimates, this
approach is least affected by error. As a result, internal consistency estimates of score reliability
typically will be higher for the same test than for test-retest or alternate-forms score reliability
estimates.
24. Band interpretation is a technique that the teacher can use to help separate real differences in
student achievement from differences due to chance. This approach helps prevent
overinterpretation of small differences among subtests in achievement test batteries.