• Reliability is a concept referring to the consistency of test results. Thus, if the same test is administered to the same test-takers at different times, the results should remain essentially the same as before.
• The reliability of a test may best be addressed by considering the factors that can contribute to its unreliability, namely fluctuations:
(1) in the student (student-related reliability)
(2) in scoring (rater reliability)
(3) in test administration (test administration
reliability), and
(4) in the test (test reliability)
Reliability
(1) Student-related reliability:
Temporary illness, fatigue, a bad day, anxiety, and other physical or psychological factors may make an observed score deviate from a test-taker's "true" score. A test-taker's "test-wiseness", or strategies for efficient test taking, also falls into this category.
(2) Rater reliability:
Human error, subjectivity, lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases may enter into the scoring process.
Inter-rater unreliability occurs when two or more raters yield inconsistent scores for the same test.
Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness. One solution to such intra-rater unreliability is to read through about half of the tests before rendering any final scores or grades, then to recycle back through the whole set of tests to ensure an even-handed judgement.
The careful specification of an analytical scoring instrument can increase rater reliability.
(3) Test Administration Reliability:
Unreliability may also result from the conditions in which the test is administered. Examples: street noise, photocopying variations, poor lighting, variations in temperature, and the condition of desks and chairs.
(4) Test reliability:
Sometimes the nature of the test itself can cause measurement errors.
- A timed test may discriminate against students who do not perform well under a time limit.
- Poorly written test items (items that are ambiguous or that have more than one correct answer) may be a further source of test unreliability.
• An observed score on any test is a composite of two components, a true component and an error component: X = T + E. If we asked an individual to take a test many times, the observed score would be slightly different each time. Spearman assumed that all error was random; that is, the observed score is only an estimate of the true score because random influences change the observed score, such as fluctuations in the environment (changes in test administration) or psychological changes (tiredness) (Fulcher & Davidson, 2003), as illustrated in the simulation sketch below.
• Methods of computing reliability therefore try to estimate the extent to which an observed score is near to the true score, for it is only if the observed score is a good estimate of the true score that we can draw sound inferences from the score to the construct.
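As a rough illustration of the classical model X = T + E, the following minimal Python sketch simulates one test-taker taking the same test many times; the true score of 70 and the error SD of 5 are arbitrary assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(42)

    true_score = 70.0   # assumed "true" score T of one test-taker
    error_sd = 5.0      # assumed spread of the random error E

    # Simulate repeated administrations: X = T + E
    observed = true_score + rng.normal(loc=0.0, scale=error_sd, size=1000)

    # Observed scores vary around T; their mean estimates the true score
    print("mean observed score:", observed.mean())        # close to 70
    print("SD of observed scores:", observed.std(ddof=1)) # close to 5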
METHODS OF TEST RELIABILITY
• Reliability can be established internally or externally and is indicated by a statistical figure known as the correlation coefficient, which ranges from 0.00 to 1.00 (Weir, 1990).
ESTABLISHING RELIABILITY
• In practice, most tests report measures of internal consistency
as reliability coefficients.
These are essentially measures of mean inter-item correlation, that is, how well the items correlate with each other.
• However, internal reliability coefficients are also affected by other
factors:
■ Number of items on the test: increasing the number of
items will increase reliability.
■ Variation in item difficulty: items of roughly equal difficulty increase reliability, while items with a wide range of facility values decrease it.
■ Dispersion of scores: If the sample upon which the test is
field-tested is homogeneous (there is not a spread of
scores), reliability will decrease.
■ Level of item difficulty: items with facility values of 0.5
maximize item variance, and so increase test reliability.
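As a quick arithmetic check of this last point, assuming dichotomous (0/1) items whose variance is p(1 − p) for facility value p:

    # Variance of a dichotomous (0/1) item with facility value p is p * (1 - p).
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"facility {p:.1f} -> item variance {p * (1 - p):.2f}")
    # The variance peaks at p = 0.5 (0.25), which is why mid-difficulty items
    # contribute most to score variance and hence to test reliability.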
INTERNAL RELIABILITY
• Internal reliability measures the internal consistency of the test on the basis of a single administration.
• It can be estimated either through the split-half method (by applying the Spearman-Brown formula) or through the Kuder-Richardson method (by applying the Kuder-Richardson formula K-R21 or K-R20).
Split-halves
• From the administration of a single
test, half of the items are taken to
represent one form of the test and
correlated with the items in the
other half of the test. The correlation
coefficient is taken as a measure of
reliability.
Split-halves
• A single test is split into two reasonably equivalent halves. One common method of splitting a test is to score the odd-numbered items and the even-numbered items separately. Then, the correlation between scores on the odd- and even-numbered items is calculated. To obtain an estimate of the reliability of the total test, the Spearman-Brown formula is applied:
rxx' = (n × r) / ((n − 1) × r + 1)
Where: rxx' = full-test reliability
r = correlation between the two test halves
n = number of times the test length is to be increased (n = 2 for split halves)
• Example: rxx' = (2 × .60) / ((2 − 1) × .60 + 1) = 1.20 / 1.60 = .75
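A minimal Python sketch of the split-half procedure with the Spearman-Brown correction; the small 0/1 score matrix is invented purely for illustration.

    import numpy as np

    # Rows = test-takers, columns = items (1 = correct, 0 = incorrect); invented data.
    scores = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 1, 0, 0, 1, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 0],
    ])

    odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
    even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

    # Correlation between the two half-test scores
    r_half = np.corrcoef(odd_half, even_half)[0, 1]

    # Spearman-Brown correction for the full-length test (n = 2)
    n = 2
    r_full = (n * r_half) / ((n - 1) * r_half + 1)

    print("half-test correlation:", round(r_half, 2))
    print("full-test (Spearman-Brown) reliability:", round(r_full, 2))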
• Kuder-Richardson Formula 21: perhaps the easiest reliability formula to calculate is KR-21.
• Kuder-Richardson Formula 20: KR-21 is a quick estimate of KR-20 and usually produces a somewhat lower value; it can be significantly lower if small samples or small numbers of items are involved.
USE OF K-R 20 OR 21 (test once)
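The slides name KR-20 and KR-21 without spelling out the formulas; the sketch below uses the standard textbook forms, KR-20 = (k/(k−1))·(1 − Σpq/SDx²) and KR-21 = (k/(k−1))·(1 − M(k − M)/(k·SDx²)), with invented 0/1 data and sample variance (ddof = 1) as assumptions of this illustration.

    import numpy as np

    # Rows = test-takers, columns = dichotomously scored items; invented data.
    scores = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 1, 0, 0, 1, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 0],
    ])

    k = scores.shape[1]            # number of items
    total = scores.sum(axis=1)     # each test-taker's total score
    var_total = total.var(ddof=1)  # variance of total scores (SDx^2)
    mean_total = total.mean()      # mean total score (M)

    # KR-20: uses each item's facility value p and q = 1 - p
    p = scores.mean(axis=0)
    kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)

    # KR-21: quicker estimate, uses only the test mean and variance
    kr21 = (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))

    print("KR-20:", round(kr20, 2))
    print("KR-21:", round(kr21, 2))  # usually somewhat lower than KR-20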
COEFFICIENT ALPHA METHOD
• Used for scoring essay-type tests, including composition tests, and other assessments scored on an ordered scale.
• Uses Cronbach's coefficient alpha, computed from the variance of each item or rated component (SD²) and the variance of the total scores (SDx²).
• For example, for writing: content, organization, language, vocabulary, spelling.
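A minimal Python sketch of Cronbach's coefficient alpha, alpha = (k/(k − 1))·(1 − ΣSD²/SDx²), applied to invented 1-5 ratings of the writing components listed above.

    import numpy as np

    # Rows = essays, columns = rated components
    # (content, organization, language, vocabulary, spelling); invented 1-5 ratings.
    ratings = np.array([
        [4, 4, 3, 4, 5],
        [3, 2, 3, 3, 2],
        [5, 5, 4, 5, 4],
        [2, 3, 2, 2, 3],
        [4, 3, 4, 4, 4],
        [3, 3, 2, 3, 3],
    ])

    k = ratings.shape[1]                          # number of components
    item_vars = ratings.var(axis=0, ddof=1)       # SD^2 of each component
    total_var = ratings.sum(axis=1).var(ddof=1)   # SDx^2 of the total scores

    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print("Cronbach's alpha:", round(alpha, 2))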
EXTERNAL RELIABILITY
• External reliability measures the consistency of test scores across administrations and can be estimated using the test-retest method or the equivalent-forms method. To estimate reliability using the test-retest method, the same test is administered twice to the same group of test takers at two different times.
• The resulting test scores are correlated, and this correlation coefficient provides a measure of stability, indicating how stable the test results are over the given period of time.
Test retest
• Giving the same test twice provides two scores for each individual tested. The correlation between the two sets of scores yields a coefficient of test-retest reliability, or examinee reliability, which indicates how consistently examinees perform on the same set of tasks.
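A minimal Python sketch of the test-retest correlation, using invented scores for ten test-takers; the same computation applies to scores from two equivalent (parallel) forms.

    import numpy as np

    # Scores of the same ten test-takers on two administrations of the same test
    # (invented data).
    first_admin  = np.array([55, 62, 70, 48, 80, 66, 73, 59, 85, 64])
    second_admin = np.array([58, 60, 72, 50, 78, 68, 70, 61, 88, 63])

    # Pearson correlation between the two administrations = test-retest reliability
    r_test_retest = np.corrcoef(first_admin, second_admin)[0, 1]
    print("test-retest reliability:", round(r_test_retest, 2))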
• If two or more parallel forms of a test have been produced in such a way that scores on the alternate forms are equivalent, and if each student in the group is given both forms of the test, then the correlation between scores on the two forms provides an estimate of reliability.
Equivalent forms
Reader reliability
• Essay tests, whose scores depend appreciably on the expert judgment of a reader, are sometimes scored independently by two or more readers. The correlation between, or among, the multiple sets of ratings for a single set of student examination papers provides a measure of the reliability with which the papers were read.
• A coefficient of reader reliability simply indicates how closely two or more readers agree in rating the same set of examination papers.
• A coefficient of test reliability, on the other hand, indicates how similarly the examinees perform on different, but supposedly equivalent, tasks.
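A minimal Python sketch of reader reliability as the correlation between two readers' ratings of the same set of papers; the ratings are invented example data.

    import numpy as np

    # Two readers' scores for the same eight examination papers (invented data).
    reader_1 = np.array([12, 15, 9, 18, 14, 10, 16, 13])
    reader_2 = np.array([11, 16, 10, 17, 13, 11, 15, 14])

    # Correlation between the two sets of ratings = coefficient of reader reliability
    r_readers = np.corrcoef(reader_1, reader_2)[0, 1]
    print("reader reliability:", round(r_readers, 2))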