Reliability of test
1. Reliability of Test
Dr. Sarat Kumar Rout
Assist. Prof. Department of Education
Ravenshaw University
Email:saratrout2007@rediffmail.com
2. Meaning of Reliability
• It refers to the precision or accuracy of the measurement of a score.
• Reliability refers to the stability of a test measure.
• Reliability is the degree to which a Practice, Procedure, or Test (PPT) produces stable and consistent results when repeated/re-administered with the same individuals/students on different occasions, or with different sets of equivalent items, when all other factors are held constant.
3. Meaning of Reliability
• Reliability is one of the important
characteristics of a good test.
• (explanation and generalization of results)
Example of tests:
Achievement test;
Intelligence test;
Creativity test; and
Personality test…..etc
4. Logical Meaning of Reliability of a Test
• Whenever we measure something (attribute or trait)
either in the Physical or Social science, the
measurement involves some kind of error.
(Sources of error – observers/scoring, instruments,
instability of the attribute, guessing…..etc)
• In other words, it is the extent to which the Practice, Procedure, or Test (PPT) is free from error (random/measurement and systematic) in any measurement (Physical science or Social science).
5. Logical Meaning of Reliability of a Test
• In terms of an equation, it can be written as:
XT = X∞ + Xe
XT = the actual obtained score
X∞ = the true score
Xe = the error score
6. Logical Meaning of Reliability of a Test
• Whenever we administer a test to examinees,
we would like to know how much of their
scores reflects "truth" and how much reflects
“error”.
• A measure of reliability provides us with an estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on the attribute(s) measured by the test.
7. Logical Meaning of Reliability of a Test
• When measurement is free from error, reliability is perfect and the reliability index is +1.00.
• But reliability is never perfect.
8. Logical Meaning of Reliability of a Test
• Since any obtained score is divided into the true
score plus error score, the total variance of a test
is also divided into two components-true
variance and error variance.
• Variance= square of the standard deviation
• In terms of equation, it may be written as:
σ²T = σ²∞ + σ²e
σ²T = total score variance
σ²∞ = true score variance
σ²e = error score variance
9. Logical Meaning of Reliability of a Test
• Thus the variance of total score is equal to the
variance of true score + the variance of error score.
• In classical test theory, the reliability of test scores is logically defined as:
“proportion of the true variance”
• The proportions of true variance and error variance are found by dividing each by the total variance:
• The proportion of the true variance = σ²∞ / σ²T
• The proportion of the error variance = σ²e / σ²T
10. Logical Meaning of Reliability of a Test
• Now, the reliability coefficient rtt = σ²∞ / σ²T ……….. (i)
or
• the reliability coefficient rtt = 1 − σ²e / σ²T ……….. (ii)
• Suppose an achievement test in mathematics is administered to a group of 50 students. The hypothetical total score variance, true score variance and error score variance are as follows:
• Total variance = 58.36, true variance = 43.19 and error variance = 15.17
• By equation (i): σ²∞ / σ²T = 43.19/58.36 = 0.74
• By equation (ii): 1 − σ²e / σ²T = 1 − 15.17/58.36 = 1 − 0.26 = 0.74
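The worked example above can be sketched in a few lines of Python (variance values taken from the slide; purely illustrative):

```python
# Computing the reliability coefficient from hypothetical variance components.
total_variance = 58.36   # sigma^2 T  (total score variance)
true_variance = 43.19    # sigma^2 inf (true score variance)
error_variance = 15.17   # sigma^2 e  (error score variance)

# Equation (i): proportion of true score variance
r_tt_i = true_variance / total_variance

# Equation (ii): one minus the proportion of error variance
r_tt_ii = 1 - error_variance / total_variance

print(round(r_tt_i, 2))   # -> 0.74
print(round(r_tt_ii, 2))  # -> 0.74
```

Because the true and error variances sum exactly to the total variance, the two equations give the same result.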
11. What is reliability coefficient?
• Study Tip: Remember that, in contrast to other
correlation coefficients, the reliability
coefficient is never squared to interpret it but is
interpreted directly as a measure of true score
variability. A reliability coefficient of .89
means that 89% of variability in obtained
scores is true score variability.
12. What is reliability coefficient?
•The reliability coefficient is symbolized with
the letter "r" and a subscript that contains
two of the same letters or numbers (e.g.,
''rtt'').
• The subscript indicates that the correlation
coefficient was calculated by correlating a test
with itself rather than with some other
measure.
13. What is reliability coefficient?
• Most methods for estimating reliability produce a reliability coefficient, which is a correlation coefficient that ranges in value from 0.0 to +1.0.
• When a test's reliability coefficient is 0.0, this
means that all variability in obtained test scores is
due to measurement error.
• Conversely, when a test's reliability coefficient is +
1.0, this indicates that all variability in scores
reflects true score variability.
14. What is reliability coefficient?
Taken from page 3-3 of the U.S. Department of Labor’s “Testing and Assessment:
An Employer’s Guide to Good Practices” (2000).
http://www.onetcenter.org/dl_files/empTestAsse.pdf
15. What is reliability coefficient?
• Note that a reliability coefficient does not provide any information about what is actually being measured by a test.
• A reliability coefficient only indicates whether the
attribute measured by the test— whatever it is—
is being assessed in a consistent, precise way.
• Whether the test is actually assessing what it was
designed to measure is addressed by an analysis
of the test's validity.
16. Methods of Estimating Reliability Coefficient
•A test's true score variance is not known,
however, and reliability must be estimated rather
than calculated directly.
• There are several ways to estimate a test's reliability coefficient:
1.Test-Retest Reliability
2.Alternate Forms Reliability
3.Internal Consistency Reliability
•Each involves assessing the consistency of an
examinee's scores over time, across different
content samples, or across different scorers.
17. Methods for Estimating Reliability
• The common assumption for each of these reliability techniques is that consistent variability is true score variability, while inconsistent variability reflects random error.
• The selection of a method for estimating reliability
depends on the nature of the test.
• Each method not only entails different procedures but
is also affected by different sources of error. For many
tests, more than one method should be used.
18. 1. Test-Retest Reliability
• The test-retest method for estimating reliability
involves administering the same test to the same
group of examinees on two different occasions
and then correlating the two sets of scores.
• When using this method, the reliability
coefficient indicates the degree of stability
(consistency) of examinees' scores over time and
is also known as the coefficient of stability.
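The test-retest procedure can be sketched as follows — a minimal Python illustration with hypothetical scores for five examinees, where the Pearson correlation of the two administrations serves as the coefficient of stability:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the same five examinees on two occasions
first_administration  = [12, 15, 18, 20, 25]
second_administration = [13, 14, 19, 21, 24]

coefficient_of_stability = pearson_r(first_administration, second_administration)
print(round(coefficient_of_stability, 2))  # -> 0.98
```

A value this close to +1.00 would indicate that examinees kept nearly the same rank order across the two occasions.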
19. Test-Retest Reliability
• The primary sources of measurement error for
test-retest reliability are any random factors
related to the time that passes between the
two administrations of the test.
• These time sampling factors include random
fluctuations in examinees over time (e.g.,
changes in anxiety or motivation) and random
variations in the testing situation.
20. Test-Retest Reliability
• Memory and practice also contribute to error when they have random carryover effects, i.e., when they affect many or all examinees, but not in the same way.
Despite these limitations:
• Test-retest reliability is appropriate for measuring attributes that are relatively stable over time.
(Aptitude; Achievement – speed and power tests)
• Test-retest reliability is also appropriate for measuring a heterogeneous test.
21. 2. Alternate (Equivalent, Parallel) Forms Reliability
• To assess a test's alternate forms reliability, two
equivalent forms of the test are administered to the
same group of examinees and the two sets of scores
are correlated.
• Alternate forms reliability indicates the consistency
of responding to different item samples (the two
test forms) and, when the forms are administered at
different times, the consistency of responding over
time.
22. Alternate (Equivalent, Parallel) Forms Reliability
• The alternate forms reliability coefficient is also
called the coefficient of equivalence when the two
forms are administered at about the same time.
• The primary source of measurement error for
alternate forms reliability is content sampling, time
sampling or error introduced by an interaction
between different examinees' knowledge and the
different content assessed by the items included in
the two forms (e.g., Form A and Form B).
23. Alternate (Equivalent, Parallel) Forms Reliability
•The items in Form A might be a better match of one
examinee's knowledge than items in Form B, while
the opposite is true for another examinee.
•In this situation, the two scores obtained by each
examinee will differ, which will lower the alternate
forms reliability coefficient.
•When administration of the two forms is separated
by a period of time, time sampling factors also
contribute to error.
24. Alternate (Equivalent, Parallel) Forms Reliability
• Like test-retest reliability, alternate forms reliability is
not appropriate when the attribute measured by the
test is likely to fluctuate over time or when scores are
likely to be affected by repeated measurement.
• If the same strategies required to solve problems on
Form A are used to solve problems on Form B, even if
the problems on the two forms are not identical,
there are likely to be practice effects.
25. Alternate (Equivalent, Parallel) Forms Reliability
• When these effects differ for different examinees
(i.e., are random), practice will serve as a source of
measurement error.
• Although alternate forms reliability is considered by
some experts as the most rigorous method for
estimating reliability, it is not often assessed due to
the difficulty in developing two forms of the same
test that are truly equivalent. (Discuss criteria of
parallel test)
26. 3. Internal Consistency Estimates of Reliability
• We have discussed that reliability estimates can be obtained
by administering the same test to the same examinees and by
correlating the results: Test/Retest.
• We have also seen that reliability estimates can be obtained by administering two parallel or alternate forms of a test, and then correlating those results: Parallel & Alternate Forms.
• In both of the above cases, the test constructor or researcher must administer two exams, and they are sometimes given at different times to reduce carry-over effects.
• Here we will see that it is also possible to obtain a reliability estimate using only a single test.
• The most common way to obtain a reliability estimate using a single test is through the split-half approach/method.
27. Split-Half approach to Reliability
• When using the Split-half approach, one gives a
single test to a group of examinees.
• Later, the test is divided into two parts, which may be
considered to be alternate forms of one another.
• In fact, the split should not be arbitrary; an attempt should be made to choose the two halves so that they are parallel or essentially equivalent, e.g., via the odd-even method.
• Then the reliability of the whole test is estimated by
using the Spearman Brown formula.
28. Split-Half approach to Reliability
• Using the Spearman-Brown formula:
• Here we are assuming the two test halves (t and t’) are
parallel forms.
• The two halves are then correlated, producing the half-test reliability coefficient, rtt’.
• But this is only a measure of the reliability of one half of the test.
• The reliability of the entire test will be greater than the reliability of the half test.
• The Spearman-Brown formula for estimating the reliability of the entire/whole test is therefore:
• Reliability coefficient rh = 2 × rtt’ / (1 + rtt’)
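The step-up from half-test to whole-test reliability can be sketched as a small Python function (a minimal illustration of the Spearman-Brown formula):

```python
# Spearman-Brown step-up: r_half is the correlation between the two halves;
# the whole test is twice as long, so its estimated reliability is higher.
def spearman_brown(r_half):
    """Whole-test reliability estimated from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

print(round(spearman_brown(0.60), 2))  # -> 0.75
```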
29. Split-Half approach to Reliability
Reliability coefficient of the half test (rtt’) and of the entire/whole test (rh):
rtt’   rh
0.00   0.00
0.20   0.33
0.40   0.57
0.60   0.75
0.80   0.89
1.00   1.00
30. ii. Cronbach’s coefficient α approach
• On the other hand, the two test halves may not be parallel forms.
• This is confirmed when it is determined that the two
halves have unequal variances.
• In these situations, it is best to use a different
approach to estimating reliability.
• Cronbach’s coefficient α
• α can be used to estimate the reliability of the entire test.
31. Cronbach’s coefficient α approach
Cronbach’s coefficient α = 2[σh² − (σt1² + σt2²)] / σh²
σh² = variance of the entire test, h
σt1² = variance of the half test, t1
σt2² = variance of the half test, t2
• It is the case, that if the variances on both test
halves are equal, then the Spearman-Brown
formula and Cronbach’s α will produce
identical results.
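A minimal numeric sketch of this formula (the variances are hypothetical): with equal half variances, the result matches the Spearman-Brown value for the same half-half correlation, as noted above.

```python
# Split-half coefficient alpha: var_total is the whole-test variance;
# var_half1 and var_half2 are the variances of the two halves.
def alpha_split_half(var_total, var_half1, var_half2):
    """Cronbach's alpha computed from two test halves."""
    return 2 * (var_total - (var_half1 + var_half2)) / var_total

# With equal half variances (4 each) and a half-half correlation of .60,
# the whole-test variance is 4 + 4 + 2(.60)(2)(2) = 12.8, and alpha equals
# the Spearman-Brown result for r = .60:
print(round(alpha_split_half(12.8, 4, 4), 2))  # -> 0.75
```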
32. • Content sampling is a source of error for both split-half
reliability and coefficient alpha.
• For split-half reliability, content sampling refers to the
error resulting from differences between the content of
the two halves of the test (i.e., the items included in
one half may better fit the knowledge of some
examinees than items in the other half).
• For coefficient alpha, content (item) sampling refers to
differences between individual test items rather than
between test halves.
33. iii. Kuder-Richardson Formulas-20 & 21
When test items are scored dichotomously
(right or wrong), a variation of coefficient
alpha known as the Kuder-Richardson
Formula 20 (KR-20) can be used.
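A minimal Python sketch of KR-20 on a hypothetical data set (a 4-item test taken by 5 examinees; the 0/1 item scores are invented for illustration):

```python
# Each row is one examinee's right (1) / wrong (0) scores on a 4-item test.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

k = len(scores[0])   # number of items
n = len(scores)      # number of examinees

# Population variance of the total scores
totals = [sum(row) for row in scores]
mean_total = sum(totals) / n
var_total = sum((t - mean_total) ** 2 for t in totals) / n

# Sum over items of p*q, where p = proportion passing and q = 1 - p
pq_sum = 0.0
for i in range(k):
    p = sum(row[i] for row in scores) / n
    pq_sum += p * (1 - p)

# KR-20: (k / (k - 1)) * (1 - sum(p*q) / total variance)
kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print(round(kr20, 2))  # -> 0.31
```

The low value here simply reflects the tiny invented data set; real tests use many more items and examinees.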
35. Internal Consistency Reliability
•The methods for assessing internal consistency
reliability are useful when a test is designed to
measure a single characteristic, when the characteristic
measured by the test fluctuates over time, or when
scores are likely to be affected by repeated exposure to
the test.
•They are not appropriate for assessing the reliability of
speed tests because, for these tests, they tend to
produce spuriously high coefficients. (For speed tests,
alternate forms reliability is usually the best choice.)
36. Factors That Affect The Reliability Coefficient
The magnitude of the reliability coefficient is affected
not only by the sources of error discussed earlier, but
also by the length of the test, the range of the test
scores, and the probability that the correct response
to items can be selected by guessing.
– Test Length
– Range of Test Scores
– Guessing
37. 1. Test Length
•The larger the sample of the attribute being
measured by a test, the less the relative effects of
measurement error and the more likely the sample
will provide dependable, consistent information.
•Consequently, a general rule is that the longer the
test, the larger the test's reliability coefficient.
•The Spearman-Brown prophecy formula is most
associated with split-half reliability but can actually
be used whenever a test developer wants to
estimate the effects of lengthening or shortening a
test on its reliability coefficient.
38. Test Length
For instance, if a 100-item test has a reliability
coefficient of .84, the Spearman-Brown formula
could be used to estimate the effects of increasing
the number of items to 150 or reducing the number
to 50.
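This estimate can be sketched with the general prophecy formula, using the example values above (r = .84 for a 100-item test; n is the factor by which the length changes):

```python
# General Spearman-Brown prophecy formula:
# r_new = n * r_old / (1 + (n - 1) * r_old), where n = new length / old length.
def prophecy(r_old, n):
    """Estimated reliability after changing test length by a factor of n."""
    return n * r_old / (1 + (n - 1) * r_old)

r = 0.84  # reliability of the 100-item test from the example above

print(round(prophecy(r, 150 / 100), 2))  # lengthened to 150 items -> 0.89
print(round(prophecy(r, 50 / 100), 2))   # shortened to 50 items  -> 0.72
```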
A problem with the Spearman-Brown formula is that
it does not always yield an accurate estimate of
reliability: In general, it tends to overestimate a test's
true reliability (Gay, 1992).
39. Test Length
• This is most likely to be the case when the added
items do not measure the same content domain
as the original items and/or are more susceptible
to the effects of measurement error.
• Note that, when used to correct the split-half
reliability coefficient, the situation is more
complex, and this generalization does not always
apply: When the two halves are not equivalent in
terms of their means and standard deviations,
the Spearman-Brown formula may either over- or
underestimate the test's actual reliability.
40. 2.Range of Test Scores
• Since the reliability coefficient is a correlation
coefficient, it is maximized when the range of
scores is unrestricted.
• When examinees are heterogeneous, the
range of scores is maximized.
• The range is also affected by the difficulty
level of the test items.
41. Range of Test Scores
• When all items are either very difficult or very
easy, all examinees will obtain either low or
high scores, resulting in a restricted range.
• Therefore, the best strategy is to choose items so that the average difficulty level is in the mid-range (p = .50).
42. 3. Guessing
• A test's reliability coefficient is also affected by
the probability that examinees can guess the
correct answers to test items.
• As the probability of correctly guessing answers
increases, the reliability coefficient decreases.
• All other things being equal, a true/false test will
have a lower reliability coefficient than a four-
alternative multiple-choice test which, in turn,
will have a lower reliability coefficient than a free
recall test.
43. General points about reliability
• No test is perfectly reliable or perfectly unreliable. Reliability is not an absolute property; rather, it is always a matter of degree.
• Reliability is a necessary but not a sufficient condition for validity.
• Reliability is primarily statistical.
45. Why is reliability an important characteristic of a good test?
No matter how well the objectives are written, or how clever the
items, the quality and usefulness of an examination is predicated
on Validity and Reliability.
• Without reliability and validity, one cannot test a hypothesis.
• Without testing a hypothesis, one cannot support a
theory.
• Without a supported theory, one cannot explain why
events occur.
• Without adequate explanation, one cannot develop any
effective material or non-material technologies.