Reliability
Supervised by Dr. Khaleel Al Bataineh
By Rami Alhawamdeh
Department of English Language and Translation
- Reliability refers to the consistency with which a test can be
scored, that is, consistency from person to person, time to time, or
place to place. It means that tests are to be constructed,
administered and scored in such a way that the scores obtained on
a test on a particular occasion are likely to be very similar to those
which would have been obtained if it had been administered to
the same students, with the same ability, at a different time
(Hughes, 1991).
There are two components of test reliability:
1- the reliability of the scores on the performance of candidates from
occasion to occasion, which can be ensured by careful construction and
administration of the test;
2- the reliability of scoring.
Reliability of scoring
- Reliability of scoring can be achieved more easily with objectively
scored tests (e.g. tests of reading and listening comprehension), in
which scoring does not require the scorer’s personal judgement of
correctness, because the test items can be marked on a right-or-wrong basis.
- Scorer reliability is especially important in the case of subjectively
scored tests (i.e. tests of writing and speaking skills): because they
cannot be assessed on a right-or-wrong basis, assessment requires a
judgement on the part of the scorers.
There are two aspects of scorer reliability: intra-rater reliability
and inter-rater reliability.
1- Intra-rater reliability is achieved if the same scorer gives
the same set of oral performances or written texts the same
scores on two different occasions. It can be measured by
means of a correlation coefficient.
2- Inter-rater reliability refers to the degree of consistency of
scores given by two or more scorers to the same set of oral
performances or written texts.
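Both aspects of scorer reliability come down to correlating two sets of scores for the same performances. As a minimal sketch of how such a correlation coefficient is computed (the essay scores below are invented for illustration, not taken from the source), Pearson's r can be worked out by hand:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations from the means (covariance, unscaled)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Standard deviations (unscaled, so the scaling factors cancel)
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

# Intra-rater case: one scorer marks the same five essays on two occasions.
occasion_1 = [14, 11, 16, 9, 13]
occasion_2 = [15, 10, 16, 8, 12]
print(round(pearson(occasion_1, occasion_2), 3))
```

A coefficient close to 1 indicates that the scorer ranked and spaced the performances almost identically on both occasions; for inter-rater reliability, the two lists would instead hold two different scorers' marks for the same set of texts.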
The reliability of a test can be quantified in the
form of a reliability coefficient.
It can be worked out by comparing two sets of
test scores. These two sets can be obtained
by administering the same test to the same
group of test takers twice (test-retest method),
or by splitting the test into two equivalent
halves, giving separate scores for the two
halves, and then correlating the scores (split-half
method). The more similar the two sets of
scores are, the more reliable the test is said to be
(Alderson, 1995).
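The split-half method can be sketched in a few lines. This is an illustrative implementation, not a procedure prescribed by Alderson (1995): the odd/even item split and the Spearman-Brown step-up formula (the standard correction for the fact that each half is only half the test's length) are conventional choices, and the score matrix is invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one row per test taker, one 0/1 entry per item.
    Splits the items into odd/even halves, correlates the half totals,
    then applies the Spearman-Brown correction for full test length."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd, even)
    return (2 * r_half) / (1 + r_half)  # Spearman-Brown step-up

# Six test takers, eight dichotomously scored items (1 = correct).
scores = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 3))
```

The test-retest coefficient is computed the same way, except that the two correlated lists are the total scores from the two separate administrations rather than the two half-test totals.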
Key-note
The relationship of validity and reliability:
A test cannot be valid if it cannot provide consistently accurate
measurements.
It means that a valid test must be reliable, too. However, a reliable test
may not be valid; it depends partly on what exactly we want to measure.
For example, multiple-choice tests can be made highly reliable, especially
if there are enough items, but performance on a multiple-choice test
cannot be regarded as a highly valid measure of one’s overall language
ability. There is always some tension between reliability and validity.
In order to maximise reliability it is often necessary to reduce validity.
An oral test may be a valid measure, but performance on it may be
difficult to assess reliably. In practice there are only degrees of both that
testers want to achieve; in short, there is a trade-off between the two: one
is maximised at the expense of the other.
Source: Sárosdy, Bencze, Poór and Vadnay. 2006. Applied Linguistics I for BA Students in English.