3. Reliability
Reliability is the extent to which a test produces consistent scores across different administrations to similar groups
of examinees. Reliability is synonymous with dependability, stability, consistency, predictability and accuracy.
Observed Score = True Score + Error Score
Accordingly, reliability is defined as the extent to which a test is error-free. In fact, your
obtained scores are only a partial representation of your true score (real ability),
because factors other than the ability being tested also affect performance.
Therefore, a reliable test is one in which true-score variance is high and error-score
variance is low. If the test were error-free, the true score would equal the observed score.
Classical True Score Measurement
Obtained Score (X) = Real Ability (Xt) + Other Factors (Xe)
X = Xt + Xe
X = Observed Score
Xt = True Score
Xe = Error Score
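The classical true-score model can be illustrated numerically. The following is a hypothetical sketch (simulated scores, not from the source) showing that reliability can be read as the share of observed-score variance that comes from true-score variance:

```python
import random
import statistics

random.seed(42)

# Simulate the classical true-score model X = Xt + Xe:
# each observed score is a true score plus random error.
true_scores = [random.gauss(70, 10) for _ in range(1000)]   # Xt: real ability
errors      = [random.gauss(0, 5)  for _ in range(1000)]    # Xe: other factors
observed    = [t + e for t, e in zip(true_scores, errors)]  # X = Xt + Xe

# Reliability = true-score variance / observed-score variance.
var_true = statistics.pvariance(true_scores)
var_obs  = statistics.pvariance(observed)
reliability = var_true / var_obs
print(round(reliability, 2))  # close to 100 / (100 + 25) = 0.8
```

With a larger error variance the ratio drops toward 0; with no error at all it would be exactly 1, matching the statement that an error-free test makes the true score equal the observed score.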
4. Reliability
Reliability falls into 4 kinds:
1) Student-Related Reliability: psychological and physical factors, including a "bad day", anxiety, illness, the test taker's "test-wiseness" and
fatigue, which can make an observed score deviate from one's true score.
2) Rater Reliability: falls into 2 categories:
A) Inter-Rater Reliability: different scorers yield inconsistent scores for the same test.
B) Intra-Rater Reliability: the same scorer is inconsistent, owing to unclear scoring criteria, bias or carelessness.
3) Test Administration Reliability: it basically springs from the conditions in which the test is administered: a noisy class, the amount of light, the chairs, etc.
4) Test Reliability: the test should fit into the time constraints, should be neither too long nor too short, and its items should be clear.
6. Validity
Validity is the degree of correspondence between the test content and the content of the material to be tested.
Ex: A valid test of reading ability actually measures reading ability itself, not previous knowledge.
5 Ways to Establish Validity:
1) Content Validity 2) Criterion Validity 3) Construct Validity 4) Consequential Validity 5) Face Validity
7. Validity
1) Content Validity: if a test actually samples the subject matter about which conclusions are to be drawn, it can claim
content-related evidence of validity.
Direct Testing: requires the test takers to perform the target task itself.
Indirect Testing: learners are required to perform through indirectly related tasks.
2) Criterion Validity: the extent to which performance on a test is related to a criterion that serves as an indicator of the ability
being tested. The criterion may be individuals' performance on another test or even a known standard.
Concurrent Validity: a test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself.
Predictive Validity: the test tends to predict a student's likelihood of future success.
3) Construct Validity: the extent to which a test measures just the construct it is supposed to measure.
4) Consequential Validity: refers to the positive or negative consequences of a particular test. Consequences include its
impact on test takers' preparation, on learners, its social consequences, and washback.
5) Face Validity: the extent to which the measurement method "on its face" appears to measure the particular ability.
It is generally based on the subjective judgment of the examinees.
[Slide figure: examples — speaking, multiple-choice, oral production, TOEFL, depression]
8. Practicality and Authenticity
Practicality: defined as the relationship between the resources that will be required in the design, development, and
use of the test and the resources that will be available for these activities. It is represented in the following figure.
Brown (2004:19) defines practicality in terms of:
1) Cost
2) Time
3) Administration
4) Scoring / Evaluation
Authenticity: the extent to which the tasks required on a given test are similar to normal "real life" language use; in
other words, it is the degree of correspondence between test tasks and the activities of target language use.
Therefore, the higher the correspondence, the more authentic the test.
Authenticity may be present in the following ways:
1. The language in the test is as natural as possible.
2. Items are contextualized rather than isolated.
3. Topics are meaningful (relevant, interesting) to the learners.
4. Some thematic organization to items is provided, such as through a story or episode.
5. Tasks represent, or closely approximate, real-world tasks.
9. Washback/Backwash
Washback Effect: generally, it is the influence of the nature of a test on teaching and learning.
2 kinds of washback: 1) Negative Washback 2) Positive Washback
1) Negative Washback: occurs when the test and testing techniques are at variance with the objectives of the course; tests with
negative washback have a negative influence on teaching and learning.
Ex: Taking an English course to be trained in the four language skills while the language test does not test those skills.
2) Positive Washback: results when a testing procedure encourages "good" teaching practices.
Ex: The consequence of many reading comprehension tests is a possible development of reading skills.