2. What is a test?
A method of measuring a person’s ability or knowledge in a given domain.
A test is an instrument: a set of techniques, procedures, and items.
A well-constructed test provides an accurate measure of someone’s ability.
This sounds simple, but constructing a test is a complex task involving both science
and art.
How do you know if this instrument is “good” or not?
Is it administrable within given constraints?
Is it dependable?
Does it accurately measure what you want it to measure?
Five cardinal criteria for “testing a test”
3. Some criteria for testing a test
Practicality
Reliability
Validity
Authenticity
Washback
4. Practicality
It is not excessively expensive
It stays within appropriate time constraints (a five-hour test is impractical: it consumes
more time and money than necessary to accomplish its objective).
It is relatively easy to administer (individual one-on-one proctoring is impractical).
Example: A test that takes a few minutes for a student to take and several hours for the
examiner to evaluate is impractical.
It has a scoring/evaluation procedure that is specific and time-efficient (A test that
requires a computer to score is impractical for a person who does not have a computer
or who has to travel to get one).
In classroom-based testing, time is almost always a crucial practicality factor for busy
teachers with too few hours in the day.
5. Reliability
Consistent and dependable (If you give the same test to the same student or matched students on
different occasions, the test should yield similar results).
The issue of reliability of a test may best be addressed by considering a number of factors that may
contribute to the unreliability of a test.
Student-related reliability: temporary illness, fatigue, anxiety, or simply a bad day can result in a
score that does not reflect the student’s true ability.
Rater reliability: human error and subjectivity may enter into the scoring process.
Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same
test, possibly because of lack of attention to scoring criteria, inexperience, or inattention.
Intra-rater unreliability occurs because of unclear scoring criteria, fatigue, bias toward particular
“good” and “bad” students, or simple carelessness.
Both kinds of rater unreliability can be reduced by using clear scoring rubrics.
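Inter-rater reliability can also be checked quantitatively. As a minimal sketch (not part of the original notes), Cohen’s kappa is one standard statistic for this: it corrects the raw proportion of rater agreement for the agreement two raters would reach by chance. The function and the pass/fail ratings below are hypothetical illustrations:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(ratings_a)
    # Observed agreement: proportion of items both raters scored identically.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[cat] * freq_b[cat] for cat in freq_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Two raters score the same five essays on a pass/fail rubric.
rater_1 = ["pass", "pass", "fail", "pass", "fail"]
rater_2 = ["pass", "fail", "fail", "pass", "fail"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # → 0.62
```

A kappa of 1.0 means perfect agreement and 0 means no better than chance; a value around 0.6, as here, would suggest the rubric or rater training needs attention.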
6. Test administration reliability:
Unreliability may also result from the conditions in which the test is administered:
outside noise during a listening test, photocopying variations, the amount of light in
different parts of the room, variations in temperature, and even the condition of desks
and chairs (factors beyond the test-taker’s control).
7. Test reliability:
The nature of the test itself can cause measurement errors:
If the test is too long, test-takers may become fatigued.
Timed tests may disadvantage students who are not used to working
under time pressure.
Ambiguously worded questions reduce reliability.
8. Validity
The degree to which a test measures what it is intended to
measure.
A reading comprehension test should measure comprehension of the passage, not the
reader’s prior knowledge of the topic.
Asking students to write as many words as they can in 15 minutes is not a valid test of
a person’s overall writing competence.
Test questions should be judged against objective criteria drawn from the subject matter itself.
9. Content Validity: the test content directly samples the ability, competence, and knowledge the Ss
are supposed to demonstrate.
Testing speaking proficiency in English with only a pencil-and-paper test lacks content validity;
having students speak in a reasonably authentic context gives the test content validity.
Construct Validity: the test measures the theoretical construct it claims to measure; the construct “speaking”, for example, comprises pronunciation, fluency, intonation, accuracy, rhythm, pitch...
Consequential Validity: encompasses all the consequences of a test including such
considerations as its accuracy in measuring intended criteria, its impact on the
preparation of test-takers, its effects on the learner and the (intended and unintended)
social consequences of a test’s interpretation and use.
Face Validity: the test appears, in the eyes of the students, to measure what it is
supposed to measure.
10. Authenticity
Tasks in a test should mirror tasks likely to be enacted in the real world.
Authenticity in a test may be presented in the following ways:
o The language is as natural as possible
o Items are contextualized rather than isolated
o Topics are meaningful (relevant, interesting) for the learner
o Some thematic organization to items is provided
o Tasks represent, or closely approximate, real-world tasks.
12. Washback effect
Washback (also called backwash) is an aspect of consequential validity: “the effect of testing on
teaching and learning.”
It refers to the impact a test (or a test result) can have on students, teachers, and the educational process.
The information “washes back” to students in the form of useful DIAGNOSES of strengths and
weaknesses.
What happens if a teacher provides only a number or letter grade as the test result? There is no washback.
The challenge for teachers is to create classroom tests that serve as learning devices. For example,
incorrect responses can become windows of insight into further work, while correct responses deserve
praise, especially when they represent an accomplishment in a student’s interlanguage.
Washback enhances a number of basic principles of language acquisition: intrinsic motivation,
autonomy, and self-confidence, among others.