1. Language Testing for PSL
Meeting two: Group Discussion
Instruction: Read pages 19-41 from Brown (2004), and then answer the following
discussion questions in groups.
An effective test meets the following five criteria: practicality, reliability, validity,
authenticity, and washback. It is important for teachers to understand what each criterion
refers to and the role it plays in a test.
1. a. On page 20, there is an instance of an impractical test. There are four aspects of a
practical test; which aspect is violated in the instance?
When the red-faced administrator and the proctor got together later to score the
test, they faced the problem of how to score the dictation, a more subjective
process than some other forms of assessment.
b. Think of another instance of an impractical test and then explain which aspect is
violated, making the test impractical.
For example: While the listening comprehension section of the test was apparently
highly practical, the administrator had failed to check the materials ahead of time.
Then, they established a scoring procedure that did not fit into the time constraints.
In classroom-based testing, time is almost always a crucial practicality factor for busy
teachers with too few hours in the day.
2. a. A reliable test always gives (nearly) similar results: even if a test taker took the test
at different times, he or she would receive more or less similar scores. Explain in your
own words how each of the factors below can affect the reliability of a test. Give an
example for points ii, iii, and iv.
i. Student-related reliability
According to the text, student-related factors such as fatigue, sickness, and anxiety
can make a test taker's performance inconsistent and thus contribute to the
unreliability of a test.
ii. Rater reliability
In the placement-test story from the text, the factor at work is rater reliability: the
initial scoring plan for the dictation was found to be unreliable because the two
scorers were not applying the same standards.
iii. Test administration reliability
In my opinion, one of the factors affecting reliability here is the length of the test.
Test length affects the true scores and the variance of the observed scores:
measurement errors are smaller in scores obtained from a long test than from a
short one.
iv. Test reliability
Here the factor is guessing: guessing on a test gives rise to increased error variance
and as such reduces reliability. For example, with two-alternative response options
there is a 50% chance of answering an item correctly by guessing.
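The effect of guessing on error variance can be illustrated numerically. The sketch below (hypothetical data, not from Brown) simulates a test taker who guesses randomly on every item of a 40-item, two-option test: the expected score is about 50% from chance alone, and repeated guessing runs show score variance that has nothing to do with the test taker's ability.

```python
# Simulate pure guessing on 40 two-option items (hypothetical illustration):
# chance alone yields ~50% correct, and the spread across simulated runs is
# error variance unrelated to ability, which lowers reliability.
import random

random.seed(0)
N_ITEMS = 40

def guessed_score(n_items):
    """Number of correct answers from pure guessing on two-option items."""
    return sum(random.random() < 0.5 for _ in range(n_items))

scores = [guessed_score(N_ITEMS) for _ in range(1000)]
mean = sum(scores) / len(scores)
var = sum((s - mean) ** 2 for s in scores) / len(scores)
print(f"mean guessed score: {mean:.1f} / {N_ITEMS}")  # roughly 20 (50%)
print(f"variance from guessing alone: {var:.1f}")     # roughly n*p*(1-p) = 10
```

With more response options per item (say four instead of two), the chance of a correct guess drops, which is one reason multiple-choice items usually offer more than two options.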
b. What’s the difference between inter-rater reliability and intra-rater reliability?
Inter-rater unreliability occurs when two or more scorers yield inconsistent scores
on the same test, possibly because of lack of attention to scoring criteria,
inexperience, inattention, or even preconceived biases. Intra-rater unreliability is a
common occurrence for classroom teachers because of unclear scoring criteria,
fatigue, bias toward particular "good" and "bad" students, or simple carelessness.
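Inter-rater agreement can be quantified; one common index is the correlation between two raters' scores for the same set of performances. A minimal sketch with hypothetical essay scores (the raters and numbers below are invented for illustration):

```python
# Inter-rater reliability as the Pearson correlation between two raters'
# scores for the same essays (hypothetical data for illustration).
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rater_a = [85, 78, 92, 60, 71]  # hypothetical scores from rater A
rater_b = [82, 80, 95, 58, 70]  # the same five essays scored by rater B

r = pearson(rater_a, rater_b)
print(f"inter-rater correlation r = {r:.2f}")  # near 1.0 => consistent raters
```

A value of r near 1.0 suggests the two raters are applying similar standards; a low value signals the kind of inter-rater unreliability described above and calls for clearer scoring criteria or rater training.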
3. a. Explain in your own words how reliability is different from validity.
In my opinion, reliability and validity are concepts used to evaluate the quality of a
test; they indicate how well a method, technique, or test measures something.
Reliability is about the consistency of a measure, while validity is about the accuracy
of a measure.
b. In what way does a test violate content-validity? Give a real example.
I think in practice content validity is often used to assess the validity of tests of
content knowledge. For example, a final exam at the end of a semester for a statistics
course has content validity if it covers every topic discussed in the course and
excludes irrelevant topics; it violates content validity if it omits course topics or
tests material that was never taught.
c. In what way does a test violate criterion related validity? How is concurrent
validity different from predictive validity?
In my opinion, the difference between concurrent validity and predictive validity rests
solely on the time at which the two measures are administered. Concurrent validity
applies to validation studies in which the two measures are administered at
approximately the same time, whereas predictive validity applies when the criterion
measure is administered some time after the test, to assess how well the test predicts
later performance.
d. In what way does a test violate construct validity? Explain in what way the TOEFL
violates construct validity according to Brown.
e. In what way does a test violate consequential validity? Give a real example.
f. In what way does a test violate face validity? What is the relation between face
validity and content validity?
4. a. Explain in your own words what is meant by the authenticity of a test and why it is
important for a language test to be authentic.
b. Look at the examples of two multiple choice tasks on page 35 - 36. Explain why
Brown thinks that the first example is better than the second example.
5. a. What is washback? In large-scale assessment, washback generally refers to the
effect tests have on instruction in terms of how students prepare for the test.
Is washback good or bad in an assessment? It can be either: washback is positive
when a test encourages beneficial teaching and learning, and negative when it does not.
b. How is washback in the formative test different from the one in the summative test?
A formative test, by definition, provides washback in the form of information to the
learner on progress toward goals. A summative test, administered at the end of a
course or program, need not offer much in the way of washback.
6. Instruction: Each group must think of an instance of a test/test item(s) which
violates reliability, validity, and authenticity (one instance for each criterion) and
present their work to the other groups. In the presentation, the group must state
clearly in what way the test (or test item(s)) violates the specified criterion. If the
violation can be categorized into different criteria, state what they are and
explain each of them. The instance can be from a real test or made up.