This document discusses the key characteristics of a good assessment instrument: practicality, reliability, validity, authenticity, and washback. It provides details on each characteristic, including definitions, types, and factors that can affect them. For example, it explains that a reliable instrument should provide consistent scores over time and discusses sources of unreliability like student factors or test administration issues. The document also lists different types of validity like face validity and content validity. Overall, it serves as a comprehensive overview of the essential features an effective assessment should possess.
4. The Indicators of an Effective Test
Brown (2004) mentions four indicators for a test to be
considered effective or practical. They are as follows:
1. Not excessively expensive
2. Administered within appropriate time constraints
3. Relatively easy to administer
4. Clear scoring/evaluation procedure
5. Other Indicators of an Effective Test
Considering Cohen’s (2005) explanation of test
administration, a practical test should meet the following conditions:
1. The room has adequate ventilation, heating, lighting, and acoustics.
2. The test administrator has adequate facilities to administer the test.
3. The proctors of the test are able to create a positive atmosphere
that supports the test takers’ affective state.
4. There is sufficient time to administer the test.
5. The proctors are trained to administer the test.
6. The Description of Situation
• Given the position of English as a foreign language
in Indonesia, there are several kinds of language tests in
Indonesia. Let’s take a look at the TOEFL, IELTS, and the
English test in computer-based examinations.
In addition, let’s look at more institutional
tests, e.g. a placement test before enrolling in an English
course, a daily English test, a mid-term test for the English
subject, and a final semester test for the English subject.
7. Let’s discuss some cases
1. From the cost point of view, arrange the test order from the most costly test
administration to the most economical test administration.
2. From the duration of the test, arrange the test order from the longest one
to the shortest one.
3. From the level of ease to administer and to complete, arrange the test
order from the most complicated one to the simplest one.
4. From the scoring/evaluating process, arrange the test order from the most
difficult to the easiest to assess.
9. RELIABILITY
• A good evaluation instrument must be reliable (Brown, 2004).
• A reliable evaluation instrument should be consistent and
dependable, meaning that the instrument yields similar scores for
the same student (or group) at different times.
• Student A takes a test on Thursday and scores 80 out of a perfect
score of 100. The next day, he takes the same test and scores 80
again.
• However, it is difficult to obtain perfectly identical scores, as
human beings do not behave exactly the same way on every
occasion, even though the circumstances seem identical.
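The test-retest consistency described above can be quantified with a correlation coefficient between the scores of the two administrations. A minimal sketch in Python — the score lists are invented for illustration, and Pearson’s r is one common statistic for this purpose, not one the slides prescribe:

```python
# Test-retest reliability sketch: correlate scores from two
# administrations of the same test to the same (hypothetical) group.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores of five students on Thursday and on the retest.
thursday = [80, 65, 90, 72, 58]
friday   = [78, 67, 88, 75, 60]

print(round(pearson_r(thursday, friday), 3))  # → 0.991
```

A coefficient close to 1 indicates that students keep roughly the same rank and level across administrations, which is the consistency a reliable instrument should show; a coefficient near 0 would signal the score deviations discussed below.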
10. Factors Affecting the Reliability
• Student-related reliability
This relates to a student’s temporary illness, fatigue, a ‘bad day’, or other physical or psychological
factors causing score deviation.
• Rater reliability:
Inter-rater reliability
Two raters or scorers should produce similar scores for the same test taken by the same student.
However, this sometimes cannot be achieved, as the scorers may not apply the same standard,
or there may be human error, subjectivity, or bias.
Intra-rater reliability
A good instrument should yield the same score each time the same scorer uses the
instrument. However, unclear criteria, fatigue, and bias about what counts as ‘good’ and ‘bad’ often cause
unreliability.
• Test administration reliability
Unreliability may also result from the administration of the test, such as the situation during the test, the
lighting, the printed form of the test booklet, etc.
• Test reliability
Sometimes the measurement error comes from the nature of the test itself, such as a test that is too long or
a test with a very limited time.
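The rater-reliability notions above can also be expressed numerically. The sketch below computes simple percent agreement and Cohen’s kappa — a standard chance-corrected agreement statistic, though not one the slides name — for two hypothetical raters grading the same ten essays:

```python
# Inter-rater reliability sketch: percent agreement and Cohen's kappa
# for two raters assigning categorical grades (hypothetical data).
from collections import Counter

def percent_agreement(r1, r2):
    """Share of items on which the two raters gave the same grade."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for the agreement expected by chance."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Expected agreement if each rater assigned grades at random
    # according to their own marginal frequencies.
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical grades from two raters for ten essays.
rater1 = ["A", "B", "B", "C", "A", "B", "C", "A", "B", "C"]
rater2 = ["A", "B", "C", "C", "A", "B", "C", "B", "B", "C"]

print(percent_agreement(rater1, rater2))        # → 0.8
print(round(cohens_kappa(rater1, rater2), 3))   # → 0.697
```

Kappa is lower than raw agreement because some matches would occur even if the raters graded at random; a kappa well below 1 is one way the standard-setting and bias problems described above become visible.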
12. VALIDITY
• Validity is “the extent to which inferences made from assessment
results are appropriate, meaningful, and useful in terms of the purpose
of the assessment” (Gronlund, 1998, as cited in Brown, 2004).
• A valid reading test, for example, must actually measure the reading
ability of the test taker. A valid writing test has to measure the writing
ability.
• For example, asking the students to write as many words as they can
in 15 minutes as a writing test is practical in terms of time and
administration, and also quite dependable in scoring, but it is not a
valid test without any consideration of comprehensibility, rhetorical
discourse elements, the organization of ideas, and other factors of writing.
13. TYPES OF VALIDITY
• FACE VALIDITY
• CONSEQUENTIAL VALIDITY
• CONSTRUCT VALIDITY
• CONTENT-RELATED VALIDITY
• CRITERION-RELATED VALIDITY
15. DEFINITION
• Bachman and Palmer (1996, as cited in Brown, 2004) define authenticity as
“the degree of correspondence of the characteristics of a given language test
task to the features of a target language task”, and then suggest an agenda
for identifying those target language tasks and for transforming them into valid
test items.
• Mueller (2018) defines authentic assessment as a form of assessment in
which students are asked to perform real-world tasks that demonstrate
meaningful application of essential knowledge and skills.
16. AUTHENTICITY MAY PRESENT IN THE FOLLOWING WAYS
The language in the test is as natural as possible.
Items are contextualized rather than isolated.
Topics are meaningful (relevant, interesting) for
the learner.
Some thematic organization is provided to items,
such as through a story line or episode.
Tasks represent, or closely approximate, real-world tasks.
19. Brown (2003) proposes three kinds of interactive feedback or
comments:
1. Give praise for strengths—the “good stuff.”
2. Give constructive criticism of weakness.
3. Give strategic hints on how a student might improve
certain elements of performance.
22. REFERENCES
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful
language tests. New York: Oxford University Press.
Bailey, K. M. (1999). TOEFL monograph series (Washback in language testing). New Jersey: Educational
Testing Service.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. San Francisco: Longman.
Cohen, A. D. (2001). Second language assessment. In Teaching English as a second or foreign language (3rd ed.).
Boston: Heinle & Heinle/Thomson Learning.
Khairi, A. (2016). Negative washback of national examination of Indonesia. Retrieved February 21, 2019, from
https://www.geges-ndl.com/2016/08/negative-washback-of-national.html
Mueller, J. (2018). Authentic assessment toolbox. Retrieved February 18, 2019, from
https://www.jfmueller.faculty.noctrl.edu
Phillips, D. (2001). Longman complete course for the TOEFL test. New York: Longman.
Washback and instructional planning. (2017). Retrieved February 21, 2019, from
http://www.cal.org/flad/tutorial/impact/5washbackinstruction.html