CHARACTERISTICS OF A GOOD INSTRUMENT
ADI KURNIAWAN (16716251016)
DWI PRIHATI (18716251037)
WAHYU TEJO MULYO (16716251029)
MUSFERA NARA VADIA (18716251035)
The Characteristics
A. Practicality
B. Reliability
C. Validity
D. Authenticity
E. Washback
PRACTICALITY
The Indicators of an Effective Test
Brown (2004) mentions four indicators of an effective or practical test. They are as follows:
1. Not excessively expensive
2. Administered within appropriate time constraints
3. Relatively easy to administer
4. Clear scoring/evaluation procedure
Other Indicators of an Effective Test
Considering Cohen's explanation (2005) of test administration, a practical test should be administered under the following conditions:
1. The room has adequate ventilation, heating, lighting, and acoustics.
2. The test administrator has adequate facilities to administer the test.
3. The proctors are able to create a positive atmosphere for the test takers' affective state.
4. There is enough time to administer the test.
5. The proctors are trained to administer the test.
The Description of the Situation
• Given the position of English as a foreign language in Indonesia, several kinds of language tests are used there. Consider TOEFL, IELTS, and the English test in the computer-based examination.
In addition, consider more institutional tests, e.g. a placement test before enrolling in an English course, a daily English test, a mid-term English test, and a final semester English test.
Let’s discuss some cases
1. From the cost point of view, arrange the test order from the most costly test
administration to the most economical test administration.
2. From the duration of the test, arrange the test order from the longest one
to the shortest one.
3. From the level of ease to administer and to complete, arrange the test
order from the most complicated one to the simplest one.
4. From the scoring/evaluating process, arrange the test order from the most
difficult to the easiest to assess.
RELIABILITY
• A good evaluation instrument must be reliable (Brown, 2004).
• A reliable evaluation instrument is consistent and dependable, meaning that it yields similar scores for the same student (or group) at different times.
• For example, Student A takes a test on Thursday and scores 80 out of a perfect score of 100. The next day, he takes the same test and scores 80 again.
• However, it is difficult to obtain perfectly identical scores, as human beings do not behave in exactly the same way on every occasion, even when the circumstances seem identical.
Factors Affecting Reliability
• Student-related reliability
Related to a student's temporary illness, fatigue, a 'bad day', or other physical or psychological factors that cause score deviation.
• Rater reliability (see the short sketch after this list):
Inter-rater reliability
Two raters or scorers should produce similar scores for the same student on the same test. In practice this is not always achieved, because the scorers may not apply the same standard, or because of human error, subjectivity, or bias.
Intra-rater reliability
A good instrument should yield the same score each time the same scorer uses it. However, unclear criteria, fatigue, and bias about what counts as 'good' or 'bad' often cause unreliability.
• Test administration reliability
Unreliability may also result from the conditions under which the test is administered, such as the testing situation, the lighting, the physical form of the test booklet, etc.
• Test reliability
Sometimes the measurement error comes from the nature of the test itself, such as a test that is too long or one with very limited time.
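As an illustration of how rater reliability can be checked, the short Python sketch below estimates inter-rater agreement as a Pearson correlation between two raters' scores for the same set of essays. The invented scores and the choice of statistic are assumptions for illustration only, not part of the original slides; other coefficients, such as Cohen's kappa, are common for categorical ratings.

```python
# Minimal sketch: inter-rater reliability as a Pearson correlation.
# All scores below are hypothetical and used only for illustration.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

rater_1 = [80, 72, 65, 90, 78]  # hypothetical scores from the first rater
rater_2 = [82, 70, 68, 88, 75]  # hypothetical scores from the second rater, same essays

print(f"Inter-rater correlation: {pearson(rater_1, rater_2):.2f}")
# A value close to 1.0 suggests the two raters apply a similar standard.
```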
VALIDITY
• Validity is "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, as cited in Brown, 2004).
• A valid reading test, for example, must actually measure the reading ability of the test taker. A valid writing test has to measure writing ability.
• For example, consider a writing test that asks students to write as many words as possible in 15 minutes. It is practical in terms of time and administration, and its scoring is quite dependable, but it is not a valid writing test, because it takes no account of comprehensibility, rhetorical discourse elements, the organization of ideas, or any other aspects of writing (a short sketch contrasting the two scoring approaches follows).
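To make the contrast concrete, the sketch below compares a length-only score with a rubric-based score built from the aspects of writing named above. Everything in it (the target word count, the rubric categories, the weights, and the sample ratings) is a hypothetical illustration rather than a prescribed scoring scheme; the point is only that a score driven by word count alone ignores the writing construct.

```python
# Minimal sketch: why scoring by word count alone is practical but not valid.
# All numbers, categories, and weights are hypothetical.

WORD_TARGET = 300  # assumed word count that earns full marks on the length-only score

def word_count_score(text: str) -> float:
    """Length-only score: proportion of the target word count, capped at 100."""
    return min(len(text.split()) / WORD_TARGET, 1.0) * 100

def rubric_score(ratings: dict) -> float:
    """Weighted average of 0-100 ratings on aspects of the writing construct."""
    weights = {
        "comprehensibility": 0.3,
        "rhetorical_discourse": 0.2,
        "organization_of_ideas": 0.3,
        "language_use": 0.2,
    }
    return sum(weights[aspect] * ratings[aspect] for aspect in weights)

long_but_weak_essay = "the test is good " * 75  # 300 words with little real content
ratings = {  # hypothetical rater judgements of the same essay
    "comprehensibility": 60,
    "rhetorical_discourse": 55,
    "organization_of_ideas": 50,
    "language_use": 70,
}

print(word_count_score(long_but_weak_essay))  # 100.0 -> length alone earns full marks
print(rubric_score(ratings))                  # 58.0  -> reflects the writing construct
```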
TYPES OF VALIDITY
FACE VALIDITY
CONSEQUENTIAL VALIDITY
CONSTRUCT VALIDITY
CONTENT-RELATED VALIDITY
CRITERION-RELATED VALIDITY
AUTHENTICITY
DEFINITION
• Bachman and Palmer (1996, as cited in Brown, 2004) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task", and then suggest an agenda for identifying those target language tasks and transforming them into valid test items.
• Mueller (2018) defines authentic assessment as a form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills.
AUTHENTICITY MAY BE PRESENT IN THE FOLLOWING WAYS
• The language in the test is as natural as possible.
• Items are contextualized rather than isolated.
• Topics are meaningful (relevant, interesting) for the learner.
• Some thematic organization is provided to items, such as through a story line or episode.
• Tasks represent, or closely approximate, real-world tasks.
WASHBACK
TYPES OF WASHBACK
• Positive Washback
• Negative Washback
Brown (2003) proposes three kinds of interactive feedback or comments:
1. Give praise for strengths—the "good stuff."
2. Give constructive criticism of weaknesses.
3. Give strategic hints on how a student might improve certain elements of performance.
REFERENCES
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. New York: Oxford University Press.
Bailey, K. M. (1999). Washback in language testing (TOEFL monograph series). New Jersey: Educational Testing Service.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. San Francisco: Longman.
Cohen, A. D. (2001). Second language assessment. In Teaching English as a second or foreign language (3rd ed.). Boston: Heinle & Heinle/Thomson Learning.
Khairi, A. (2016). Negative washback of national examination of Indonesia. Retrieved February 21, 2019, from https://www.geges-ndl.com/2016/08/negative-washback-of-national.html
Mueller, J. (2018). Authentic assessment toolbox. Retrieved February 18, 2019, from https://www.jfmueller.faculty.noctrl.edu
Phillips, D. (2001). Longman complete course for the TOEFL test. New York: Longman.
Washback and instructional planning. (2017). Retrieved February 21, 2019, from http://www.cal.org/flad/tutorial/impact/5washbackinstruction.html
THANK YOU
