Principles of Language Testing Explained
1. Principles of Testing
1. Practicality 2. Reliability 3. Validity 4. Backwash 5. Authenticity
Practicality
An effective test is:
i) cost-effective, ii) time-effective, iii) easy to administer, and iv) its scoring and evaluation process is specific and time-efficient.
Are the following tests practical?
1. A speaking test for 250 students in which only 5 assessors and interlocutors are available (a rough workload estimate follows this list).
2. A listening test that takes half an hour to administer and 15 minutes to assess.
3. An English language test which consists of 100 multiple-choice questions which can only be assessed on computers.
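As a rough, illustrative check of the first scenario, the sketch below estimates the assessor workload; the 10-minute interview length and 6-hour testing day are assumptions, not figures from the notes.

```python
# Rough workload estimate for the speaking-test scenario (illustration only).
students = 250
assessors = 5
minutes_per_interview = 10        # assumed interview length
usable_minutes_per_day = 6 * 60   # assumed usable testing time per day

interviews_each = students / assessors                   # 50 interviews per assessor
minutes_each = interviews_each * minutes_per_interview   # 500 minutes of interviewing
days_needed = minutes_each / usable_minutes_per_day

print(f"{interviews_each:.0f} interviews per assessor")
print(f"{minutes_each / 60:.1f} hours of interviewing each")
print(f"roughly {days_needed:.1f} testing days required")
```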
Reliability
i) A reliable test is consistent and dependable. ii) Reliability is the consistency of test results over repeated administrations. If a student achieves similar scores when the test is administered at different times and in different places, that test is reliable.
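The notes do not give a formula, but one common way to quantify this consistency is to correlate scores from two administrations of the same test. The sketch below uses invented scores purely for illustration.

```python
# Test-retest reliability sketch: correlate two administrations of the same
# test for the same students. All scores are invented for illustration.
from statistics import mean, stdev

time_1 = [62, 74, 81, 55, 90, 68, 77]   # first administration
time_2 = [65, 70, 85, 58, 88, 66, 80]   # second administration, same students

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"test-retest reliability estimate: r = {pearson(time_1, time_2):.2f}")
# A coefficient close to 1.0 suggests consistent (reliable) results.
```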
Types of Reliability
1. Student-Related Reliability
Student-related factors are likely to interfere with students' true scores: for example, temporary illness, fatigue, a bad day, anxiety, and other physical or psychological factors, as well as a test-taker's test-wiseness or strategies for efficient test taking.
2. Rater Reliability
Rater unreliability occurs due to human subjectivity and bias when scoring tests.
i) Inter-rater reliability
Inter-rater unreliability occurs when two or more scorers yield inconsistent scores for the same test, due to lack of attention to the scoring criteria, inexperience, inattention, or even preconceived biases. (An illustrative sketch follows the four types of reliability below.)
3. Test Administration Reliability
Unreliability may also result from the conditions in which the test is administered. For
example, noise, light or temperature in the classroom.
4. Test Reliability
Sometimes the nature of the test itself can cause measurement errors: for example, a long test in which test-takers become fatigued towards the end and give incorrect answers, or ambiguous test items with more than one correct answer.
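As a minimal illustration of rater reliability (the sketch promised above), the snippet below compares two raters' scores for the same set of essays; the band scores are invented, and exact agreement plus average gap are only two of several possible agreement measures.

```python
# Inter-rater reliability sketch: how closely do two raters agree when
# scoring the same essays? Band scores are invented for illustration.
rater_a = [4, 3, 5, 2, 4, 3, 5]
rater_b = [4, 2, 5, 3, 4, 4, 5]

pairs = list(zip(rater_a, rater_b))
exact_agreement = sum(a == b for a, b in pairs) / len(pairs)
mean_gap = sum(abs(a - b) for a, b in pairs) / len(pairs)

print(f"exact agreement: {exact_agreement:.0%}")    # higher = more consistent raters
print(f"average score gap: {mean_gap:.2f} bands")   # lower = more consistent raters
```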
How to Make Tests More Reliable p.21-24 (notes)
1) Take enough samples of behaviour (the more items you have on a test, the more reliable the test will be; see the sketch after this list).
2) Do not allow candidates too much freedom.
3) Write unambiguous items.
4) Provide clear and explicit instructions.
5) Ensure that tests are well laid out and perfectly legible.
6) Candidates should be familiar with the format and testing techniques.
7) Provide uniform and non-distracting conditions of administration.
8) Use items that permit scoring which is as objective as possible.
9) Make comparisons between candidates as direct as possible.
10) Provide a detailed scoring key.
11) Train scorers.
12) Agree acceptable responses and appropriate scores at the outset of scoring.
13) Identify candidates by number, not name.
14) Employ multiple, independent scoring.
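Point 1 can be quantified with the standard Spearman-Brown prophecy formula, which the notes do not mention but which predicts how reliability rises as a test is lengthened with comparable items; the starting reliability below is invented.

```python
# Spearman-Brown sketch for point 1: lengthening a test with comparable items
# raises its predicted reliability. The starting value of 0.70 is invented.
def spearman_brown(r, length_factor):
    """Predicted reliability when the test is made `length_factor` times longer."""
    return (length_factor * r) / (1 + (length_factor - 1) * r)

for factor in (1, 2, 3):
    print(f"{factor}x the items -> predicted reliability {spearman_brown(0.70, factor):.2f}")
```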
Validity
A test is said to be valid if it measures accurately what it is intended to measure.
Some aspects of the concept of validity
i) Content validity
A test is said to have content validity if its content constitutes a representative sample of the
language skills, structures, etc. with which it is meant to be concerned.
How can the content validity of a test be ensured?
By comparing the test content to the test specification, to ensure that the test contains a proper sampling of the relevant skills or knowledge to be tested.
Lack of content validity is likely to have a negative backwash effect on teaching as the areas
which are not tested are usually ignored in teaching.
ii) Criterion-related validity
How far results on the test agree with those provided by some independent and highly
dependable assessment of the candidate’s ability. This independent assessment is thus the
criterion measure against which the test is validated.
A) Concurrent validity
The results of a test are supported by other concurrent performance beyond the assessment itself. For example, the validity of a high score on the final exam of a foreign language course would be substantiated by actual proficiency in the language.
B) Predictive validity
The assessment criterion is a measure of a test-taker's likelihood of future success. Examples: placement tests, admissions assessment batteries, and language aptitude tests.
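A minimal sketch of how criterion-related (here, predictive) validity is often examined: correlating test scores with a later, independent criterion measure. The placement scores and final grades below are invented.

```python
# Predictive validity sketch: correlate placement-test scores with a later
# criterion measure (end-of-course grades). All numbers are invented.
from statistics import mean, stdev

placement = [45, 60, 72, 58, 80, 66, 90, 52]   # hypothetical placement-test scores
criterion = [55, 63, 78, 60, 85, 70, 88, 58]   # the same students' final grades

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"validity coefficient: r = {pearson(placement, criterion):.2f}")
# The higher the coefficient, the better the test predicts the criterion.
```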
iii) Construct-Related Evidence
Construct: any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perceptions. For example, 'proficiency' and 'communicative competence' are linguistic constructs. Constructs are often difficult to measure directly, so inferential data are required. Language learning and teaching involve many theoretical constructs.
Construct validity asks: 'Does this test actually measure the theoretical construct as it has been defined?'
For example: the major components of oral proficiency are pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. An oral interview has construct validity only if it measures all of these components of speech.
See also the vocabulary test example on p. 25.
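A minimal sketch of the oral-interview example above: an analytic rating scheme that scores every component of the 'oral proficiency' construct, so that no component is left unmeasured. The equal weights and the candidate's ratings are hypothetical.

```python
# Construct-coverage sketch: an oral-interview rating scheme that scores every
# named component of 'oral proficiency'. Weights and ratings are hypothetical.
COMPONENTS = ["pronunciation", "fluency", "grammatical accuracy",
              "vocabulary use", "sociolinguistic appropriateness"]
WEIGHTS = {c: 0.2 for c in COMPONENTS}        # assumed equal weighting

ratings = {                                   # one candidate's ratings on a 1-5 scale
    "pronunciation": 4, "fluency": 3, "grammatical accuracy": 4,
    "vocabulary use": 3, "sociolinguistic appropriateness": 5,
}

missing = [c for c in COMPONENTS if c not in ratings]
assert not missing, f"construct under-represented, components unmeasured: {missing}"

overall = sum(WEIGHTS[c] * ratings[c] for c in COMPONENTS)
print(f"composite oral proficiency score: {overall:.1f} / 5")
```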
iv) Consequential Validity
Consequential validity is related to all the consequences of a test: for example, its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effects on the learner, and the intended and unintended social consequences of a test's interpretation and use.
For example, the effects of assessment on students` motivation, subsequent performance in
a course, independent learning, study habits and attitude toward school work.
v) Face Validity
Face validity refers to the degree to which a test looks right and appears to measure the knowledge or ability it claims to measure, based on the subjective judgement of the examinees who take it, the administrative personnel who decide on its use, and other unsophisticated observers (pp. 26-27). 'Does the test, on the face of it, appear from the learner's perspective to test what it is designed to test?'
What are the features of a test with high face validity?
Refer to p. 27
Face validity cannot be empirically tested by teachers or experts; it is in the eye of the beholder. A test cannot be highly valid if it is unreliable due to measurement error. However, a test can be reliable but not necessarily valid for the purposes it claims.
vi) Authenticity
Bachman and Palmer (1996, p. 23) define authenticity as 'the degree of correspondence of the characteristics of a given language test task to the features of a target language task'.
Test items should simulate real-world tasks (tasks that are likely to occur in the real world).
The features of an authentic test
1. The language in the test is as authentic as possible
2. Items are contextualized rather than isolated
3. Topics are meaningful (relevant, interesting) for the learner
4. Some thematic organization to items is provided, such as through a story line or episode.
5. Tasks represent, or closely approximate, real-world tasks.
How can test authenticity be maintained?
1. Reading texts are selected from real-world sources.
2. Listening comprehension sections should feature natural language, with hesitations, white noise, and interruptions.
3. Include episodic test items, sequenced to form meaningful units, paragraphs, or stories.
vii) Backwash
Backwash is a facet of consequential validity: it is the effect of testing on teaching and learning (it can be beneficial or harmful). When a test is regarded as important, it dominates all learning and teaching activities. If the test technique is at variance with the content and objectives of the course, it is likely to have a harmful effect on teaching and learning.
Explain why and how?
Tests are likely to dominate instruction. A test should support good teaching and exert a corrective influence on bad teaching. Beneficial backwash is likely to enhance a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others.
For achieving beneficial backwash see p.15
1. Test the abilities whose development you want to encourage (e.g., test oral ability to encourage the teaching of oral ability).
This is related to content validity. Teachers usually avoid testing abilities that are difficult or expensive (in terms of time and money) to test.
2. Sample widely and unpredictably
A test can only measure a sample of the language items and abilities included in the specification. However, the sample taken should represent, as far as possible, the full scope of what is specified. If the sample is taken from a restricted area of the specifications, it will have a harmful backwash effect on teaching.
3. Use direct testing.
Direct testing implies the testing of performance skills, with texts and tasks as authentic as possible. If we directly test the skills we are interested in fostering, then practice for the test represents practice in those skills.
4. Make testing criterion-referenced.
i) If test specifications make clear just what candidates have to be able to do, and with what degree of success, then students will have a clear picture of what they have to achieve.
ii) Students know that if they perform the tasks at the criterion level, they will be successful on the test, regardless of how other students perform.
5. Base achievement tests on objectives.
If achievement tests are based on objectives, rather than on detailed teaching and textbook content, they will provide a truer picture of what has actually been achieved.
6. Ensure the test is known and understood by students and teachers.
The rationale for the test, its specifications, and sample items should be made available to everyone concerned with preparation for the test. Students should know what the test demands of them.
7. Where necessary, provide assistance to teachers.
The introduction of a new test may make demands on teachers. For example, a new national test may attempt to assess communicative skills rather than vocabulary or grammatical structures; teachers who are unfamiliar with communicative language teaching will need training.
8. Counting the cost
A test should be easy and cheap in terms of construction, administration, and scoring.
For applying principles to the evaluation of classroom tests, see pages 31-38 (Book) and Chapter 1.
Applying Principles to the Evaluation of Classroom Tests
1. Classroom tests can be evaluated by considering the five basic principles (validity, etc.).
2. Validity is the most significant issue in testing. If a test lacks validity, then all the other considerations would be useless.
3. Practicality comes next in significance, followed by authenticity.
Are the test procedures practical?
Practicality checklist
1. Are administrative details clearly identified before the test?
2. Can students complete the test reasonably within the set time frame?
3. Can the test be administered smoothly, without procedural glitches?
4. Are all materials and equipment ready?
5. Is the cost of the test within budgeted limits?
6. Is the scoring/evaluation system feasible in the teacher’s time frame?
7. Are methods for reporting results determined in advance?
Is the test reliable? (Related to teacher and test)
Reliability in terms of physical context
1. A clearly photocopied test sheet for every student
2. Sound amplification clearly audible to everyone
3. Video is visible to everyone
4. Equal lighting, temperature, extraneous noise and optimal classroom conditions for all
students.
5. Objective scoring procedures.
Intra-rater reliability is an important issue for classroom teachers.
Teachers need to find ways to maintain their concentration and endurance during the
scoring process.
For open-ended questions, the teacher should (a minimal scoring sketch follows this list):
1. Use a consistent set of criteria for a correct response.
2. Give the utmost attention to that set of criteria throughout the evaluation time.
3. Read through the tests at least twice to check consistency.
4. If the criteria are modified in the course of the scoring process, go back and apply the same standard to all papers.
5. Score the tests over several sittings to avoid fatigue.
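The sketch below illustrates point 1: one fixed checklist of criteria applied identically to every paper; the criteria, point values, and responses are all invented.

```python
# Consistent-criteria sketch: every open-ended response is scored against the
# same fixed checklist. Criteria, point values, and papers are invented.
CRITERIA = {
    "answers the question": 2,
    "uses relevant vocabulary": 1,
    "is grammatically accurate": 1,
}
MAX_SCORE = sum(CRITERIA.values())

def score(paper: dict) -> int:
    """Score one paper against the shared criteria (identical for all papers)."""
    return sum(points for criterion, points in CRITERIA.items() if paper.get(criterion))

paper_1 = {"answers the question": True, "uses relevant vocabulary": True}
paper_2 = {"answers the question": True, "is grammatically accurate": True}

for name, paper in [("paper 1", paper_1), ("paper 2", paper_2)]:
    print(f"{name}: {score(paper)} / {MAX_SCORE}")
```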
Does the procedure demonstrate content validity?
Content validity is the main source of validity in classroom tests.
'Content validity is basically related to the extent to which the test requires students to perform tasks that were included in the previous classroom lessons and represent the unit objectives.'
See example on p.32
How can the content validity of a test be evaluated?
1. Are classroom objectives clearly identified and appropriately framed? Check examples on
pages 32-33
An appropriate test would elicit an adequate number of samples of student performance,
have a clearly framed set of standards for evaluating the performance and provide some sort
of feedback to the students.
2. Are lesson objectives represented in the form of test specifications?
The content validity of a test is reflected in how the objectives of the unit are represented in the content of items, clusters of items, and item types.
3. Do you clearly perceive the performance of test-takers as reflective of the classroom objectives?
Is the procedure face valid and `biased for best`?
Students usually judge a test to be face valid when:
1. Directions are clear.
2. The structure of the test is organized logically.
3. Its difficulty level is appropriately pitched.
4. The test has no surprises.
5. Timing is appropriate.
Face validity = 'biased for best'
A teacher should:
1. Offer students adequate review and preparation for the test.
2. Suggest beneficial test-taking strategies.
3. Structure the test in such a way that the best students are modestly challenged and the weaker students are not overwhelmed.
For test-taking strategies (the teacher's strategic suggestions to optimize students' test performance), see pp. 34-35.
Are the test tasks as authentic as possible?
Checklist
1. Is the language in the test as authentic (real-world language) as possible?
2. Are items as contextualized as possible rather than isolated?
3. Are topics and situations interesting, enjoyable and/or humorous?
4. Is some thematic organization provided, such as through a story line or episode?
5. Do tasks represent, or closely approximate, real-world tasks?
Does the test offer beneficial backwash to the learner?
Checklist
1. Is the test content relevant to the curriculum and objectives?
2. How much time do students spend preparing for the test?
3. Does the test have positive consequences for the test-takers in terms of learning outcomes, and for teachers in terms of designing their future teaching?
4. Do the students use the feedback to improve their learning, and do teachers use the test outcomes to shape their teaching so as to help students improve their learning outcomes?
Inaccurate Tests
A test is inaccurate if it fails to measure accurately whatever it is intended to measure, or if students' real knowledge and skills are not reflected in the test scores they obtain.
There are two main sources of inaccuracy:
1) The first source is related to test content and technique (validity).
For example, we cannot accurately assess students' abilities by means of multiple-choice tests; yet accuracy is often sacrificed for reasons of economy and convenience. (However, writing good multiple-choice tests is still very difficult, as a lot of time and effort is needed.)
Use appropriate test techniques.
2) The second source is lack of reliability: if a test does not measure consistently, it cannot measure accurately.
Unreliability is related to features of the test itself and the way it is scored: for example, unclear instructions, ambiguous questions, and items that invite guessing on the part of the test-takers.
Considering the principles of test construction when constructing tests is likely to minimize the factors that lead to inaccurate tests.
Another source of unreliability is the awarding of significantly different scores to equivalent test performances by different assessors.