This document discusses the key characteristics of effective assessment: validity, reliability, practicality, and accuracy. It defines each characteristic and provides examples. Validity means a test measures what it intends to measure. Reliability means a test produces consistent results. Practicality means a test is usable in terms of time and cost. Accuracy means a test is free from errors. The document also discusses factors that affect the acceptability of a test like length, technique, administration conditions, and presentation quality. Overall, the document provides an overview of the essential features of assessment and testing.
2. Introduction
Assessment, if it is to adequately fulfill its purpose, should satisfy the requirements of four key
characteristics:
1- Validity
2- Reliability
3- Practicality
4- Accuracy
3. A test must achieve validity, reliability, and practicality
1- A test is valid if it measures what you want it to measure.
Tests should be based on the contents of the textbook and the
methodological teaching approaches, and should measure what they are
supposed to measure.
2- Tests are reliable if their results are consistent. If administered
to the same students on another occasion, they would yield the
same results.
There are two main sources of reliability:
A- The consistency of performance from candidates.
B- The consistency of scoring.
3- A test has practicality if it does not involve much time or money.
4. Validity
It is the degree to which a test measures what it is supposed to measure, or
can be used successfully for the purpose for which it is intended. This
means you should be clear about what exactly you want to test.
Two questions must be considered:
1- What precisely does the test measure?
2- How well does it do it?
5. Types of Validity
1- Content Validity: is concerned with what is being tested. The most
important issues for a teacher when preparing a test are:
a) Test relevance: the relative importance of each area, and the number of
items given to it. It requires the specification of the language areas to be
tested, which in turn requires a careful analysis of the subject or skill, and
the design of the test.
b) Content coverage: is concerned with the extent to which the questions
adequately cover the language areas being studied.
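The two issues above can be sketched as a small test blueprint. This is only an illustration: the language areas, their weights, and the function names are hypothetical, and the weights are assumed to be whole percentages that add up to 100.

```python
# Hypothetical blueprint: each language area with its relative importance (%).
# The areas and percentages are illustrative assumptions, not from the slides.
weights_pct = {"present perfect": 40, "passive voice": 35, "conditionals": 25}


def allocate_items(weights_pct, total_items):
    """Test relevance: give each area a number of items proportional to its weight."""
    # Integer arithmetic avoids rounding surprises; assumes weights sum to 100.
    scaled = {area: pct * total_items for area, pct in weights_pct.items()}
    counts = {area: s // 100 for area, s in scaled.items()}
    # Hand out any items lost to rounding, largest remainder first.
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(scaled, key=lambda a: scaled[a] % 100, reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


def uncovered_areas(weights_pct, item_areas):
    """Content coverage: list blueprint areas that no test item addresses."""
    return sorted(set(weights_pct) - set(item_areas))


plan = allocate_items(weights_pct, 20)
print(plan)

# Areas tagged on the items of a draft test (hypothetical).
draft_items = ["present perfect", "present perfect", "passive voice"]
print(uncovered_areas(weights_pct, draft_items))
```

A teacher could run the coverage check on a draft paper before finalizing it, so that no area of the unit is left untested.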
6. For example:
If a test is to measure students’ mastery of English grammar, the teacher has to
study the unit(s) and pick out the new structure points. He then has to
plan how the students are to demonstrate mastery of these grammar
points in behavioral terms. This concerns the design of the test and
whether it tests recognition or production or both.
7. Types of Validity
2- Empirical Validity: is usually referred to as statistical validity. To check
the effectiveness of a test and determine how well it measures, we should
compare the test results with the results of some independent outside criterion that
we believe is an indicator of the ability tested. If there is a high correlation between
the two, then our test has empirical validity.
Examples of independent criteria:
* Scores given at the end of the course.
* The teacher’s judgment of his students.
* External examination.
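The comparison described above can be sketched with a correlation coefficient. The slides do not name a specific statistic; Pearson's r is assumed here as one common choice, and the scores below are invented for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: the teacher's test vs. end-of-course scores (the criterion).
test_scores = [12, 15, 9, 18, 14]
end_of_course = [60, 72, 50, 85, 70]

r = pearson_r(test_scores, end_of_course)
print(round(r, 2))
```

A value of r close to 1 would suggest that the test ranks students much as the outside criterion does. How high is "high enough" is a judgment the slides leave open.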
8. Types of Validity
3- Face Validity: It is argued that test appearance has a considerable effect on its
acceptability to both test takers and test users. Face validity has to do with surface
credibility or public acceptability of a test.
4- Construct Validity: It means that the testing methods should be in harmony with
the teaching method used, and with the theory upon which the instructional materials have
been prepared and developed.
For example: when a course of study emphasizes the communicative aspect of the
language but the test is built from discrete-point items, the construct
validity of the test will certainly be low.
9. Reliability
It means the stability of test scores. Presumably, if the same test is given
twice to the same group of students, under the same conditions, it would give
the same results.
Requisites of Dependable Tests:
* Multiple Sample
* Standard Conditions
* Standard Tasks
* Standard Scoring
10. 1- Multiple Sample: the more samples of students’ performance we take,
the more reliable our assessment of their knowledge and ability will be. The
test should contain items at a wide variety of levels of difficulty.
2- Standard conditions: the reliability of the test scores can also be assured
if all students take the test under identical conditions. In a listening test,
for example, all students must be able to hear the items clearly.
3- Standard tasks: all students must be given the same items or items of
equal difficulty. In other words, both the content and the format of the
test must be identical for all students.
4- Standard scoring: all the papers of a test must be scored in an identical
manner. The teacher or the scorer should give the same or nearly the same
score repeatedly for the same test performance. Objective tests tend to be
more reliable than free-response tests like composition, where individual
judgement must be made.
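The first requisite above, multiple samples, can be illustrated with the Spearman-Brown formula from classical test theory. The slides do not mention this formula; it is brought in here as a standard way to estimate how reliability changes when a test is lengthened with comparable items. The starting reliability of 0.6 is a made-up figure.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by length_factor.

    Assumes the added items are comparable to the existing ones.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.6, kept as is, doubled, and tripled.
for k in (1, 2, 3):
    print(k, round(spearman_brown(0.6, k), 2))
```

Under these assumptions the predicted reliability rises with each added sample of performance, but by a smaller amount each time.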
11. Practicality
A third characteristic of assessment is its practicality or usability. A test which is valid and
reliable but is difficult to administer or score, or requires too much equipment or money, may
fail to gain acceptance. The two factors that have to be considered to achieve
practicality of a test are:
1) Economy: the cost in time, money, and personnel of administering a particular test.
2) Ease: the degree of difficulty experienced in administering and scoring the test.
For example, an oral test that demands the use of a tape recorder is not practical if it has
to be administered to thousands of students.
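The oral-test example can be made concrete with some rough arithmetic. All figures and the function name below are hypothetical; the point is only to show how quickly administration time grows with the number of students.

```python
def administration_hours(students, minutes_per_student, parallel_stations):
    """Rough clock hours needed to test every student individually.

    Assumes each station (examiner plus recorder) handles one student at a time.
    """
    total_minutes = students * minutes_per_student
    return total_minutes / 60 / parallel_stations


# Hypothetical case: 3000 students, 10 minutes each, 5 recording stations.
hours = administration_hours(3000, 10, 5)
print(hours)
```

Adding the time needed to listen to and score each recording would raise the cost further, which is why such a test may fail on economy even when it is valid and reliable.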
12. Accuracy
Accuracy means that:
a) The test should be free from grammatical, spelling, and punctuation errors, frequently
found in EFL test papers. The teacher should choose the test material from reliable
sources like books.
b) The numbering of questions, sub-questions, and items should be correct.
c) The directions for each question should be accurately and concisely worded, and should
state the marks allotted to it; the time allotted for the whole test should also be stated.
For instance, if the teacher wants his students to write the correct word chosen from
three or four choices (multiple-choice questions), which direction would be more accurate:
"Write the correct answer" or "Choose the correct option"?
14. Factors which affect a test and make it unacceptable
1- Length of test -- a short test is likely to be less valid and less reliable
than a long test. On the other hand, a very long test also falls short on both
counts, as it will be tiring and scores will be distorted.
2- Choice of test technique -- each technique has its drawbacks; for
instance, a composition test is less reliable than a multiple-choice test.
3- Writing the test -- vaguely worded questions, difficult vocabulary or
complicated structures certainly affect reliability and validity.
4- Test administration -- if most students are not given adequate time to
finish the test, their answers and scores may be affected. In
addition, inadequate spacing, lighting, or heating, as well as distractions, may also
affect their test results.
5- The question paper -- the questions should be typed, not written by
hand, with ample space between them. The writing should be free
from careless errors. The directions should be clear, and the marks for each
question should be given.