This model, along with the three
principles, provides a basis for
answering this question:
“How useful is this particular
test for its intended purpose(s)?”
It is the overall usefulness of the test
that is to be maximized, rather than
the individual qualities that affect usefulness.
The individual test qualities cannot be
evaluated independently, but must be
evaluated in terms of their combined
effect on the overall usefulness of the test.
Test usefulness & the appropriate
balance among the different qualities
cannot be prescribed in general, but
must be determined for each specific testing situation.
In order to be useful, any given
lg. test must be developed
with a specific purpose, a particular group
of test takers and a specific lg. use domain.
“target lg. use” or TLU
* (tasks in the TLU domain)
- Reliability is often defined as consistency of measurement.
[Figure: two sets of scores on a test, compared for consistency]
- It is not possible to eliminate inconsistencies
entirely. What we can do is to try to minimize
the potential sources of inconsistencies.
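As an illustration only, consistency between two sets of scores can be expressed as a number. The sketch below assumes a test-retest setup and uses Pearson's correlation as the consistency index; neither choice comes from the notes above, and the function name and score lists are made up for the example.

```python
import math


def pearson(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_first) * (b - mean_second)
                for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / math.sqrt(spread_first * spread_second)


# Hypothetical scores of five test takers on two administrations.
administration_1 = [55, 62, 70, 81, 90]
administration_2 = [58, 60, 73, 79, 92]
print(round(pearson(administration_1, administration_2), 3))
```

A value close to 1 would suggest that the two administrations rank test takers in much the same way. A lower value points to sources of inconsistency worth investigating.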
- Construct validity pertains to the
meaningfulness & appropriateness of the
interpretations that we make on the basis of test scores.
- The term construct validity is used to refer to
the extent to which we can interpret a given
test score as an indicator of the ability(ies), or
construct(s), we want to measure with respect
to a specific domain of generalization.
the test task
- We define authenticity as the degree of
correspondence of the characteristics of a
given lg. test task to the features of a TLU task.
Authenticity is important, because:
1- It provides a link between test performance & the
TLU tasks & domain to which we want to generalize.
2- The way test takers perceive the relative
authenticity of test tasks can facilitate their test performance.
- We define interactiveness as the extent & the
type of involvement of the test taker’s
individual characteristics in accomplishing a test task.
- Unlike authenticity, interactiveness resides in
the interaction between the individual (test
taker or lg. user) & the task (test or TLU).
(Lg. knowledge, Metacognitive strategies)
Characteristics of lg. test task
Typists may perform certain typing
tasks in English very well, yet they might
simply copy the letters & words
without processing the document
as a piece of discourse. Therefore:
Authenticity : High
Interactiveness : Low
The typists are asked to carry on
“small talk” about food, clothing, etc.
Authenticity : Low (Lack of relevance of
the test task to the TLU task.)
Interactiveness : High (Test takers have
a reasonable amount of control in selecting
topics & influencing the structure of the
conversation.)
International students entering an
American university are given a test of
English vocabulary in which they match the words in
one column to the meanings in another.
Authenticity : Low (few domains involve
this kind of task)
Interactiveness : Low (Highly restricted
involvement of lg. knowledge)
Test takers conduct a face-to-face role play between a
salesperson and a customer.
Authenticity : High (Correspondence
between the characteristics of the TLU
domain and those of the test task.)
Interactiveness : High (High level of
involvement of all the areas of lg. knowledge &
the test taker’s topical knowledge.)
POINTS TO REMEMBER
1- Both authenticity & interactiveness are relative.
2- Three types of characteristics must be considered:
those of test takers, TLU task & test task.
3- Certain test tasks may be relatively useful, even
though they are low in authenticity or interactiveness.
4- In designing or analyzing tests, our estimates of
authenticity & interactiveness are only guesses.
5- The minimum acceptable levels that we specify for
authenticity & interactiveness will depend on the
specific testing situation.
- Another quality of tests is their impact on
society & educational systems. The impact of
test use operates at two levels:
In terms of WASHBACK:
“the effect of testing on teaching &
learning.” (Hughes, 1989)
“how assessment instruments affect
educational practices & beliefs.”
Impact on individuals
Impact on society & educational system
A) test takers
A) IMPACT ON TEST TAKERS
Test takers can be affected by three aspects of testing:
1- the experience of taking &, in some cases, of
preparing for the test. (Test taker’s
perception of TLU domain, his areas
of lg. knowledge & his use of strategies)
2- the feedback they receive about their performance on
the test.
3- the decisions that may be made about them on the basis of
their test scores.
B) IMPACT ON TEACHERS
If teachers find that they have to use a specified test, they may
find “teaching to the test” almost unavoidable.
This term implies doing something in teaching that may not
be compatible with teachers’ own values & goals, or with
the values & goals of the instructional program.
One way to minimize the potential for negative impact on
instruction is to change the way we test.
While the other five qualities pertain to the
uses that are made of test scores, practicality
pertains primarily to the ways in which the
test will be implemented, &, to a large
degree, whether it will be developed & used at
all. Thus, a practical test is one whose design,
development & use do not require more resources
than are available.
Thus, determining the practicality of a given test involves
the consideration of:
1- the resources that will be required to develop an
operational test that has the balance of qualities we want;
2- the allocation & management of the resources that
are available.
Practicality = available resources / required resources
If practicality ≥ 1, the test development & use is practical.
If practicality < 1, the test development & use is not practical.
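The practicality check can be sketched in a few lines of Python, assuming practicality is the ratio of available to required resources. The resource names, the figures, and the decision to take the smallest ratio across resource types (so that one scarce resource makes the whole test impractical) are assumptions of this sketch, not part of the model as stated above.

```python
def practicality(available, required):
    """Smallest available/required ratio across all required resource types.

    Both arguments map a resource name to an amount in a shared unit.
    Using the minimum is an assumption of this sketch: one scarce
    resource is enough to make the test impractical.
    """
    ratios = []
    for name, needed in required.items():
        if needed <= 0:
            continue  # nothing needed of this type, so it cannot be a bottleneck
        ratios.append(available.get(name, 0.0) / needed)
    # With no requirements at all, treat the test as trivially practical.
    return min(ratios) if ratios else float("inf")


def is_practical(available, required):
    """A test counts as practical when the ratio is at least 1."""
    return practicality(available, required) >= 1.0


# Hypothetical figures for one testing situation.
available = {"human_hours": 120, "material_cost": 500, "time_days": 30}
required = {"human_hours": 100, "material_cost": 400, "time_days": 40}
print(practicality(available, required), is_practical(available, required))
```

In this made-up situation, staff and materials are sufficient but time is not, so the test would count as impractical until more time is allocated or the design is simplified.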
Types of Resources
1- Human resources (e.g. test writers, scorers or raters, test
administrators & technical support)
2- Material resources
a) Space (e.g. rooms for test development)
b) Equipment (e.g. typewriters, ...)
c) Materials (e.g. paper, pictures)
3- Time
a) Time for specific tasks (designing, writing, ...)
b) Development time