Principles of Language Assessment: Test Usefulness
Course: Testing
Bachman & Palmer, Ch. 2
Testing, TEFL

Published in: Education

Test Usefulness

  1. Principles of Language Assessment: Test Usefulness. Course: Testing. Bachman & Palmer, Ch. 2
  2. The most important quality of a test is its usefulness. But: - What makes a test useful? - How do we know a test will be useful before we use it? - Or that it has been useful after we have used it?
  3. Simply using a test does not make it useful! A model of test usefulness has been proposed that includes six test qualities.
  4. Model of test usefulness: Reliability, Construct validity, Authenticity, Interactiveness, Impact, Practicality
  5. Usefulness = Reliability + Construct validity + Authenticity + Interactiveness + Impact + Practicality
  6. This model, along with the three principles, provides a basis for answering this question: “How useful is this particular test for its intended purpose(s)?”
  7. It is the overall usefulness of the test that is to be maximized, rather than the individual qualities that affect usefulness.
  8. The individual test qualities cannot be evaluated independently, but must be evaluated in terms of their combined effect on the overall usefulness of the test.
  9. Test usefulness & the appropriate balance among the different qualities cannot be prescribed in general, but must be determined for each specific testing situation.
  10. Therefore, in order to be useful, any given lg. test must be developed with a specific purpose, a particular group of test takers and a specific lg. use domain in mind. This domain is the “target lg. use” (TLU) domain, and tasks in the TLU domain are “TLU tasks”.
  11. 1 RELIABILITY: - Reliability is often defined as consistency of measurement. [Diagram: scores on test tasks with characteristics A ↔ scores on test tasks with characteristics A′ = reliability.] - It is not possible to eliminate inconsistencies entirely. What we can do is try to minimize the potential sources of inconsistency.
  12. 2 CONSTRUCT VALIDITY: - Construct validity pertains to the meaningfulness & appropriateness of the interpretations that we make on the basis of test scores. - The term construct validity refers to the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure with respect to a specific domain of generalization.
  13. [Diagram: Score interpretation = inference about lg. ability (construct definition) + domain of generalization. Test score = lg. ability + characteristics of the test task. Relations labeled: interactiveness, construct validity, authenticity.]
  14. 3 AUTHENTICITY: [Diagram: characteristics of the TLU task ↔ characteristics of the test task = authenticity.] - We define authenticity as the degree of correspondence of the characteristics of a given lg. test task to the features of a TLU task. Authenticity is important because: 1- It provides a link between test performance & the TLU tasks & domain to which we want to generalize. 2- The way test takers perceive the relative authenticity of test tasks can facilitate their test performance.
  15. 4 INTERACTIVENESS: - We define interactiveness as the extent & the type of involvement of the test taker’s individual characteristics in accomplishing a test task. - Unlike authenticity, interactiveness resides in the interaction between the individual (test taker or lg. user) & the task (test or TLU).
  16. [Diagram: Interactiveness = interaction between the characteristics of the lg. test task and the test taker’s language ability (lg. knowledge, metacognitive strategies), topical knowledge and affective schemata.]
  17. Example 1: Typists may perform certain typing tasks in English very well, yet they may simply be copying the letters & words without processing the document as a piece of discourse. Therefore: Authenticity: High. Interactiveness: Low.
  18. Example 2: Typists who are capable of carrying on “small talk” about food, clothing, etc. Authenticity: Low (lack of relevance of the test task to the TLU task). Interactiveness: High (test takers have a reasonable amount of control in selecting topics & influencing the structure of the interaction).
  19. Example 3: International students entering an American university were given a test of English vocabulary in which they had to match the words in one column to the meanings in another. Authenticity: Low (few domains involve this kind of task). Interactiveness: Low (highly restricted involvement of lg. knowledge).
  20. Example 4: Test takers conduct a face-to-face role play between a salesperson and a customer. Authenticity: High (correspondence between the characteristics of the TLU domain and those of the test task). Interactiveness: High (high level of involvement of all the areas of lg. knowledge & the test taker’s topical knowledge).
  21. POINTS TO REMEMBER: 1- Both authenticity & interactiveness are relative. 2- Three types of characteristics must be considered: those of test takers, TLU task & test task. 3- Certain test tasks may be relatively useful, even though they are low in authenticity or interactiveness. 4- In designing or analyzing tests, our estimates of authenticity & interactiveness are only guesses. 5- The minimum acceptable levels that we specify for authenticity & interactiveness will depend on the specific testing situation.
  22. 5 IMPACT: - Another quality of tests is their impact on society & educational systems. The impact of test use operates at two levels: a micro level (the individuals who are affected by the particular test use) and a macro level (the educational system or society).
  23. WASHBACK: “the effect of testing on teaching & learning” (Hughes, 1989); “how assessment instruments affect educational practices & beliefs” (Cohen, 1994).
  24. [Diagram: Washback. Impact on individuals: A) test takers, B) teachers. Impact on society & educational system.]
  25. A) IMPACT ON TEST TAKERS: Test takers can be affected by three aspects of the testing procedure: - the experience of taking and, in some cases, of preparing for the test (the test taker’s perception of the TLU domain, areas of lg. knowledge & use of strategies); - the feedback they receive about their performance on the test,
  26. B) IMPACT ON TEACHERS: If teachers find that they have to use a specified test, they may find “teaching to the test” almost unavoidable. This term implies doing something in teaching that may not be compatible with teachers’ own values & goals, or with the values & goals of the instructional program. One way to minimize the potential for negative impact on instruction is to change the way we test.
  27. 6 PRACTICALITY: While the other five qualities pertain to the uses that are made of test scores, practicality pertains primarily to the ways in which the test will be implemented and, to a large degree, whether it will be developed & used at all. Thus, a practical test is one whose design, development & use do not require more resources than are available.
  28. Thus, determining the practicality of a given test involves the consideration of: - the resources that will be required to develop an operational test that has the balance of qualities we want; & - the allocation & management of the resources that are available. Practicality = Available resources / Required resources. If practicality ≥ 1, the test development & use is practical.
  29. Types of Resources: 1- Human resources (e.g. test writers, scorers or raters, test administrators & technical support). 2- Material resources: a) Space (e.g. rooms for test development); b) Equipment (e.g. typewriters, computers); c) Materials (e.g. paper, pictures). 3- Time: a) Time for specific tasks (designing, writing, analyzing); b) Development time.
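
The practicality ratio from slide 28 can be illustrated with a minimal sketch. The function names, resource labels and figures below are hypothetical and are not taken from Bachman & Palmer. The rule that every resource type must reach a ratio of at least 1 is also an assumption made for illustration, since the slides give only the overall ratio.

```python
def practicality(available: float, required: float) -> float:
    """Ratio of available to required resources (both in the same unit)."""
    if required <= 0:
        raise ValueError("required resources must be positive")
    return available / required


def is_practical(available: dict[str, float], required: dict[str, float]) -> bool:
    """Assumption: practical only if every resource type has a ratio >= 1."""
    return all(
        practicality(available[name], amount) >= 1
        for name, amount in required.items()
    )


# Hypothetical figures for one testing situation (human, material, time).
available = {"human_hours": 120.0, "material_budget": 500.0, "time_days": 30.0}
required = {"human_hours": 100.0, "material_budget": 450.0, "time_days": 45.0}

ratios = {name: practicality(available[name], required[name]) for name in required}
print(ratios)
print(is_practical(available, required))
```

In this made-up situation the human and material ratios are above 1 but the time ratio is below 1, so the sketch reports the test as not practical until either more time is allocated or the design is changed to require less.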
