Some reflections on task-based language performance assessment
L. F. Bachman (2002)
by Parisa Mehran
Introduction
The complexities of task-based language performance assessment (TBLPA) are leading us to reconsider many of the fundamental issues about:
1. What we want to assess
2. How we go about it
3. What sorts of arguments and evidence we need to provide to justify the inferences and decisions we make on the basis of our assessments
The term TBLPA:
‘Performance’ goes back to the ‘direct testing’ movement of the 1970s
‘Task-based’:
• Has a relatively more recent lineage
• Derives from research in SLA and language pedagogy
Tasks and constructs in language assessment
• The distinction between task-centered and ability- or construct-centered approaches to language assessment can be found in both:
1. Educational measurement
2. Language testing
• An underlying premise of the discussions of task-based language assessment is that the inferences we want to make are about underlying ‘language ability’, ‘capacity for language use’ or ‘ability for use’
A different approach to defining TBLPA, according to Norris et al. and Brown et al.:
• TBLPA as one kind of performance assessment
• Task-based assessment does not simply utilize real-world tasks as a means for eliciting particular components of the language system which are then measured or evaluated; on the contrary, the construct of interest in task-based assessment is performance on the task itself
• Inferences to be made are about ‘students’ abilities to accomplish particular tasks or task types’
• The difference in this approach lies not in the kinds of assessment tasks that are used (e.g. employing ‘authentic’ assessment tasks), but rather in the kinds of inferences claimed to be made on the basis of test-takers’ performance on assessment tasks
• The construct is defined in terms of ‘pragmatic ascription’, or what test-takers can do, and in so doing the interpretation is limited to predictions about future performance on real-world tasks
• Brown et al. make a distinction between TBLPA and other types of performance assessment by considering the way one interprets consistencies in responses across a set of assessment tasks
Two approaches to interpreting consistencies in responses across a set of assessment tasks:
• ‘Behaviorist perspective’ on construct definition, taken by Brown et al. and Chapelle (1998): response consistencies are interpreted as ‘samples of response classes’; consistencies are attributed to contextual factors
• ‘Trait perspective’, taken by other proponents of performance assessment: response consistencies are interpreted as evidence of underlying processes or structures; consistencies are attributed to characteristics of the test-taker
Difference between ‘ability-based’ and ‘task-based’ approaches
‘Ability-based’ approach:
• First, focusing on the construct of interest
• Then, developing tasks based on the performance attributes of the construct, score uses, scoring constraints, …
• Both constructs and tasks are considered
‘Task-based’ approach:
• First, deciding which performances are the desired ones
• Then, score uses, scoring criteria, … become part of the performance test itself
• Only performances on tasks are considered
‘What is this thing called task?’: Content domain specification
• Definitions of ‘task’ range from including virtually anything that is done, to distinguishing between ‘real-world tasks’ and ‘pedagogic tasks’, to Skehan’s (1998) extended definition
Norris et al. (1998)
Define task as ‘those activities that people do in everyday life and which require language for their accomplishment’
A task is essentially a real-world activity
They do not distinguish between these and assessment tasks
Bachman and Palmer (1996)
Define a ‘language use task’ as ‘an activity that involves individuals in using language for the purpose of achieving a particular goal or objective in a particular situation’
This definition focuses on tasks that involve language and adds to this the notions that:
1. Tasks are goal oriented
2. Tasks are situated in specific settings
Two critical issues that must be addressed in design, development and
use of any language assessment
‘Which tasks do we use?’: Identifying and
selecting assessment tasks
Specification of assessment tasks is a
critical issue because
1. The particular tasks included in the
assessment will provide the basis for
one part of a validity argument: content
relevance and representativeness
2. The degree of correspondence between
the test tasks and tasks outside the test
itself provides a basis for investigating
the authenticity of the test tasks
Content relevance and content representativeness
Content relevance: the extent to which the areas of ability to be assessed are in fact assessed by the task
Content representativeness: the extent to which the test adequately samples the content domain of interest and provides a basis for investigating task generalizability and extrapolation
The problem of investigating and demonstrating content relevance and representativeness is two-fold:
1. We must identify the TLU (target language use) domain, defined as ‘a set of specific language use tasks that the test-taker is likely to encounter outside the test itself, and to which we want our inferences about language ability to generalize’
2. We then need to select tasks from that domain which will form the basis for language assessment tasks. Even when a well-defined TLU domain can be identified, selecting specific tasks from within that domain may be problematic.
Bachman and Palmer (1996) suggest three reasons why real-life tasks may not always be appropriate as a basis for developing assessment tasks:
1. Not all TLU tasks will engage the areas of ability we want to assess
2. Some TLU tasks may not be practical to administer in an assessment in their entirety
3. Some TLU tasks may not be appropriate or fair for all test-takers if they presuppose prior knowledge or experience that some test-takers may not possess
There are serious problems with the claim that TBLPA’s
distinctive characteristic is that it enables us to make
predictions about future performance. These problems are
related to:
1. Task selection
2. Generalizability
3. Extrapolation
We cannot demonstrate that performance on one assessment
task generalizes to other assessment tasks, or that it
extrapolates to performance on tasks in the TLU domain
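The slides do not formalize what ‘generalizes’ means; one standard framing, not given in the original, is generalizability theory. For a person-by-task design, the observed-score variance is decomposed into components and a generalizability coefficient for a test with n_t tasks can be computed:

\sigma^2(X_{pt}) = \sigma^2_p + \sigma^2_t + \sigma^2_{pt,e}, \qquad E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt,e}/n_t}

Here \sigma^2_p is variance attributable to test-takers, \sigma^2_t to tasks, and \sigma^2_{pt,e} to the person-by-task interaction (confounded with error). A large interaction component relative to \sigma^2_p is exactly the situation described above: performance on one task tells us little about performance on other tasks.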
‘How hard is it?’: The difficulty with difficulty
The notion that test items or tasks themselves differ in difficulty is
ingrained in both:
1. The way we conceptualize the difficulty of assessment tasks
2. How we operationalize difficulty in most recent measurement
models
Difficulty does not reside in the task alone, but is relative to any
given test-taker
Two general approaches to understanding, explaining or predicting how difficult a given task will be:
1. First approach: to identify a number of task characteristics that are considered to be essentially independent of ability, and then investigate the relationships between these characteristics and empirical indicators of difficulty (a schematic illustration follows this list)
2. Second approach: to explicitly identify ‘difficulty features’, which are essentially combinations of ability requirements and task characteristics that are hypothesized to affect the difficulty of a given task
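As a hedged illustration of the first approach (not given in the slides), the investigation is often operationalized as a regression of empirical difficulty estimates on coded task characteristics; the difficulty index \hat{b}_i and the characteristics x_{i1}, …, x_{ik} below are purely illustrative:

\hat{b}_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i

where \hat{b}_i is an empirical indicator of the difficulty of task i (e.g. a mean score or an IRT difficulty estimate) and x_{i1}, …, x_{ik} are coded characteristics of the task (features of the input, the expected response, and so on). Note that \hat{b}_i is itself derived from test-takers’ performance, a point taken up in the later critique of ‘difficulty’.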
The relationships among task characteristics and task difficulty have been researched for over a decade; however, the results have brought us no closer to an understanding of this relationship. Possible explanations offered by researchers:
1. Methodological limitations in the studies
2. Differences between testing and pedagogic contexts, the former producing a cognitive focus on display rather than on task fulfillment or getting the message across
3. Assessment tasks may be fundamentally different from pedagogic or ‘real-life’ tasks
The third explanation raises questions about:
• The validity of assessing certain aspects of language ability with certain types of task
• The generalizability of research with SLA and pedagogic tasks to assessment tasks
Problems with ‘difficulty factors’
Sources of variation or factors that may affect test performance
Bachman (1990)
Factors:
1. Language ability of test-taker
2. Test-task characteristics
3. Personal characteristics of the test-taker
4. Random/unpredictable factors
These factors may well be
correlated with each other except for
random factors
There is no factor identified as
‘difficulty’
Skehan (1996) and the Hawaii group
Three task difficulty features
affecting performance on tasks:
1. Code complexity: language
required to accomplish the task
2. Cognitive complexity: thinking
required to accomplish the task
3. Communicative stress:
performance conditions for
accomplishing a task
[Figure: ‘Code complexity’, ‘Cognitive complexity’, ‘Communicative stress’ and ‘Other factors (?)’ feed into ‘Task difficulty’, which in turn determines ‘Test performance’]
Two problems with this formulation:
1. The difficulty features confound
the effects of the test-taker’s ability
with the effects of the test tasks
2. This approach introduces a
hypothetical entity, ‘task difficulty’,
as a primary determinant of test
performance and as a separate
factor
• Problems with the way difficulty is operationalized in current measurement models
Some indicators of difficulty are averages of performance across facets of measurement and do not consider the differential performance of different individuals
In measurement models, ‘difficulty’ is operationalized either as an average of scores on a given task or facet of measurement across a group of test-takers, or as an interaction between the latent trait and performance on a given task (a minimal formal sketch follows this slide)
‘Difficulty’ is essentially an artifact of test performance and not a characteristic of assessment tasks themselves: empirical estimates of task difficulty are not estimates of a separate entity, ‘difficulty’, but are themselves artifacts of the interaction between the test-taker’s ability and the characteristics of the task
• Problem with trying to predict empirical difficulty from task characteristics
The approach of using task characteristics to predict empirical estimates of item difficulty is problematic because these item statistics are themselves a function of interactions between test-takers and test tasks
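To make the two operationalizations above concrete (a minimal sketch, not from the original slides): in classical test theory the ‘difficulty’ of task i is a group average, for dichotomous items the proportion of N test-takers responding correctly, while in an item response (e.g. Rasch) model the difficulty parameter b_i appears only in interaction with the test-taker’s ability \theta_p:

p_i = \frac{1}{N}\sum_{p=1}^{N} X_{pi}, \qquad P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}

In both cases the difficulty value is estimated from observed responses, which is why it is better read as an artifact of the ability-task interaction than as a property of the task itself.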
Conclusions
A fundamental aim of most language performance assessment is to present test-takers with tasks:
• That correspond to tasks in ‘real-world’ settings
• That will engage test-takers in language use or the creation of discourse
A solely task-based approach is problematic. The most useful assessments in all situations will be those that are based on the planned integration of both tasks and constructs in the way they are designed, developed and used.
Task specification will also present challenges to such an integrated approach; however, an integrated approach:
• Makes it possible for test-users to make a variety of inferences about the capacity for language use that test-takers have, or about what they can and cannot do
• Makes available to test-developers the full range of validation arguments that can be developed in support of a given inference or use