UNIVERSIDAD CATOLICA DE LA SSMA. CONCEPCION
THEORETICAL FRAMEWORK
ASSESSMENT
EDUCATION FACULTY
ENGLISH LANGUAGE DEPARTMENT
LUIS FUENTES CID
GABRIEL JARA MUÑOZ
PROFESSOR
ROXANNA CORREA PEREZ
DATE
CONCEPCION, 2013
Introduction
This essay explains and compares the concept of assessment in the context of
Second Language Acquisition (SLA) from the perspectives of different authors
(Coombe, Brown, Bachman), focusing on the characteristics that a formal test should
have in order to evaluate language. First of all, a distinction must be made
between the concepts of assessment and test. Assessment is described as a process of
collecting information about learners' language ability or achievement (Coombe, 2007,
p. 14), whereas tests are described as part of assessment, but as a more practical,
systematic process that measures students' achievement or progress through the
stages (units, lessons) of the language learning process (Coombe, 2007, p. 14).
Therefore, when this essay discusses ideas or concepts about assessing students'
language, it refers to tests and to how tests should be created in order to evaluate
language learning. The authors broadly agree on the aspects that matter when
assessing students. Each sets out a list of "principles" (Coombe and Brown) or
"qualities" (Bachman, 1996) that establish standards for a useful, well-designed
test. Coombe (2007), as part of a long description of many aspects of testing,
proposes eight guiding principles of a good test: usefulness, validity, reliability,
practicality, washback, authenticity, transparency and security. Brown (2004), who is
more precise and practical in his explanation, proposes five principles for test
effectiveness: practicality, reliability, validity, authenticity and washback. In a
similar way, but with an important conceptual difference, Bachman (1996) proposes
three principles related to the usefulness of a test, treated as the most important
aspect to consider, and six qualities of a test (features or characteristics) that
make a test useful: reliability, construct validity, authenticity, interactiveness,
impact and practicality. The following paragraphs explain these aspects in detail and
compare them across the three authors, highlighting both the important differences
and the concepts they share.
In 'Conceptual bases of test development', Bachman (1996) states that when
producing a test, it is very important that testers consider the purpose of the
instrument, and concludes that the most important quality of a test is its usefulness
(Bachman, 1996, p. 17). Usefulness is defined as the appropriate balance among the
qualities of a test (Bachman, 1996, p. 18), which Brown and Coombe would call
principles. For the author, these qualities are reliability, construct validity,
authenticity, interactiveness, impact and practicality.
The author describes three principles as the basis for the proper functioning of
the model of usefulness in language tests. Principle 1 states that it is the overall
usefulness of the test that is to be maximized, rather than the individual qualities that
affect usefulness. Principle 2 states that the individual test qualities cannot be
evaluated independently, but must be evaluated in terms of their combined effect on the
overall usefulness of the test. Finally, principle 3 states that test usefulness and
the appropriate balance among the different qualities cannot be prescribed in general,
but must be determined for each specific testing situation (Bachman, 1996, p. 18).
Bachman (1996) describes reliability and validity as critical and essential for a
test because these two qualities provide the major justification for using scores
(numbers) as a basis for making decisions. The author states that reliability relates
to the consistency of test scores. For instance, if the same test is taken by the same
group on two different occasions and in two different settings, it should make no
difference whether an individual takes the test in one setting or the other. If the
results differ, the test that was administered was not reliable. Validity or, as the
author defines it, construct validity is the degree to which a given test score can be
interpreted as an indicator of the ability that the test is intended to measure
(Bachman, 1996). Here, construct stands for a precise definition of the ability that
serves as the basis for producing the test.
Bachman (1996) defines authenticity as the degree of correspondence between TLU
(target language use) tasks and the tasks presented in the test. Interactiveness is
also defined as a 'relation' or interaction, in this case between the test taker and
the task, which can be a test task or a TLU task (Bachman, 1996). Another quality the
author mentions is impact, defined as the effect of the test not only on the
individuals who prepare for and take it, and on their teachers, but also at a macro
level, on the educational system or society.
Finally, Bachman states that practicality refers to the way a test is going to be
implemented: if the resources required for implementing the test exceed the resources
available, the test will be impractical (Bachman, 1996, p. 35).
Brown (2004) states five principles of language assessment that should be applied
to tests in order to achieve effectiveness, appropriate administrative constraints,
fair measurement of contents and dependability. The author starts with practicality,
which, briefly described, encompasses four main aspects that make an evaluation tool
"friendly" to apply and to mark. A test should be cheap, quick to create, easy to mark
and easy to administer. Practicality clearly deals with the common-sense aspects of
assessment, which is not an easy task when teachers have more than thirty students per
classroom; Brown (2004) therefore presents these aspects to make the work easier, but
not less effective. Secondly, Brown (2004) presents reliability as a principle that
encompasses the consistency and dependability of a test, meaning that test results
should show similar scores if the test is applied in different grades or classes,
which, if achieved, gives teachers a trustworthy tool to assess language.
Interestingly, Brown (2004) also describes different factors that, besides test
creation, affect reliability. These include human factors such as student-related
reliability, in which factors originating in the students, such as emotions or
attitudes, alter the reliability and therefore the accuracy of the results, and rater
reliability, which describes the possible bias of a rater towards certain aspects of
the test or even towards a student. They also include non-human factors such as test
administration reliability, meaning all the logistical factors that could affect the
optimal implementation of a test, and finally test reliability, which includes any
error or problem that the evaluation tool itself might have.
Thirdly, Brown (2004) presents validity, a rather complex term alluding to the
coherence between what has been taught and how, and how it has been evaluated, which is
basically the relationship between methodology and test results. Brown (2004) also
explains how validity can be measured or demonstrated in a test, proposing
different kinds of evidence that can support the claim that a test is valid:
content-related evidence, criterion-related evidence, construct-related evidence,
consequential validity and face validity. Authenticity is also mentioned by the author
as "a task is likely to be enacted in the real world", listing several aspects that a
test should present in order to be authentic, such as the use of natural language,
contextualized items, meaningful topics, contents organized thematically, as in a
storyline, and tasks that reflect real-world activities.
Washback is another concept presented by Brown (2004). It is described as the effect
that a test can have on how students prepare for it, or on the process by which
students are prepared for it, together with its role in identifying weaknesses and
strengths and in providing feedback on the task.
Coombe (2007), in an extended and very detailed text, presents eight
principles for making an assessment a well-designed and well-developed tool. The authors
start with the usefulness of a test, which, in agreement with Bachman and Palmer (1996),
they consider a very important aspect because it defines the intention or purpose for
which a test was created or designed, as well as establishing the target content,
skills, etc.
Validity is defined by Coombe (2007) in very practical words, as "test what you teach
and how you teach it!" (p. 22), meaning that the way a teacher evaluates students
should be consistent with the way they were taught in terms of format, approach, skills,
target language, etc. Interestingly, Coombe (2007) also presents a subdivision of the
principle of validity, just as Brown (2004) does, but more concisely, focusing mainly
on construct validity, understood as the link between methodology and theoretical
background (why the test is designed the way it is), and face validity, understood as
the extent to which the test looks right, in that it appears to measure what it is
supposed to measure in a way that is familiar from how the content was taught.
Reliability is also defined by Coombe (2007) as the consistency of results in a test,
which means that students should get similar scores whether the test is applied at
different times or in different versions. The authors identify within reliability
potential sources of error that could affect the scores of a test. These are called
fluctuations and occur in different aspects of a test: fluctuations in the learner, in
the scoring and in the test administration. All of them describe problems that may
arise when assessing.
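The idea of similar scores across two sittings can be illustrated numerically. The
following Python sketch is only an illustration under stated assumptions: none of the
three authors is cited here as prescribing this calculation, the function name
pearson is hypothetical, and the scores are invented for the example. It computes
the Pearson correlation between two sittings of the same test, one common way of
expressing how consistently the test ranks the same students.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores in one sitting are all identical")
    return cov / (spread_x * spread_y)


# Invented scores: the same six students sit the same test twice.
first_sitting = [55, 62, 70, 48, 81, 66]
second_sitting = [57, 60, 73, 50, 79, 68]

r = pearson(first_sitting, second_sitting)
print(round(r, 2))
```

A value close to 1 suggests that students keep roughly the same relative position in
both sittings, which is the kind of consistency the definition above describes; a low
value would point to one of the fluctuations mentioned (learner, scoring or
administration).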
Practicality is briefly defined as a way of making assessment an easy task for teachers,
considering the time needed to develop, administer and mark a test, as well as money
and resources. A test is not of poor quality simply because it is short or easy to
make.
Washback refers to the effect of testing on students as well as teachers (Coombe,
2007). A well-designed test gives learners a sense of accomplishment, because it works
as an indicator that they are getting closer to the general objectives of a course and
that they will eventually develop all their abilities and skills if the content is
taught and assessed properly.
Authenticity is closely related to motivation: including real-world tasks in a test
mirrors situations close to the students' own, in which they can actually use the
language that was taught.
Transparency deals with giving learners all the information needed to make the test
fair, meaning that they know how they are being evaluated and what they are asked to
do, which allows for fair scoring.
Security is closely related to reliability and validity, for if any information about
the test escapes the teacher's control, it will eventually affect the test's
reliability and validity when other students are evaluated.
After providing all of the definitions, it is necessary to compare them and see how
they differ and what they share. To keep the comparison tidy, the principles are
compared one by one, although the differences lie less in the content of each principle
than in how the authors conceive and describe it, including or omitting certain aspects.
Practicality is seen by the three authors in a very similar way, since they all
include what could be called common-sense features of assessing a class, such
as resources (human and material) and time.
Reliability is likewise described as the consistency (Brown, 2004; Coombe, 2007;
Bachman, 1996) and dependability (Brown, 2004) of test scoring, whatever the
circumstances in which the test is applied. There is a small difference, though,
between Brown (2004) and Coombe (2007): both present potential errors that could
affect the reliability of scoring, but they describe and name them in different ways,
while still pursuing the same aim.
The principle of validity, by contrast, varies among the authors, for it is presented
with different degrees of depth and different subdivisions. Bachman (1996) presents
only the concept of construct validity, alluding to the relationship between test
scores and what they signify about the measured ability. This is very similar to what
Coombe (2007) does, with the addition of face validity. Brown (2004), however,
develops several kinds of validity as evidence to support the validity of tests.
Authenticity is also mentioned by the three authors. Brown (2004) and Coombe (2007)
describe it similarly, as the presence of real-world tasks in a test, whereas Bachman
(1996) relates authenticity to the concept of TLU, that is, how closely the test
activities correspond to the target language in use in a real communication context.
Washback is a principle presented by Brown (2004) and Coombe (2007) as the effects
of students preparing for a test, whereas Bachman (1996) presents the concept within
the quality of impact.
Other concepts do not appear in all the texts. Transparency and security, presented by
Coombe (2007), are very specific in their description and could be related to some of
the major principles, since they refer to factors that alter scoring.
Bachman (1996) also presents a new concept, interactiveness, which mainly
describes the relation or interaction between the test taker and the characteristics
of the task given in a test or in real-life communication.
It can be clearly seen that all of these principles are very important when assessing
language, since their requirements give teachers tools to make tests effective and
efficient for students and for themselves. Such tests measure what teachers are
supposed to measure, are coherent with the contents presented in class and with the
language abilities required for an appropriate performance, and show the other
characteristics of a well-designed test.
References
Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and
Developing Useful Language Tests. Oxford: Oxford University Press.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices.
New York: Longman.
Coombe, C., Folse, K., & Hubley, N. (2007). A Practical Guide to Assessing English
Language Learners. University of Michigan Press.