This document provides an overview of key concepts in language assessment, including definitions of tests, different types of language tests (performance, knowledge), assessments versus evaluations, norm-referenced versus criterion-referenced tests, and approaches to language testing from the 1970s to current approaches. It discusses informal and formal assessment, formative and summative assessment, and principles of language assessment such as practicality, reliability, and validity. Current issues discussed include new views of intelligence, traditional versus alternative assessments including computer-based testing.
University of Luzon, Graduate School. Literacy Assessment
Professor: Leah Solmerin Corpuz, Ph.D.
BASIC CONCEPTS
(Brown, 2004)
TEST It is a method of measuring a person’s ability, knowledge, or performance in a given
domain.
As a method,
It is an instrument—a set of techniques, procedures, or items—that requires
performance on the part of the test taker.
It must be explicit and structured.
It must measure,
either general ability, e.g., a multi-skill proficiency test,
or specific competencies/objectives, e.g., a quiz on recognizing correct use of
definite articles.
It must measure a given domain. For example, a test of pronunciation might be only
on a limited set of phonemic minimal pairs.
It must be able to communicate results (feedback):
a letter grade
a total numerical score, a percentile rank, sub-scores.
LANGUAGE
TEST
It measures one’s ability to perform language: to speak, write, read, listen.
PERFORMANCE
LANGUAGE
TEST
It tests the test-taker’s actual use of language; then from those samples, the test
administrator infers general competence.
LANGUAGE
KNOWLEDGE
TEST
It tests the test-taker’s knowledge about language: defining a vocabulary item,
reciting a grammatical rule, identifying a rhetorical feature in written discourse.
TESTS versus
ASSESSMENT
Testing and assessing are not synonymous.
Tests are administrative procedures that occur at identifiable times in a curriculum
when learners muster all their faculties to offer peak performance, knowing that their
responses are being measured and evaluated.
Assessment is an ongoing process that encompasses a much wider domain.
Whenever a student responds to a question, offers a comment, or tries out a
new word or structure, the teacher subconsciously makes an assessment of
the student’s performance.
A good teacher never ceases to assess students, whether assessments are
intended or incidental.
Tests are thus a subset of assessment, and tests are not the only form of assessment
that a teacher can make.
Figure 1. Tests, assessment, and teaching
INFORMAL
ASSESSMENT
It can be incidental, unplanned comments and responses, along with coaching
and other impromptu feedback to the student.
It can be embedded in classroom tasks designed to elicit performance without
recording results and making fixed judgments about a student’s competence.
FORMAL
ASSESSMENT
It is systematic, planned sampling techniques constructed to give teacher and
student an appraisal of student achievement.
FORMAL
ASSESSMENT
versus TEST
All tests are formal assessments, but not all formal assessment is testing.
A student’s journal or portfolio of materials can be a formal assessment, but not a
test.
A systematic observation of a student’s frequency of oral participation in class is
a formal assessment, but not a test.
Tests are usually relatively time-constrained and draw on a limited sample of
behavior.
Functions of Assessment
FORMATIVE
ASSESSMENT
This is used to evaluate students in the process of “forming” their competencies
and skills, with the goal of helping them continue that growth process, through
1) Delivery (by the teacher)
Examples:
Marginal comments on papers
Responding to a draft of an essay
Advice about how to better pronounce a word
A suggestion for a strategy for compensating for
a reading difficulty
Showing how to modify a student’s note-taking
to better remember the content of a lecture
2) Internalization (by the student)
Of appropriate feedback on performance, with an eye toward the future
continuation (or formation) of learning.
Primary focus: the ongoing development of the learner’s language.
Examples: a comment, a suggestion, or calling a student’s attention to an error.
SUMMATIVE
ASSESSMENT
This is used to measure, or summarize, what a student has grasped, and it typically
occurs at the end of a course or unit of instruction.
Example: Final (or periodic) exams, general proficiency exams.
ASSESSMENT versus EVALUATION (Bachman and Palmer, 2010)
ASSESSMENT It is the process of collecting information about something that we are interested
in, according to procedures that are systematic and substantively grounded.
Systematic:
Designed and carried out according to clearly defined procedures that are
methodical and open to scrutiny by other test developers and researchers.
An assessment conducted by one person at one time could potentially be
replicated by another person at another time.
Substantively grounded:
Based on a recognized and verifiable area of content, such as a course
syllabus, a widely accepted theory about the nature of language ability, prior
research, including a needs analysis, or the currently accepted practice in the
field.
EVALUATION Bachman (1990, 2004b) describes evaluation as distinct from assessment.
Evaluation involves making value judgments and decisions on the basis of
information, and gathering information to inform such decisions is the primary
purpose for which language assessments are used.
NORM-
REFERENCED
TESTS
Each test-taker’s score is interpreted in relation to a mean (average score),
median (middle score), standard deviation (extent of variance in scores), and/or
percentile rank.
PURPOSE is to place test-takers along a mathematical continuum in rank order.
Scores are usually reported back to the test-taker in the form of a numerical
score and a percentile rank.
Example: TOEFL.
Since this kind of test is intended for large audiences, such tests must have
fixed, predetermined responses in a format that can be scored quickly and at
minimum expense.
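As a rough illustration of how norm-referenced scores are interpreted, here is a minimal Python sketch; the score data are invented for illustration only.

```python
from statistics import mean, median, pstdev

# Hypothetical scores from ten test-takers (invented data).
scores = [52, 61, 67, 70, 73, 75, 78, 81, 84, 92]

def percentile_rank(score, all_scores):
    """Percentage of test-takers who scored below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

print(mean(scores))                 # average score: 73.3
print(median(scores))               # middle score: 74.0
print(pstdev(scores))               # spread of scores around the mean
print(percentile_rank(81, scores))  # 70.0: this taker outscored 70% of the group
```

A score report for the test-taker who scored 81 would thus give both the raw score and the percentile rank (70th percentile), placing that person on the group's continuum.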
CRITERION-
REFERENCED
TESTS
These are designed to give test-takers feedback, usually in the form of grades, on
specific course or lesson objectives.
APPROACHES TO LANGUAGE TESTING
DISCRETE-POINT and INTEGRATIVE TESTING (1970s to 1980s)
DISCRETE-
POINT TESTS
These are constructed on the assumption that language can be broken down
into component parts and that those parts can be tested successfully.
The claim is that an overall language proficiency test should sample all four
skills and as many linguistic discrete points as possible.
Disadvantage: This approach demands decontextualization that often confuses the
test-taker.
INTEGRATIVE
TESTING
With the emphasis on communication, authenticity, and context…
Oller (1972) argued that language competence is a unified set of interacting
abilities that cannot be tested separately. His claim was that communicative
competence is so global and requires such integration that it cannot be
captured in additive tests of grammar, reading, vocabulary, and other discrete
points of language.
Examples: cloze tests, dictation.
Proponents of integrative test methods soon centered their arguments on what
became known as the “unitary trait hypothesis”, which suggested an
“indivisible” view of language proficiency: that vocabulary, grammar,
phonology, and other discrete points of language could not be disentangled
from each other in language performance.
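A fixed-ratio cloze passage is conventionally built by deleting every nth word of a text; the sketch below illustrates the idea (the passage and the deletion rate are invented for illustration).

```python
def make_cloze(text, n=7, blank="____"):
    """Replace every nth word of the text with a blank (fixed-ratio cloze)."""
    words = text.split()
    for i in range(n - 1, len(words), n):
        words[i] = blank
    return " ".join(words)

passage = ("The man walked into the red house and "
           "sat down at the kitchen table to read")
print(make_cloze(passage))
# The man walked into the red ____ and sat down at the kitchen ____ to read
```

Test-takers then restore each blank using the surrounding context, which is why the task is considered integrative: it draws on vocabulary, grammar, and discourse knowledge at once.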
COMMUNICATIVE LANGUAGE TESTING (mid 1980s)
MAIN ISSUE
ADDRESSED
Integrative tests such as cloze tell us only about a candidate’s linguistic
competence. They do not tell us anything directly about a student’s
performance ability.
Language performance versus language use: Bachman and Palmer (1996,
p. 9): “In order for a particular language test to be useful for its intended
purposes, test performance must correspond in demonstrable ways to
language use in non-test situations.”
FEATURES Following Canale and Swain’s (1980) model of communicative competence,
Bachman (1990) proposed a model of language competence consisting of
organizational and pragmatic competence…all elements of the model, especially
pragmatic and strategic abilities, needed to be included in the construct of
language testing and in the actual performance required of test-takers.
PROCESS Identify the kinds of real-world tasks that language learners were called upon
to perform.
Weir (1990): consider where, when, how, with whom, and why language is to be
used, on what topics, and with what effect.
PERFORMANCE-BASED ASSESSMENT
MAIN ISSUE
ADDRESSED
Instead of just offering paper-and-pencil selective response tests of a plethora
of separate items, PBA of language typically involves oral production, written
production, open-ended responses, integrated performance (across skill areas),
group performance, and other interactive tasks.
Students are assessed as they perform actual or simulated real-world tasks—
higher content validity, because learners are measured in the process of
performing the targeted linguistic acts.
DISADVANTAGE Time-consuming and expensive.
MAIN FEATURE Learners are measured in the process of performing the targeted linguistic acts. A
characteristic of PBA (though not of all PBA) is the presence of interactive tasks:
test-takers are measured in the act of speaking, requesting, combining listening
and speaking, or integrating reading and writing.
Example: oral interview.
CURRENT ISSUES IN CLASSROOM TESTING
NEW VIEWS
ON
INTELLIGENCE
The IQ concept of intelligence relied on standardized, norm-referenced tests,
timed, in a multiple-choice format, consisting of a multiplicity of logic-
constrained items, many of which are inauthentic; such tests focused only on
linguistic and logical-mathematical intelligences.
Then came Howard Gardner’s (1983, 1999) research on multiple
intelligences.
Robert Sternberg (1988, 1997) recognizes creative thinking and manipulative
strategies as part of intelligence.
Daniel Goleman (1995) introduced the concept of EQ (emotional intelligence).
TRADITIONAL AND ALTERNATIVE ASSESSMENT
COMPUTER-
BASED
TESTING
Advantages:
Classroom-based testing
Self-directed testing on various aspects of a language
Practice for upcoming high-stakes standardized tests
Some individualization, in the case of computer-adaptive tests (CATs)
Large-scale standardized tests that can be administered easily to thousands of
test-takers at many different stations, then scored electronically for rapid
reporting of results.
Disadvantages:
Lack of security and the possibility of cheating are inherent in classroom-based,
unsupervised computerized tests.
Occasional “home-grown” quizzes that appear on unofficial websites may be
mistaken for validated assessments.
The multiple-choice format preferred for most computer-based tests contains
the usual potential for flawed item design.
Open-ended responses are less likely to appear because of the need for human
scorers, with all the attendant issues of cost, reliability, and turn-around time.
The human interactive element (especially in oral production) is absent.
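The individualization a CAT offers can be illustrated with a toy difficulty-adjustment loop. The level scale and the simple up/down rule below are invented simplifications; operational CATs select items using item response theory.

```python
def adaptive_path(responses, start=2, lo=0, hi=4):
    """Toy CAT loop: present a harder item after a correct response,
    an easier one after an incorrect response (clamped to [lo, hi]).
    Real CATs select the next item with item response theory."""
    level = start
    path = [level]
    for correct in responses:
        level = min(level + 1, hi) if correct else max(level - 1, lo)
        path.append(level)
    return path

print(adaptive_path([True, True, False, True]))  # [2, 3, 4, 3, 4]
```

Because each test-taker follows a different path through the item bank, the test converges on items near that person's ability level, which is the individualization referred to above.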
BASIC PRINCIPLES
1. Periodic assessments, both formal and informal, can increase motivation by serving as
milestones of student progress.
2. Appropriate assessments aid in the reinforcement and retention of information.
3. Assessment can confirm areas of strength and pinpoint areas needing further work.
4. Assessments can provide a sense of periodic closure to modules within a curriculum.
5. Assessments can promote student autonomy by encouraging students’ self-evaluation of their
progress.
6. Assessment can spur learners to set goals for themselves.
7. Assessments can aid in evaluating teaching effectiveness.
PRINCIPLES OF LANGUAGE ASSESSMENT
PRACTICALITY Not excessively expensive
Stays within appropriate time constraints
Relatively easy to administer
Has a scoring/evaluation procedure that is specific and time-efficient
RELIABILITY Consistent and dependable
-student-related reliability
-rater reliability
-test administration reliability
-test reliability
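Rater reliability, for example, is often estimated by comparing two raters' scores of the same performances. A minimal sketch using exact agreement follows; the ratings are invented for illustration.

```python
# Hypothetical 1-5 ratings by two raters of the same ten essays (invented).
rater_a = [3, 4, 2, 5, 4, 3, 5, 2, 4, 3]
rater_b = [3, 4, 3, 5, 4, 3, 4, 2, 4, 3]

# Exact agreement: the proportion of essays given identical scores.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
agreement_rate = agreements / len(rater_a)
print(f"exact agreement: {agreement_rate:.0%}")  # exact agreement: 80%
```

A low agreement rate would signal a rater-reliability problem, such as unclear scoring criteria or insufficient rater training.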
VALIDITY The extent to which inferences made from assessment results are appropriate,
meaningful, and useful in terms of the purpose of the assessment.
(Validity can be established through statistical correlation, the consequences of a
test, or the test-taker’s perception of validity.)
Content-
related
evidence
If a test actually samples the subject matter about which conclusions are to be
drawn, and if it requires the test-taker to perform the behaviour that is being
measured.
Criterion-
related
evidence
If the results are supported by other concurrent performance beyond the
assessment itself.
Construct-
related
evidence
If a test is supported by a theory, hypothesis, or model of the underlying
construct it claims to measure.
Consequential
validity
It encompasses all the consequences of a test, including such considerations as its
accuracy in measuring intended criteria, its impact on the preparation of test-
takers, its effect on the learner, and the (intended and unintended) social
consequences of a test’s preparation and use.
Face Validity It refers to the degree to which a test looks right, and appears to measure the
knowledge or abilities it claims to measure, based on the subjective judgment of
the examinees who take it, the administrative personnel who decide on its use, and
other psychometrically unsophisticated observers (Mousavi, 2002).
Face validity is likely to be high if learners encounter:
A well-constructed, expected format with familiar tasks
A test that is clearly doable within the allotted time limit
Items that are clear and uncomplicated
Directions that are clear
Tasks that relate to their course work (content validity), and
A difficulty level that presents a reasonable challenge.
AUTHENTICITY The language in the test is as natural as possible.
Items are contextualized rather than isolated.
Topics are meaningful for the learner.
Some thematic organization to items is provided, such as through a story line
or episode.
Tasks represent, or closely approximate, real-world tasks.
WASHBACK This refers to the effects that tests have on instruction, in terms of how students
prepare for the test.
Washback for:
-intrinsic motivation
-autonomy
-self-confidence
-language ego
-interlanguage
-strategic investment
Sources:
Bachman, L. & Palmer, A. (2010). Language assessment in practice. NY: Oxford University Press.
Brown, H. (2004). Language assessment: Principles and classroom practices. NY: Pearson Education,
Inc.