Second Language
Assessment
Andrew Cohen
Misuses of Tests
• Tests were used as a punishment.
• Tests were administered instead of teachers giving
instruction.
• The tests were the only measure used for grading.
• Tests did not reflect what was taught.
• The tests were returned without corrections or
explanations.
• The tests reflected only one testing method.
• Teachers lacked confidence in their own tests.
• Students were not adequately trained to take the tests.
• There was a substantial delay in returning the tests.
A more constructive approach to language
testing exists when:
• Testing is seen as an opportunity for interaction between
teacher and student.
• Students are judged based on the knowledge they have.
• The tests are intended to help students improve their skills.
• The criteria for success on the test are clear to students.
• Students receive a grade for their performance on a set of tests
representing different testing methods.
• The test takers are trained in how to take tests, especially those
involving unfamiliar formats.
• The tests are returned promptly.
• The results are discussed.
Prepared by:
Marie Joy M. Anhaw
THEORETICAL
FOUNDATIONS
Primary Functions of Language Assessment
1. Administrative
a. assessment
b. placement
c. exemption
d. certification
e. promotion
2. Instructional
a. diagnosis
b. evidence of progress
c. feedback to the respondent
d. evaluation of teaching or curriculum
3. Research Purposes
a. evaluation
b. experimentation
c. knowledge about language learning and language use
Proficiency tests are intended for
administrative purposes.
Achievement tests are intended for
assessing instructional outcomes.
Distinctions in Testing
1. Norm-referenced Assessment
- A test can be used to compare a respondent with
other respondents, whether locally, regionally, or
nationally.
2. Criterion-referenced Assessment
- A test can be used to see whether a respondent has met
a certain instructional objective or criterion.
Components of Communicative Competence
1. Grammatical Competence
- encompasses knowledge of lexical items and of rules of
morphology, syntax, sentence-grammar semantics, and phonology
(Canale and Swain 1980)
2. Discourse Competence
- The ability to connect sentences in stretches of discourse
and to form a meaningful whole out of a series of utterances.
3. Sociolinguistic Competence
- involves knowledge of the sociocultural rules of
language
4. Strategic Competence
- The verbal and nonverbal communication strategies
that may be called into action to compensate for breakdowns
in communication due to performance variables or due to
insufficient competence.
Prepared by:
Hilda D. Carreon
ASSESSING
LANGUAGE SKILLS
Methods of Testing Reading
• Learners use a certain TYPE(S) of READING
• Comprehend at a certain level or combination of
LEVELS OF MEANING
• Enlist a certain COMPREHENSION SKILL(S)
• And do all of this within the framework of a certain
TESTING METHOD(S)
I. TYPES OF READING
A. Skimming or Scanning
A distinction has been made between scanning and
search reading
Search reading – the respondent is scanning
without being sure about the form that
information will take (i.e., whether it will be a
WORD, PHRASE, SENTENCE, PASSAGE,
and so on)
B. Reading Receptively and
Reading Responsively
Read receptively
- discovering accurately what the author seeks
to convey
Read responsively
- written material prompts readers to reflect on some
point or other, and then possibly respond in writing
II. Levels of Meaning
Four (4) levels of meaning:
o Grammatical meaning
– meaning that words and morphemes have on their
own
o Propositional meaning
– meaning that a clause or sentence can have on its
own (i.e., the information that the clause or sentence
transmits)
- this meaning is also referred to as its
“INFORMATIONAL VALUE”
o Discoursal meaning
– meaning a sentence can have only when in a context
– This meaning is also referred to as its
“FUNCTIONAL VALUE”
o Writer’s Intent
- the meaning that a sentence has only as part of the
interaction between writer and reader
- “author’s tone”
III. COMPREHENSION SKILLS
(Alderson 1987)
(i) The ability to recognize words and phrases of
similar and opposing meaning
(ii) Identifying or locating information
(iii) Discriminating elements or features within
context: the analysis of elements within a
structure and of the relationships among them
(e.g., causal, sequential, chronological,
hierarchical)
(iv) Interpreting complex ideas, actions, events,
and relationships
(v) Inferencing – deriving conclusions and
predicting the continuation
(vi) Synthesis
(vii) Evaluation
IV. TESTING METHODS
A. The Cloze and the C-Test
The Cloze Test
 One- or two-word deletions
 Rational deletion
 Partial deletion from the beginning or end of words
The C-Test (Klein-Braley and Raatz)
- The second half of every other word is deleted, leaving the first and
last sentence of the passage intact
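The C-test deletion rule is mechanical enough to sketch in code. The following is a minimal illustration (not from Cohen; the function names are invented for this example) of deleting the second half of every other word:

```python
def delete_second_half(word):
    # Keep the first half of the word (rounded up) and blank out the rest.
    keep = (len(word) + 1) // 2
    return word[:keep] + "_" * (len(word) - keep)

def make_c_test(words):
    # Apply the C-test rule to a list of words: delete the second half of
    # every other word. (A full implementation would also leave the first
    # and last sentences of the passage intact, as the rule specifies.)
    return " ".join(
        delete_second_half(w) if i % 2 == 1 else w
        for i, w in enumerate(words)
    )

print(make_c_test("the quick brown fox".split()))
# → the qui__ brown fo_
```

The respondent's task is then to restore each mutilated word, which requires processing at the word, sentence, and discourse levels at once.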
B. Computerized Adaptive Testing (CAT)
- The selection and sequencing of items depend on
the pattern of success and failure experienced by
the student.
Advantages:
Individual testing time may be reduced
Frustration and fatigue are minimized
Boredom is reduced
Test scores and diagnostic feedback may be provided
immediately
Test security may be enhanced (since it is unlikely
that two respondents would receive the same items
in the same sequence)
Record-keeping functions are improved
Information is readily available for research
purposes (Larson and Madsen 1985, Madsen
1986)
Disadvantage:
 CAT presumes that one major language factor
or underlying trait is being measured at a time
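The selection-and-sequencing idea behind CAT can be illustrated with a deliberately simplified staircase rule. This is a hypothetical sketch, not an operational algorithm — a real CAT system would update an IRT ability estimate and pick the most informative remaining item — but it shows how the difficulty level tracks the respondent's pattern of success and failure:

```python
def next_difficulty(level, correct, step=1):
    # Staircase rule: step up after a correct answer, down after an
    # incorrect one.
    return level + step if correct else level - step

def run_cat(responses, start=5):
    # responses: a simulated pattern of success (True) and failure (False).
    # Returns the sequence of difficulty levels presented to the respondent.
    levels = [start]
    for correct in responses:
        levels.append(next_difficulty(levels[-1], correct))
    return levels

print(run_cat([True, True, False, True, False]))
# → [5, 6, 7, 6, 7, 6]
```

The trace converges toward the level where the respondent succeeds about half the time, which is why individual testing time can be shorter than on a fixed-form test.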
C. Communicative Test of Reading
Comprehension
-Canale (1984) points out that a good test is not just
one that is acceptable -- that is, accepted as fair,
important, and interesting by test takers and test
users
- Also, a good test has feedback potential, rewarding
both test takers and test users with clear, rich,
relevant, and generalizable information
Storyline Test – test with a thematic line of
development
“Proficiency-oriented achievement test”
Canale (1985)
 such tests put to use what is learned. There is a transfer
from controlled training to real performance.
 there is a focus on the message and the function, not
just on the form
 there is group collaboration as well as individual
work, not just the latter
 the respondents are called upon to use their
resourcefulness in resolving authentic problems in
language use, as opposed to demonstrating accuracy in
resolving contrived problems at the linguistic level
 the testing itself is more like learning, and the learners
are more involved in the assessment.
Prepared by:
Deiniol Audbert L. Garces
TEST CONSTRUCTION
AND ADMINISTRATION
Inventory of Objectives
• Test constructors first make an inventory of the
objectives they want to test
• Distinguish broad objectives from more specific ones
and important objectives from trivial ones
• Varying the type of items or procedures testing a
particular objective helps distinguish one student’s
comprehension from that of another student.
Inventory of Objectives
• Testers may need to resist the temptation
to include difficult items of marginal
importance simply because they
distinguish the better and poorer
achievers.
Constructing an Item Bank
1. The skill or combination of skills tested
2. The language element(s) involved
3. The item-elicitation and item-response
formats
4. Instructions on how to present the item
5. The section of the book or part of the course
that the item relates to
6. The time it took to write the item
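The six pieces of information listed above map naturally onto a structured record. The sketch below shows one way an entry might be stored; the field names paraphrase the six categories and are illustrative, not a prescribed format:

```python
from dataclasses import dataclass

@dataclass
class BankItem:
    # One entry in the item bank, with a field for each of the six
    # pieces of information (names are invented for this example).
    skills_tested: list          # 1. skill or combination of skills
    language_elements: list      # 2. language element(s) involved
    elicitation_format: str      # 3a. item-elicitation format
    response_format: str         # 3b. item-response format
    instructions: str            # 4. how to present the item
    course_section: str          # 5. book section / course part
    minutes_to_write: float      # 6. time it took to write the item

bank = [
    BankItem(["reading"], ["vocabulary"], "short passage with a gap",
             "multiple choice", "Choose the best answer.", "Unit 3", 15.0),
]
print(bank[0].course_section)
# → Unit 3
```

Keeping the entries structured this way makes it easy to filter the bank later, e.g. to pull all items for a given course section or skill.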
Test Format
• An effective way to hold the interest of a
respondent towards a test is to start the test
with relatively easy items and then continue it
by interspersing easy and difficult items.
• Multiple-choice items lend themselves to guessing
by respondents.
Instructions
• The instructions should be brief and yet explicit and
unambiguous
• Examples may help, but on the other hand, may
hinder if they do not give the whole picture and
become a substitute for reading instructions.
• The time allowed for each subtest and/or for the total
test should be announced
Scoring
• If an objective is tested by more than one item, then
the focus is on mastery of the objective.
• Items covering one objective may be weighted more
than the items covering another objective.
• Ex. The scoring of a multiple-choice test would be
considered more objective than that of an essay test,
where the scorer’s subjectivity plays more of a role.
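The idea of weighting items covering one objective more heavily than another can be made concrete with a small scoring function. The objective names and weights below are invented for illustration:

```python
def weighted_score(proportion_correct, weights):
    # Combine each objective's proportion correct using the weight
    # assigned to that objective; weights should sum to 1.
    return sum(proportion_correct[obj] * w for obj, w in weights.items())

# Hypothetical objectives and weights (reading counts most here).
weights = {"reading": 0.5, "grammar": 0.3, "vocabulary": 0.2}
scores = {"reading": 0.8, "grammar": 0.6, "vocabulary": 1.0}
print(weighted_score(scores, weights))
```

With these numbers the overall score is 0.78: a respondent strong on the heavily weighted objective fares better than one with the same raw totals distributed differently.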
Reliability
Three Factors:
1. Test Factors
2. Situational Factors
3. Individual Factors
Validity
1. Face Validity
2. Content Validity
3. Criterion-Related Validity
4. Construct Validity
5. Convergent Validity
Item Analysis
1. Piloting the Test – trying out the test on a population
similar to that for which the test is designed
2. Item Difficulty – proportion of correct responses to
a test item
3. Item Discrimination – how well an item performs in
separating the better students from the poorer ones
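Both statistics are simple to compute. The sketch below uses the upper-lower method for discrimination; the ~27% tail size is a common rule of thumb, not something specified in the source:

```python
def item_difficulty(responses):
    # Proportion of correct (1) responses to the item: higher means easier.
    return sum(responses) / len(responses)

def item_discrimination(item_scores, total_scores, tail=0.27):
    # Upper-lower index: the item's difficulty among the top scorers on
    # the whole test minus its difficulty among the bottom scorers.
    n = max(1, int(len(total_scores) * tail))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])

    def prop(indices):
        return sum(item_scores[i] for i in indices) / len(indices)

    return prop(order[-n:]) - prop(order[:n])

item = [1, 1, 0, 1, 0, 0, 1, 0]            # responses to one item
totals = [90, 85, 40, 80, 35, 50, 88, 45]  # total test scores
print(item_difficulty(item), item_discrimination(item, totals))
# → 0.5 1.0
```

Here the item is answered correctly by exactly the high scorers, so it discriminates perfectly (index 1.0) at a moderate difficulty of 0.5.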
Test Revision
• An item should be revised or eliminated if it
has a low difficulty or discrimination
coefficient.
• If distractors (multiple-choice options) draw no
responses or too many, then they should be
omitted or altered.
• Results of item analysis should be added to
the information in the item bank.