This document discusses key concepts in language assessment including validity, reliability, and feasibility. It defines validity as the accuracy of a test in measuring the intended proficiency. There are different types of validity including content, criterion-related, and construct validity. Reliability refers to a test producing consistent results, which can be measured using methods like test-retest. Feasibility means a test is practical to administer. The document also discusses types of language tests, how to improve validity and reliability, and item analysis. Chapters from a book on language testing techniques are assigned for discussion.
The European Association for Language Testing and Assessment (EALTA) aims to promote understanding of language testing principles and improve testing practices across Europe. EALTA's guidelines provide best practices for those involved in teacher training, classroom assessment, and test development. The guidelines stress respect, responsibility, fairness, reliability, and validity. They also recommend clarifying purposes and ensuring appropriateness, accuracy, feedback, and stakeholder involvement in the assessment process. EALTA encourages engagement with decision makers to enhance quality of assessment systems.
This document discusses key concepts in language assessment including validity, reliability, and feasibility. It provides definitions and examples of different types of validity including construct, content, criterion-related, and face validity. Reliability is discussed in terms of test-retest, alternate forms, and split-half methods. The document also covers types of language assessment such as proficiency tests, achievement tests, and diagnostic tests. Specific techniques for assessing writing, speaking, reading, listening, grammar, and vocabulary are outlined. Guidelines are provided for developing valid and reliable language tests.
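The split-half method mentioned above can be made concrete with a short computation: score the odd-numbered and even-numbered items separately, correlate the two half-test scores, and apply the Spearman-Brown correction to estimate full-length reliability. The sketch below uses an invented 0/1 response matrix purely for illustration.

```python
# Split-half reliability: correlate odd-item vs. even-item half scores,
# then apply the Spearman-Brown correction for full test length.
# The response matrix is invented for illustration.

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(responses):
    """responses: one list of 0/1 item scores per student."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r_half = pearson(odd, even)
    # Spearman-Brown: reliability of a half-test understates the full test
    return 2 * r_half / (1 + r_half)

responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
]
print(round(split_half_reliability(responses), 2))
```

With real data the same routine applies unchanged; only the response matrix comes from an actual administration.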
This document discusses different types of test questions used in education measurement and evaluation. It describes supply type tests where students must supply missing information, including short answer and extended answer varieties. Short answer questions assess basic knowledge through one word to short responses, while extended/essay questions allow lengthier, paragraph responses to measure higher-order thinking. Selection type tests involve choosing from options, including true/false, matching, and multiple choice questions. The advantages and disadvantages of each question type are outlined.
This document provides guidance on constructing practical test questions in exams. It defines a practical question as one that requires students to perform a task to verify a hypothesis or established law. Safety should be ensured when conducting practical exams. Questions should test psychomotor skills and use clear wording. It is important to check that the necessary equipment, tools, and chemicals are available and that poisonous or explosive substances are avoided. Students should work under supervision. Practical exams can assess the cognitive, affective, and psychomotor domains and evaluate students' ability to handle apparatus, but they require skilled teachers and available resources. Examples are provided of verifying whether a substance is an acid or a base, and of finding the volume of a cube.
This document discusses validity, reliability, and washback in language testing. Validity refers to a test measuring what it intends to measure, which includes content validity (testing relevant skills and concepts) and criterion-related validity (how test results agree with other assessment results). Reliability means a test is repeatable, which can be measured through reliability coefficients. Washback refers to how a test influences teaching and learning, with the goal of achieving positive washback that encourages effective preparation. Ensuring validity, reliability, and beneficial washback requires careful test construction and use of techniques like setting test specifications, direct testing of objectives, and providing clear scoring criteria.
The document outlines a test specification for a reading comprehension test for 7th grade students. It details the test blueprint covering content from grades 1-12, developed by four individuals. It provides information on scoring methods, time allotment, instructions, test purpose and construct. It also summarizes measures of central tendency, frequency distribution, standard deviation, reliability, validity, difficulty level, discriminating power and distractor analysis that will be used to evaluate the test. The conclusion summarizes the analysis of a sample test including highest/lowest scores, modes, median, mean, reliability, validity, item validity, difficulty levels and items requiring revision.
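The descriptive statistics named in that blueprint (central tendency, standard deviation) are straightforward to compute. The following minimal sketch, with an invented score list, shows the mean, median, mode(s), and population standard deviation used in such an analysis.

```python
# Descriptive statistics for a set of test scores: mean, median, mode(s),
# and population standard deviation. The scores are invented for illustration.
from collections import Counter

def describe(scores):
    n = len(scores)
    mean = sum(scores) / n
    ordered = sorted(scores)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    counts = Counter(scores)
    top = max(counts.values())
    modes = sorted(s for s, c in counts.items() if c == top)
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return mean, median, modes, sd

scores = [12, 15, 15, 18, 20, 22, 22, 22, 25, 29]
mean, median, modes, sd = describe(scores)
print(mean, median, modes, round(sd, 2))
```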
This document discusses key concepts related to language testing and assessment. It begins by defining tests as methods used to measure a person's ability, knowledge, or performance in a given domain. Well-constructed tests provide an accurate measure of a test-taker's proficiency. The document then examines different types of language assessment methods, both formal and informal. It also explores theoretical approaches to language testing, including behavioral, communicative, integrative, and performance-based assessments. Current issues in classroom testing are discussed, such as theories of multiple intelligences and alternative forms of assessment. Principles of effective language assessment are outlined, emphasizing practicality, reliability, validity, authenticity, and washback.
The document outlines the stages of test construction: planning, preparing, reviewing, and revising. In the planning stage, test objectives and techniques are determined. The preparing stage involves developing test items addressing content, format, and scoring procedures. Experts then review the test for validity, reliability, and usability, providing feedback. In the revising stage, test items are modified based on the review before pretesting. The stages ensure tests accurately measure learners' knowledge and skills.
There are several types of language tests that serve different purposes: proficiency tests measure overall language ability, diagnostic tests identify specific strengths and weaknesses, placement tests determine what level is appropriate, achievement tests are limited to material covered in a particular course, and aptitude tests predict future success in learning a foreign language before instruction begins. Each type of test has a distinct goal to help evaluate, diagnose, or place students in a way that benefits their language education.
The document outlines the steps for developing a valid and reliable test: 1) determining test specifications, 2) planning by preparing a table of specifications, 3) writing test items, 4) preparing appropriate test formats, 5) reviewing test items, 6) pre-testing the test, and 7) validating test items through analyzing item difficulty, discrimination, and facility. The goal is to design a test that accurately measures the intended objectives and skills at an appropriate level of difficulty without cultural bias.
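Step 2, the table of specifications, is essentially a matrix of content areas against objective levels whose cells hold planned item counts. A hedged sketch follows; the content areas, levels, and counts are all invented for illustration.

```python
# A table of specifications (test blueprint) as a nested dict: rows are
# content areas, columns are objective levels, cells are planned item counts.
# All names and weights here are invented for illustration.

blueprint = {
    "Reading":    {"knowledge": 4, "comprehension": 6, "application": 2},
    "Grammar":    {"knowledge": 5, "comprehension": 3, "application": 2},
    "Vocabulary": {"knowledge": 6, "comprehension": 2, "application": 0},
}

def content_weights(blueprint):
    """Percentage of total items allotted to each content area."""
    total = sum(sum(row.values()) for row in blueprint.values())
    return {area: round(100 * sum(row.values()) / total, 1)
            for area, row in blueprint.items()}

print(content_weights(blueprint))
```

Checking these percentages against the weight each area received in instruction is one concrete way the blueprint supports content validity.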
The document discusses how to improve test reliability by providing clear instructions, unambiguous questions, familiar formatting, and objective scoring. It recommends training scorers, reviewing items for errors, using parallel distractors of similar length, and avoiding subjectively scored items or those that provide clues. Tables of specifications can improve validity by matching test questions to course content and objectives. Formative evaluations provide ongoing feedback, while summative evaluations assess effectiveness after full implementation.
There are several types of tests used to measure student performance and abilities, including diagnostic tests, proficiency tests, achievement tests, aptitude tests, placement tests, personality tests, and intelligence tests. Tests can also be objective or subjective, oral or written, criterion-referenced or norm-referenced, formative or summative, and administered individually or to groups. The document provides descriptions of the various types of tests.
This document outlines the steps to design an effective test. It discusses that tests should be valid in measuring the skills and content taught, reliable in producing consistent results, and practical to develop without excessive time or resources. The planning stage involves specifying the test's use and ensuring authentic tasks. Tests should sample across language skills and content areas. The development stage includes compiling materials, selecting appropriate question formats and clear instructions, setting scoring criteria, and analyzing and revising based on results to improve teaching.
Advantages and limitations of subjective test items (Test Generator)
In the world of test creation software and online exam makers, we often hear talk of objective and subjective questions and their differing effects on test takers. Take a look at our presentation for a quick overview.
The document outlines the 5 main steps in test development: 1) test conceptualization which includes defining what will be measured and pilot studies, 2) test construction including scaling methods, writing items, and approaches, 3) test tryout, 4) item analysis to evaluate item difficulty, reliability, validity, and discrimination, and 5) test revision to ensure quality over time as needed. Key aspects include defining the construct being measured, using various scaling and scoring models, analyzing item performance, and revalidating tests periodically.
This document provides guidelines for conducting a try-out test and performing item analysis on a test. It explains that a try-out test should be conducted on a sample similar to the actual test takers and under similar conditions to prepare for the real test administration. Item analysis evaluates the quality of test items using metrics like item difficulty index and item discrimination index calculated based on comparing response rates of high-scoring and low-scoring test takers. The document provides examples of computing these indexes and interpreting their values to identify items that need revision or removal.
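The two indexes described above can be sketched in a few lines. In classical item analysis, difficulty is the proportion of examinees answering an item correctly, and discrimination is the difference between the correct-answer rates of the upper and lower scoring groups. The responses below are invented for illustration.

```python
# Item difficulty (p) and item discrimination (D) from the responses of
# upper-scoring and lower-scoring groups on one item. 1 = correct, 0 = wrong.
# The data are invented for illustration.

def item_difficulty(upper, lower):
    """Proportion of all examinees (both groups) answering correctly."""
    return (sum(upper) + sum(lower)) / (len(upper) + len(lower))

def item_discrimination(upper, lower):
    """p(upper) - p(lower); values near +1 separate the groups well."""
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Responses of the top-scoring and bottom-scoring groups on one item
upper = [1, 1, 1, 1, 0]
lower = [1, 0, 0, 0, 0]
p = item_difficulty(upper, lower)      # moderate difficulty
d = item_discrimination(upper, lower)  # positive: favors high scorers
print(p, round(d, 2))
```

An item with a very low or negative discrimination index is the kind flagged for revision or removal, as the summary notes.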
The document outlines different types of language tests: proficiency tests measure general language ability regardless of training; achievement tests relate to language courses and assess whether objectives were achieved; diagnostic tests identify strengths and weaknesses; placement tests determine what language level is appropriate. It also distinguishes between direct and indirect testing, discrete point and integrative testing, norm-referenced and criterion-referenced testing, and objective and subjective scoring. The document concludes by mentioning computer adaptive testing and communicative language testing.
Here are the key steps in effective feedback:
1. Be timely
2. Focus on the task/criteria, not the person
3. Explain what was done well and areas for improvement
4. Suggest strategies for improvement
5. Allow opportunity for questions/discussion
Topic: Assembling The Test
Student Name: Latif Qureshi
Class: M.Ed
Project Name: "Young Teachers' Professional Development (TPD)"
Project Founder: Prof. Dr. Amjad Ali Arain
Faculty of Education, University of Sindh, Pakistan
The document discusses different types of assessment including formal, informal, and self-assessment. It then describes various types of tests such as diagnostic tools, formal tests, informal tests, summative tests, formative tests, norm-referenced tests, and criterion-referenced tests. The final section outlines principles of test construction including validity, reliability, objectivity, discrimination, comprehensiveness, ease of administration, practicality and scoring, and usability.
This document discusses various methods for analyzing items on language tests, including item facility (IF), item discrimination (ID), difference index (DI), and B-index. IF measures the percentage of correct answers on an item. ID measures how well an item separates high-scoring from low-scoring students. DI compares pre-test and post-test IFs to measure sensitivity to instruction. B-index compares pass/fail groups' IFs. While norm-referenced tests aim for a normal score distribution, criterion-referenced tests use DI and B-index to relate items to instructional objectives and passing standards.
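The criterion-referenced statistics named above (IF, DI, B-index) reduce to simple proportion differences. The sketch below uses invented pre/post and pass/fail response data purely to show the arithmetic.

```python
# Criterion-referenced item statistics: IF (item facility), DI (difference
# index: post-instruction IF minus pre-instruction IF), and the B-index
# (IF of passers minus IF of failers). Data are invented for illustration.

def item_facility(responses):
    """Proportion of examinees answering the item correctly (0/1 scores)."""
    return sum(responses) / len(responses)

def difference_index(pre, post):
    """DI: sensitivity to instruction; large positive values are desirable."""
    return item_facility(post) - item_facility(pre)

def b_index(passers, failers):
    """B-index: how strongly the item relates to the passing standard."""
    return item_facility(passers) - item_facility(failers)

pre  = [0, 0, 1, 0, 0, 1, 0, 0]   # same item before instruction
post = [1, 1, 1, 0, 1, 1, 1, 0]   # ... and after instruction
print(round(difference_index(pre, post), 2))

passers = [1, 1, 1, 0, 1]  # examinees at or above the cut score
failers = [0, 1, 0, 0, 0]  # examinees below the cut score
print(round(b_index(passers, failers), 2))
```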
The document discusses the characteristics of a good test. A good test is both valid and reliable. Validity means a test measures what it is intended to measure, such as a math test measuring math ability not reading ability. Reliability means test scores are consistent and not due to random chance. Tests can be made more reliable by including more test items and using objective scoring methods. Characteristics like a large number of test items, objective scoring, and piloting a test widely increase reliability.
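The claim that adding items increases reliability can be quantified with the Spearman-Brown prophecy formula, r_new = k·r / (1 + (k − 1)·r), where k is the factor by which the test is lengthened. A minimal sketch with an illustrative starting reliability of 0.60:

```python
# Spearman-Brown prophecy formula: predicted reliability of a test
# lengthened by factor k with comparable items. r = 0.60 is illustrative.

def spearman_brown(r, k):
    """Predicted reliability after lengthening a test by factor k."""
    return k * r / (1 + (k - 1) * r)

r = 0.60  # reliability of the original test
for k in (1, 2, 3):
    print(k, round(spearman_brown(r, k), 2))
```

Doubling a test with r = 0.60 lifts the predicted reliability to 0.75, which is why item count appears in the list of reliability-boosting characteristics.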
The document discusses test appraisal systems and item analysis. It defines test appraisal as the process used by educational institutions to evaluate how effectively they have conducted and evaluated students. Item analysis is a statistical technique used to select and reject test items based on their difficulty and ability to discriminate between stronger and weaker students. Item analysis provides benefits such as improving test construction, identifying weaknesses, and enhancing teaching methods.
Testing is a matter of using data to establish evidence of learning. Evidence, however, does not occur concretely in a natural state; it is an abstract inference, and therefore a matter of judgment.
This document discusses key principles of language assessment, including reliability, validity, practicality, authenticity, and washback. It provides definitions and explanations of these principles in 3-7 sentences each. Reliability refers to a test producing consistent scores and being error-free. Validity is the correspondence between a test's content and the material being tested. Practicality balances the resources required to design, develop, and use a test with the available resources. Authenticity is the similarity between test tasks and real-life language use. Washback describes the influence of a test on teaching and learning, which can be positive or negative.
This document discusses how to achieve beneficial backwash from tests. It provides several recommendations: test the abilities you want to encourage; sample widely and unpredictably in tests; use direct testing of skills; make tests criterion-referenced; base achievement tests on objectives; ensure students and teachers understand tests; and provide teacher assistance. It also mentions the Cambridge English Proficiency exam and cites various sources.
The Common European Framework of Reference (CEFR) is a standard developed by the Council of Europe to describe language ability. It introduces six common reference levels (A1 to C2) to standardize language education across Europe. The CEFR provides clear definitions of what language learners can do at each level to facilitate cooperation in language education.
CEFR-based tools and resources: latest developments (Mila Angelova, EAQUALS)
This document provides an overview of an EAQUALS session on CEFR-based curriculum and assessment. It discusses developing resources like the Core Inventory for French and reading/listening scenarios. The EAQUALS Certificate of Achievement scheme guarantees quality CEFR-implemented assessment and curriculum design through screening processes. Benefits include differentiating members and demonstrating academic competence. Main prerequisites for certification include a CEFR-based curriculum, standardization training, and moderation techniques. The session aims to help members implement CEFR-based approaches and identify areas of interest.
This document discusses strategies for assessing and grading students according to the Common European Framework of Reference (CEFR). It recommends using CEFR criteria for low-stakes, in-class assessment but standardized exams for high-stakes certification. When assessing CEFR levels, schools should relate their curriculum to CEFR descriptors and develop assessment tasks aligned to the descriptors. Moderation is important to counteract subjectivity and ensure consistent standards are applied. Criterion-referenced assessment according to CEFR criteria provides explicit information about students' abilities independent of peer performance.
This document summarizes key points about testing various language skills from Hughes' book on language testing. It discusses techniques for testing writing, oral abilities, reading, listening, grammar, vocabulary, and overall ability. Tips are provided for each skill area as well as for testing young learners. Common issues with indirect assessment are addressed. The importance of test design, task selection, task difficulty, and reliable scoring procedures are emphasized throughout.
The Common European Framework of Reference for Languages (CEFR) provides a common basis for describing language ability across Europe. It describes what language learners need to know and be able to do to use a language for communication. The CEFR defines six reference levels of language proficiency from A1 for basic users to C2 for mastery. It also outlines the grammatical structures and competencies required at each level. The CEFR takes a communicative approach, focusing on learners' needs and basing teaching on developing communicative competence through everyday interactions and cultural understanding.
The document discusses different types of assessments including formative assessment, which is used to identify if students have achieved the lesson objective and determine gaps. Examples of formative assessments include questioning students and collecting assignments. Summative assessment provides grades based on performance over a period of time, such as final exams. Performance assessment evaluates what students can do in real-world scenarios through demonstrations and projects.
This document discusses different types of assessment and evaluation tools used in education. It describes diagnostic, formative, and summative assessments and their purposes. Diagnostic assessments identify student strengths and weaknesses at the start of instruction. Formative assessments evaluate student learning throughout instruction to help students improve. Summative assessments make judgments about student achievement at the end of a learning period. The document also outlines specific tools like observations, checklists, interviews, and projects that can be used for assessment and evaluation.
The document discusses assessment practices and formative assessment. It provides an overview of assessment types including formative, summative, and diagnostic assessments. Formative assessment identifies student needs, guides ongoing instruction, and provides feedback to improve learning, while summative assessment evaluates learning at the end of a unit. The document emphasizes that formative assessment, when used to adapt teaching to meet student needs, has a strong positive effect on learning.
This document discusses different types of assessment used to evaluate learner performance. Formal assessment includes tests and exams with numerical grades, while informal assessment observes learner skills through comments without grades. Self-assessment and peer assessment also allow learners to evaluate themselves and each other. Examples of assessment tasks that can be used formally or informally include gap fills, multiple choice questions, interviews, compositions and dictation. Tasks vary in what they measure, from communication skills to language accuracy, and in how easy they are to score objectively versus subjectively. Informal methods include observation, note-taking and self/peer evaluation sheets.
This document discusses assessment in language learning. It defines assessment as collecting information about students' language development through various methods such as tests, portfolios, and observations. This information is then analyzed and used to make pedagogical decisions. For assessment to be effective, it should be valid, reliable, and feasible. Validity means the assessment accurately measures proficiency. Reliability means a student would achieve similar results on multiple attempts. Feasibility means the assessment is practical to implement. The document also discusses formative assessment, self-assessment, and the importance of feedback in the learning process.
1. Research instruments are required in research to systematically collect and measure data relevant to the research problem or questions.
2. The key qualities of a good research instrument are validity, reliability, and usability. Validity ensures an instrument measures what it intends to measure. Reliability means an instrument produces consistent results. Usability means an instrument can be used practically.
3. Common types of instruments include questionnaires, interviews, checklists, tests, and observations. Quantitative instruments like questionnaires use closed-form questions while qualitative instruments like interviews use open-form questions. Standardized tests are published and validated over time while researcher-made tools require validation.
This document discusses the validity and reliability of questionnaires. It defines validity as the ability of a questionnaire to measure what it intends to measure. There are several types of validity discussed, including content validity, face validity, criterion validity (concurrent and predictive), and construct validity. Steps for validating a questionnaire include evaluating face validity and getting expert feedback to establish content validity. Reliability is the ability to get consistent results and is measured through test-retest reliability, internal consistency (split-half), and inter-rater reliability. Establishing both validity and reliability is important for developing a high-quality questionnaire.
This document discusses the key characteristics of effective assessment: validity, reliability, practicality, and accuracy. It defines each characteristic and provides examples. Validity means a test measures what it intends to measure. Reliability means a test produces consistent results. Practicality means a test is usable in terms of time and cost. Accuracy means a test is free from errors. The document also discusses factors that affect the acceptability of a test like length, technique, administration conditions, and presentation quality. Overall, the document provides an overview of the essential features of assessment and testing.
Characteristics of a Good Evaluation Instrument (Suresh Babu)
1. Validity, reliability, objectivity, adequacy, discrimination power, practicability, comparability, utility, and comprehensiveness are key characteristics of a good evaluation instrument.
2. Validity refers to a test accurately measuring what it is intended to measure. Reliability is consistency in a test's measurements. Objectivity means a test's scores are not affected by scorers' biases.
3. Other important characteristics include a test being adequate to measure objectives, able to discriminate levels of performance, practical to administer, allowing comparability of scores, useful for its intended purpose, and comprehensive in assessing objectives.
This document outlines various topics related to language testing, including types of tests, approaches to testing, validity and reliability, and achieving beneficial backwash effects. It discusses proficiency tests, achievement tests, and diagnostic tests. It also covers direct and indirect testing, norm-referenced and criterion-referenced testing, and objective and subjective testing. Validity is defined as accurately measuring the intended abilities, while reliability is consistency of results. Achieving beneficial backwash means testing abilities you want to foster and ensuring students and teachers understand the test.
The document discusses developing assessment instruments for measuring learner progress and instructional quality. It covers criterion-referenced assessments that measure performance against specific standards or levels. The objectives are to describe criterion-referenced tests and different types of pre- and post-instruction assessments. It also discusses developing quality criterion-referenced test items and assessments of products, performances, and attitudes.
The document discusses developing assessment instruments for measuring learner progress and instructional quality. It describes criterion-referenced assessments that measure performance against specific standards or levels of mastery. The objectives are to describe criterion-referenced tests and how various assessment types (entry tests, pretests, practice tests, posttests) are used. It also discusses developing quality criterion-referenced test items in four categories: goal-centered, learner-centered, context-centered, and assessment-centered.
The document discusses developing criterion-referenced assessments. It explains that criterion-referenced assessments directly measure skills described in behavioral objectives and focus on gauging learner performance and instructional quality. The document provides guidance on writing test items, developing different types of assessments, setting mastery criteria, and ensuring assessments are congruent with objectives and instructional analyses. It emphasizes the importance of criterion-referenced assessments for evaluating both learners and instruction.
The document discusses principles of testing including practicality, reliability, validity, and different types of tests. It addresses how to make tests more reliable and valid. Reliability refers to consistency and dependability, and can be improved through clear instructions, uniform conditions, and objective scoring. Validity means a test accurately measures what it intends to. Communicative competence and practical issues in testing are also covered.
This document provides an outline for a course on testing for language teachers. It covers various topics related to language testing including the purposes of different types of tests, approaches to testing, ensuring validity and reliability, and achieving beneficial backwash effects. The key points covered are the types of tests (proficiency, achievement, diagnostic, placement), approaches to testing (direct vs indirect, discrete point vs integrative), factors of validity and reliability, and how to design tests that motivate effective teaching practices.
Validity refers to the appropriateness and usefulness of assessment interpretations and results, while reliability refers to the consistency of measurements. There are various types of validity evidence including content, criterion, and construct validity. Reliability can be estimated through methods like test-retest, equivalent forms, and internal consistency. Ensuring both validity and reliability of assessments is important for making fair and meaningful evaluations of students.
This document discusses standardized tests and test construction. It defines standardized tests as tests where all students answer the same questions in the same way, allowing performance to be compared. The main types of standardized tests are norm-referenced tests, which compare performance to others, and criterion-referenced tests, which compare performance to objectives. Good test construction involves planning test objectives, writing clear and valid questions, and revising the test based on analysis to ensure it reliably measures the desired content.
This document discusses key concepts and principles of assessment for English language learners. It begins by explaining why assessment should take place, noting that it is used to measure learning and improve instruction. It then covers key concepts involved in assessment like accountability, achievement, and different assessment types and strategies. Several principles of assessment are outlined, including being ethical, fair, valid, reliable and practical. The document concludes by providing checklists to evaluate if classroom tests are applying these principles of practicality, reliability, validity, authenticity, and having a beneficial washback effect on learning.
This document discusses bilingual education programs at higher education institutions that use English as the primary language of instruction, known as EMI programs. It notes the increasing trend of EMI programs in Europe and reasons for their growth, including internationalization, improving English skills, and prestige. Potential threats of EMI discussed include lack of English proficiency among students and teachers leading to ineffective teaching and learning, and EMI limiting classroom discourse. Solutions proposed include screening language levels, additional training, and bilingual degrees. Research on EMI programs found small improvements in students' English skills. Examples of EMI programs in Spanish universities are also provided.
This document outlines 3 modules for language teaching. Module 1 provides background information on language teaching. Module 2 covers lesson planning and use of resources. Module 3 focuses on managing the teaching and learning process in the classroom.
The document outlines an action-oriented language model used in Cantabria, Spain for teaching language. The model uses task-based learning principles including using authentic materials and real-life tasks to reproduce natural language acquisition. Some key methodological principles are integrating skills and contents through communicative tasks, using texts close to students' experiences, promoting student autonomy, and treating errors positively as part of the learning process. An example task is provided focusing on family topics, with associated learning goals and activities involving listening, reading, writing, and oral interaction.
The document describes the European Language Portfolio (ELP), which aims to help language learners improve their learning process through self-assessment and reflection. It discusses the ELP's objectives, sections (passport, biography, dossier), and types for different age groups in Spain. It then details the implementation of ELPs at two Spanish schools, including developing activities, trials with students, and addressing problems like the difficulty of self-assessment and the ELP's bulkiness. Teachers found that ELPs increased student autonomy and awareness of language learning as a process. The document concludes by discussing ways ELP principles have been incorporated into language courses, and the impact of ELPs on teacher training.
This document provides instructions for a training program to familiarize users with the Common European Framework of Reference for Languages (CEF). The training involves 6 steps: 1) selecting a language skill, 2) choosing a communicative task, 3) reading the task description, 4) completing the task, 5) rating the task difficulty, and 6) checking the user's rating against the trainer's rating. The goal is to help users scale language skills and assess task and performance levels as defined by the CEF.
The Common European Framework provides a common basis for language education across Europe by establishing common reference levels for languages. It aims to promote plurilingualism, lifelong learning, and greater mobility and cooperation through common standards. The Framework describes language ability through communicative competences and sets out descriptive levels from A1 for basic users up to C2 for mastery. It takes a comprehensive approach to language skills including reception, production, interaction and mediation.
The document discusses language proficiency levels based on the Common European Framework and strategies for improving English skills at a university. It outlines the six proficiency levels from A1 to C2 and provides examples of certifications and exams. It then proposes three ways for students to demonstrate a B2 level: taking additional English classes, taking subjects taught in English, or obtaining other qualifications involving English study.
The document presents a summary of the Instituto Cervantes' 2006 Plan Curricular. It establishes three general objectives centred on the learner as a social agent, an intercultural speaker, and an autonomous learner. It also describes the five components of language covered by the plan: grammatical, pragmatic-discursive, notional, cultural, and learning-related. Finally, it offers examples of the inventories for the grammatical, pragmatic-discursive, and cultural components.
The document describes two main applications of the Common European Framework of Reference for Languages: DIALANG, an online language assessment system available for 14 languages and three skills, and Europass, a set of documents that help communicate qualifications and degrees to facilitate mobility within the European Union. It also mentions some other online resources related to language learning and assessment.
The document describes the key assessment concepts in the CEFR: validity, reliability, and feasibility. It explains that for assessment to be valid, reliable, and feasible, it is necessary to specify what is assessed (communicative activities) and how performance is interpreted (assessment criteria), and to follow codes of good practice. It also lists 13 different types of assessment.
This document presents the methodology for new language curricula based on the Common European Framework of Reference for Languages. It describes 11 methodological principles, such as the use of communicative tasks, the integration of skills and content, and texts close to the student's experience. It also includes examples of objectives, activities, and resources for level B1.
This document provides information about familiarizing oneself with the Common European Framework of Reference for Languages (CEF). It outlines a 6 step process for users to get acquainted with the CEF scales through familiarization exercises, training materials, and practice rating sample language tasks and comparing their ratings to expert ratings. The goal is to help users understand the CEF levels, scales, and assessment of language task and performance levels.
2. What does the word suggest?
What sort of emotions does it convey?
Try to write a definition. What does it imply?
Which characteristics should it have?
3. What does the word suggest?
What sort of emotions does it convey?
Try to write a definition. What does it imply?
• Collecting information
• Analyzing the information and making an assessment
• Taking decisions according to the assessment made:
Pedagogical decisions (formative assessment)
Social decisions
Which characteristics should it have?
• Validity, reliability, feasibility
4. Assessment: assessment of the proficiency of the language user
3 key concepts:
• Validity: the information gained is an accurate representation of the proficiency of the candidates
• Reliability: a student being tested twice will get the same result (technical concept: the rank order of the candidates is replicated in two separate, real or simulated, administrations of the same assessment)
• Feasibility: the procedure needs to be practical, adapted to the available elements and features
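The technical definition of reliability above (the candidates' rank order is replicated across two administrations) can be quantified with Spearman's rank correlation. A minimal Python sketch with hypothetical score lists; the difference formula used here assumes no tied scores:

```python
# Spearman's rank correlation between two administrations of the same test.
# A coefficient near 1 means the candidates' rank order was replicated.
# Scores are hypothetical; the formula below assumes no ties.

def ranks(scores):
    """Rank each score from 1 (lowest) to n (highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(first_admin, second_admin):
    """rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), d = rank difference per candidate."""
    n = len(first_admin)
    d2 = sum((a - b) ** 2
             for a, b in zip(ranks(first_admin), ranks(second_admin)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Identical rank order on both administrations gives rho = 1.0:
print(spearman_rho([40, 55, 62, 78], [43, 51, 66, 80]))  # → 1.0
```

Exact scores may differ between the two sittings; what the coefficient tracks is whether the ordering of candidates is preserved.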
5. If we want assessment to be valid, reliable, and feasible, we need to specify:
• What is assessed: according to the CEFR, communicative activities (contexts, texts, and tasks). See examples.
• How performance is interpreted: assessment criteria. See examples.
• How to make comparisons between different tests and ways of assessment (for example, between public examinations and teacher assessment). Two main procedures:
Social moderation: discussion between experts
Benchmarking: comparison of samples in relation to standardized definitions and examples, which become reference points (benchmarks)
• Guidelines for good practice: EALTA
7. Types of tests:
• Proficiency tests
• Achievement tests. 2 approaches:
To base achievement tests on the textbook/syllabus (contents)
To base them on course objectives. More beneficial washback.
• Diagnostic tests
• Placement tests
8. Validity: the information gained is an accurate representation of the proficiency of the candidates
Validity types:
• Construct validity (very general: the information gained is an accurate representation of the proficiency of the candidate. It checks the validity of the construct, the thing we want to measure)
• Content validity. This checks whether the test's content is a representative sample of the skills or structures that it wants to measure. In order to check this we need a complete specification of all the skills or structures we want to cover. If the test covers only 5% of them, it has less content validity than if it covers 25%.
9. Validity types:
• Criterion-related validity: results on the test agree with other dependable results (criterion test)
Concurrent validity: we compare the test results with the criterion test.
Predictive validity: the test predicts future performance. A placement test is validated by the teachers who teach the selected students.
• Validity in scoring. Not only the items need to be valid, but also the way in which responses are scored (taking grammar mistakes into account in a reading comprehension exam is not valid).
• Face validity: the test has to look as if it measures what it is supposed to measure. A written test to check pronunciation has little face validity.
10. How to make tests more valid (Hughes):
Write specifications for the test.
Include a representative sample of the content of the specifications in the test.
Whenever feasible, use direct testing.
Make sure that the scoring relates directly to what is being tested.
Try to make the test reliable.
11. Reliability: a student being tested twice will get the same result (technical concept: the rank order of the candidates is replicated in two separate, real or simulated, administrations of the same assessment. Result: a reliability coefficient, theoretical maximum 1, if all the students get exactly the same result).
We compare two tests. Methods:
- Test-retest: the student takes the same test again
- Alternate forms: the students take two alternate forms of the same test
- Split-half: you split the test into two equivalent halves and compare them as if they were two different tests.
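As an illustration of the split-half method, here is a hedged Python sketch: hypothetical 0/1 item scores are split into odd- and even-numbered halves, the two half scores are correlated, and the Spearman-Brown correction estimates the reliability of the full-length test. The odd/even split and the correction formula are standard conventions, not prescribed by the slides:

```python
# Split-half reliability sketch: correlate two halves of one test, then
# correct upward for full test length (Spearman-Brown).
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one row of 0/1 item results per candidate.
    Splits items into odd/even halves and applies Spearman-Brown."""
    odd = [sum(row[::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown correction

# Hypothetical data: three candidates, four items each.
scores = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]
print(split_half_reliability(scores))  # → 1.0 (the two halves agree perfectly)
```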
12. Reliability coefficient / standard error of measurement
- A high-stakes test needs a high reliability coefficient (highest is 1), and therefore a very low standard error of measurement (a number obtained by statistical analysis). A lower-stakes exam does not need those coefficients.
- True score: the real score that a student would get in a perfectly reliable test. In a very reliable test, the true score is clearly defined (the student will always get a similar result, for example 65-67). In a less reliable test, the range is wider (55-75).
- Scorer reliability (coefficient). You compare the scores given by different scorers (examiners). The more agreement between scorers, the higher their reliability coefficient.
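The standard error of measurement named on this slide follows directly from the reliability coefficient: SEM = SD × sqrt(1 − reliability). A small sketch (the standard deviation, reliability values, and observed score below are hypothetical, and the 95% band assumes approximately normal error):

```python
# Standard error of measurement: the less reliable the test, the wider the
# band around an observed score in which the true score probably lies.
from math import sqrt

def standard_error(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return score_sd * sqrt(1 - reliability)

def true_score_band(observed, score_sd, reliability, z=1.96):
    """Approximate 95% band for the true score (normal error assumed)."""
    margin = z * standard_error(score_sd, reliability)
    return observed - margin, observed + margin

# With SD = 10, a highly reliable test gives a narrow band and a less
# reliable one a wide band, as in the 65-67 vs 55-75 contrast on the slide.
print(true_score_band(66, 10, 0.96))  # narrow band
print(true_score_band(66, 10, 0.70))  # wide band
```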
13. Item analysis:
Facility value
Discrimination indices: drop some items, improve others
Analyse distractors
Item banking
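The facility value and discrimination index on this slide can be computed per item. A sketch with hypothetical 0/1 responses; the upper/lower split of one third is a common convention, not the only possible one:

```python
# Item analysis: facility value (how easy the item is) and a discrimination
# index (does the item separate strong from weak candidates?).

def facility_value(item_responses):
    """Proportion of candidates who answered the item correctly (0..1)."""
    return sum(item_responses) / len(item_responses)

def discrimination_index(item_responses, total_scores, fraction=1/3):
    """D = p(correct in top group) - p(correct in bottom group).
    Items with D near zero or negative are candidates for dropping."""
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    p_upper = sum(item_responses[i] for i in upper) / n
    p_lower = sum(item_responses[i] for i in lower) / n
    return p_upper - p_lower

responses = [1, 1, 1, 0, 0, 0]        # per-candidate result on one item
totals = [90, 80, 70, 30, 20, 10]     # per-candidate total test score
print(facility_value(responses))                 # → 0.5
print(discrimination_index(responses, totals))   # → 1.0
```

Here the item is of medium difficulty (facility 0.5) and discriminates perfectly, since only the top scorers got it right.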
14. 1.Take enough samples of behaviour.
2.Exclude items which do not descriminate well
3.Do not allow candidates too much freedom.
4.Write unambiguous items
5.Provide clear and explicit instructions
6.Ensure that tests are well laid out and perfectly
legible
7.Make candidates familiar with format and testing
techniques
8.Provide uniform and non-distracting conditions of
administration
15. 9. Use items which permit scoring which is as
objective as possible
10. Make comparisons between candidates as direct
as possible
11. Provide a detailed scoring key
12. Train scorers
13. Agree acceptable responses and appropriate
scores at the beginning of the scoring process.
14. Identify candidates by number not by name
15. Employ multiple, independent scorers.
16. To be valid, a test must be reliable (provide
accurate measurement)
A reliable test may not be valid at all
(technically perfect, but globally wrong: it
does not test what it is supposed to test)
17. Test the abilities/skills you want to encourage.
Sample widely and unpredictably
Use direct testing
Make testing criterion-referenced (CEFR)
Base achievement tests on objectives
Ensure that the test is known and understood by
students and teachers
Counting the cost
18. 1. Make a full and clear statement of the testing
‘problem’.
2. Write complete specifications for the test.
3. Write and moderate items.
4. Trial the items informally on native speakers and
reject or modify problematic ones as necessary.
5. Trial the test on a group of non-native speakers
similar to those for whom the test is intended.
6. Analyse the results of the trial and make any
necessary changes.
7. Calibrate scales: collect samples of performance,
use them as models (benchmarking)
8. Validate.
9. Write handbooks for test takers, test users and
staff.
10. Train any necessary staff (interviewers, raters,
etc.).
19. Chapters from Hughes’ Testing for Language Teachers
8. Common Test techniques: Elaine, 24th
9. Testing Writing: Marta, Idoia, 22nd
10. Testing Oral Abilities: Paula, Ángela, 24th
11. Testing Reading: Lucía, 24th
12. Testing Listening: Lorena, 22nd
13. Testing Grammar and Vocabulary: Clara, Cristina,
22nd
14. Testing Overall Ability: Jefferson, 22nd
15. Tests for Young Learners: Tania, Diego, 24th
Editor's Notes
If we want assessment to be valid, reliable, and feasible, we need to specify:
What is assessed: according to the CEFR, communicative activities (contexts, texts, and tasks). See examples.
How performance is interpreted: assessment criteria. See examples
How to make comparisons between different tests and ways of assessment (for example, between public examinations and teacher assessment). Two main procedures:
Social moderation: discussion between experts
Benchmarking: comparison of samples in relation to standardized definitions and examples
Guidelines for good practice: EALTA
Types of tests:
Proficiency tests: designed to measure people’s ability in a language, regardless of any training. “Proficient”: command of the language, for a particular purpose or for general purposes.
Achievement tests: most teachers are not responsible for proficiency tests, but for achievement tests. They are normally related to language courses. Two approaches:
to base achievement tests on the textbook (or the syllabus), so that only what is covered in the classes is tested,
or, much better, to base test content on course objectives. More beneficial washback. The long-term interests of the students are best served by this approach.
Two types: final achievement tests, and progress achievement tests (formative assessment)
Diagnostic tests: Used to identify learners’ strengths and weaknesses (example: Dialang)
Placement tests: to place students at the stage most appropriate to their abilities
A test is valid if it measures accurately what it is intended to measure. Or, the information gained is an accurate representation of the proficiency of the candidate. This general type of validity is called “construct validity”, the validity of the construct, the thing we want to measure
Content validity: A test has it if its content constitutes a representative sample of the language skills or structures, etc. that it wants to measure. So, first, we need a specification of the skills or structures that we want to cover, and compare them with the test itself. For example, for B2 writing skills, writing formal letters is one of the subskills shown in the specification; there are more, and the more of them we cover, the more valid the test will be. The more content validity, the more construct validity and the stronger the backwash effect.
Criterion-related validity: Results on the test agree with other (independent and highly dependable) results. This independent assessment is the criterion measure.
Two types:
Concurrent validity: we compare the criterion test and the test that we want to check. They both take place at about the same time.
Example 1: we administer a 45 m. oral test where all the subskills, tasks, operations, are tested, but only to a sample of the students. This is the criterion test. Then we do 10 m. interviews with the whole level of students. We compare the results, and they tell us whether 10 m. is enough or not. This is expressed in a “correlation coefficient” between the criterion and the test being validated.
Example 2: we compare the results of a general test (Pruebas Estandarizadas) with teachers’ assessment.
Predictive validity: the test predicts future performance of the students. A placement test can easily be validated by asking the teachers who teach the students whether the students are well placed or not.
Validity in scoring: not only the items need to be valid, but also the way in which the responses are scored. For example, a reading test may call for short written responses. If the scoring of these responses takes into account spelling and grammar, then it is not valid (it is not measuring what it is intended to measure). Same for the scoring of writing or speaking.
Face validity: the test has to look as if it measures what it is supposed to measure. It is not a scientific notion, but it is important (for candidates, teachers, employers). For example, a written test to check pronunciation.
Reliability: A student being tested twice will get the same result (technical concept: the rank order of the candidates is replicated in two separate—real or simulated—administrations of the same assessment )
We compare two tests taken by the same group of students, and get a reliability coefficient: if all the students get exactly the same result, the coefficient is 1 (It never happens). High Stakes Tests need a higher coefficient than Lower Stakes exams. They shouldn’t depend on chance, or particular circumstances.
In order to get two comparable tests, there are two procedures:
Test-retest method: the students take the same test again
Alternate forms method: the students take two alternate forms of the same test
Split half method: you split the test into two (equivalent) halves and compare them as if they were two different tests. You get a “coefficient of internal consistency”.
We also need to know the standard error of measurement of a test. It is inversely related to the reliability coefficient and is obtained through statistical analysis. With this number, we can estimate a student's true score. A very reliable test has a low standard error of measurement, and therefore the student will always get a very similar result no matter how many times they take the test. In a less reliable test, the true score is less well defined. The true score lies in a range that varies depending on the standard error of measurement of the test.
These numbers are important to compare tests and to make decisions (by companies, governments, etc.) based on those results.
Another statistical procedure commonly used now is Item Response Theory. Very technical.
Scorer reliability. There is also a scorer reliability coefficient, the level of agreement given by the same or different scorers on different occasions. If the scoring is not reliable, the test results cannot be reliable.
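A rough way to inspect scorer agreement before computing a formal coefficient is to count exact and adjacent agreement between two independent scorers. A minimal sketch with invented marks for six scripts (a formal study would report a correlation or kappa coefficient instead):

```python
# Invented marks given by two independent scorers to the same six
# scripts, on a 1-5 band scale (hypothetical data).
scorer_a = [4, 3, 5, 2, 4, 3]
scorer_b = [4, 3, 4, 2, 5, 3]

n = len(scorer_a)
# Exact agreement: both scorers give the same band.
exact = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
# Adjacent agreement: the scorers differ by at most one band.
adjacent = sum(abs(a - b) <= 1 for a, b in zip(scorer_a, scorer_b)) / n
print(exact, adjacent)  # 4 of 6 exact, all within one band
```

Scripts where the two scorers diverge by more than one band would go to a third, senior scorer, as recommended in the reliability guidelines above.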
Item analysis:
Facility value
Discrimination indices: drop some, improve others
Analyse distractors
Item banking
SEE EXAMPLE FROM FUENSANTA
How to make tests more reliable (Hughes)
Take enough samples of behaviour. The more items, the more reliable. The higher stakes, the longer it should be. Example from the Bible. P. 45
Exclude items which do not discriminate well between weaker and stronger students
Do not allow candidates too much freedom. Example p. 46
Write unambiguous items: Critical scrutiny of colleagues, pre-testing (trialling, piloting)
Provide clear and explicit instructions: write them down, read them aloud. No problem with writing them in L1.
Ensure that tests are well laid out and perfectly legible
Make candidates familiar with format and testing techniques
Provide uniform and non-distracting conditions of administration (specified timing, good acoustic conditions)
Use items which permit scoring which is as objective as possible (better one-word response than multiple choice)
Make comparisons between candidates as direct as possible (no choice of items)
Provide a detailed scoring key
Train scorers
Agree acceptable responses and appropriate scores at the beginning of the scoring process. Score a sample. Choose representative examples. Agree. Then scorers can begin to score.
Identify candidates by number not by name
Employ multiple, independent scorers. At least two, independently. Then, a third, senior scorer gets the results, and investigates discrepancies.
Washback/Backwash: (One of the) main reasons for a language teacher/school/department to use appropriate forms of assessment.
Test the abilities/skills you want to encourage. Give them sufficient weight in relation to other skills.
Sample widely and unpredictably: Test across the full range of the specifications
Use direct testing
Make testing criterion-referenced (CEFR)
Base achievement tests on objectives
Ensure that the test is known and understood by students and teachers (the more transparent, the better)
(Where necessary, provide assistance to teachers)
Counting the cost: Individual direct testing is expensive, but what is the cost of not achieving beneficial washback?
Calibrate scales: collect samples of performance, and use them as models, reference points (European Study)