ASSESSING
LANGUAGE LEARNING
Applied Linguistics for Language Teachers
WS 2015-2016, Lecture 11
Dr. Achilleas Kostoulas
(w/ thanks to Dr. Nancy Campbell for materials)
Lecture Outline
• Types of assessment
• Features of “good” assessment
• The Austrian Centralised Matura
• The effect of testing
TYPES OF ASSESSMENT
i. Summative vs. Formative
ii. Achievement vs. Proficiency
iii. Standardised vs. Non-standardised
iv. Aptitude
Summative vs. formative testing
Summative test:
• Evaluates student learning at
the end of an instructional unit
by comparing it to a set
standard
• Usually high stakes
• e.g., a midterm exam, a paper,
a final project
Formative test:
• gives information about a
learner’s progress during a
course and provides ongoing
feedback
• Usually low stakes
• Can also have a diagnostic
function
• e.g., a research proposal, a
concept map drawn in class;
“Lernzielkontrolle” (learning-objective check)
Achievement vs. proficiency testing
Achievement tests
• Based on curriculum / syllabus
• Provide information to
teachers, students and parents
regarding student progress
• Can be used to rate quality of
instruction
• Can be standardised or non-
standardised
Proficiency tests
• Measure ability to use
language, but not necessarily
linked to any
syllabus/curriculum
• Provide information that helps
make decisions about future
performance (e.g., in a job, or
university course, migration)
• Are always standardised
Standardised vs. Non-Standardised
Non-standardised
• Tests or exams that are
devised, usually by a class
teacher, on an ad hoc basis, for
an assessment purpose (e.g.,
to award grades).
• Format varies
• May be marked using grading
schemes.
• Can be used to compare
students within the same
group, but not across groups
Standardised
• Exams and tests devised
according to known standards.
• Require extensive pre-testing
and statistical expertise, so
they are usually designed by
testing agencies or educational
organisations
• Usually similar format in all
sittings
• Marking done according to
grading scheme and/or
benchmarks
• Can be used to compare
students across groups
Aptitude tests
• Used to predict future performance, e.g., in music or sport.
• Can be high stakes if entrance to musical or sports training
is dependent on an aptitude test.
ASSESSMENT FEATURES
i. Validity
ii. Reliability
iii. Discriminatory function
iv. Practicality
v. Pedagogical utility
Validity
• The degree to which a test actually measures what it is
thought to measure
Can you think of any examples
of tests that do not fit
this criterion?
Reliability
• The degree to which a test consistently measures what it is
said to measure, i.e., that measurement is not influenced
by outside conditions, fluctuations in learner affect,
marker effects, etc.
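One way to make the idea of marker effects concrete is to have two markers score the same scripts and check how closely their scores agree. The sketch below is only an illustration under stated assumptions: the scores are invented, the names `pearson`, `marker_a` and `marker_b` are hypothetical, and a correlation coefficient is just one of several possible indicators of marker consistency.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented scores: two markers rate the same five writing scripts.
marker_a = [12, 15, 9, 18, 14]
marker_b = [11, 16, 10, 17, 13]

agreement = pearson(marker_a, marker_b)
print(f"Inter-marker correlation: {agreement:.2f}")
```

A value close to 1 suggests that the two markers rank the scripts in much the same way; a low value would point to a marker effect worth investigating.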
Discriminatory function
• The ability of the test to produce scores that help us to
distinguish between successful and not-so-successful
learners
• Especially important in proficiency testing, maybe less so
in achievement testing
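For a single test item, discriminatory function is often summarised with an upper-lower discrimination index: the share of high scorers who answered the item correctly minus the share of low scorers who did. The sketch below assumes invented data, a hypothetical function name `discrimination_index`, and the common convention of comparing the top and bottom 27% of test takers; none of these details come from the lecture.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index for one item.

    total_scores: overall test score per learner
    item_correct: 1 if that learner answered the item correctly, else 0
    fraction: share of learners in each comparison group (assumed 27%)
    """
    # Rank learner positions from highest to lowest total score.
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    return (upper_correct - lower_correct) / group_size


# Invented data for ten learners.
totals = [95, 88, 82, 75, 70, 64, 58, 50, 41, 33]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

d = discrimination_index(totals, item)
print(f"Discrimination index: {d:.2f}")
```

The index ranges from -1 to 1. Values near 1 mean that the item separates stronger from weaker learners well, while values near 0 or below suggest that the item does not discriminate.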
Practicality
• The degree to which the test can be designed,
administered and marked without taking away resources
that are better used in teaching
Pedagogical utility
• The extent to which the test can produce useful data for
teaching, and useful attitudes among learners, and the
extent to which it promotes desirable teaching practices
(positive washback)
THE CENTRALISED
MATURA
Centralised Austrian Matura: skills-based
Includes sections that test:
• Listening
• Reading
• Writing
• Language in use
What is not
included
in this list?
Centralised Austrian Matura: competence-oriented
Sample writing task
What aspects of CLT
do you recognise
in this task?
Aims of centralised Matura
• National and international transparency and comparability
• Fair and objective assessment
• Quality guarantee
Aims of the new written Reife- und Diplomprüfung
(www.bifie.at)
The introduction of the standardised written Reife- und
Diplomprüfung makes examination requirements in key areas
transparent and comparable nationwide. Through binding assessment
guidelines, it guarantees students greater fairness and objectivity
in the assessment of their performance at a central juncture in
their educational career. It offers tertiary institutions and future
employers more reliable information about the competences that
school leavers have actually acquired. Universities, universities of
applied sciences and employers in Austria and Europe will in future
be able to rely more confidently on the quality of education of
Austrian Matura graduates in key areas.
Discussion task (1)
Previously we talked about the features of good
assessment, namely:
• Validity
• Reliability
• Discriminatory function
• Practicality
• Pedagogical utility
In your opinion, to what extent does the Centralised Matura
fulfill these criteria?
Discussion task (2)
• Why has there been so much resistance in Austria to the
centralised Matura?
THE EFFECTS OF
TESTING
Problems with standardized testing (Robinson,
2009)
• Has become a massive commercial industry
• Usually doesn’t address personal development
• It’s not there to identify what individuals can do. It’s there to look at things
to which they can conform.
• Personalization is lost in the culture of
standardizing.
• Objective assessment scales are inappropriate for
measuring human skills influenced by emotions
and motivations.
• Negative washback effect
Washback effect (Shawcross, 2002)
Definition: the (positive or negative) influence that an exam
has on the way in which students are taught
“There is a natural tendency for both teachers and students to
tailor their classroom activities to the demands of the test,
especially when the test is very important to the future of the
students, and pass rates are used as a measure of teacher
success. This influence of the test on the classroom (referred to
as washback by language testers) is, of course, very important;
this washback effect can be either beneficial or harmful.” (Buck,
1988 in Shawcross 2002)
Positive washback effects
Improvements in
• curriculum
• teaching materials
• teacher training
• learner competence
• Learners “liberated” from subjective judgement
Positive washback effects
“If we test directly the skills that we are interested in
fostering, then practice for the tests represents practice
in those skills.” (Hughes, 1989)
How?
Learning from preparation:
• classroom tasks included in summative tests
• no distinction between communicative classroom tasks and tests
Negative washback effects
• Teaching solely for the test
• Restrictions on materials and methodology
• Curtailment of individual development and creativity
(affecting teacher and learner)
Applied linguistics: Assessment for language teachers
