1. LANGUAGE TESTING
A Course Presentation by:
Dr. Jihan Zayed
Mustaqbal University, KSA
2019
Arthur Hughes (2001). Testing for Language Teachers.
Cambridge University Press
2. Outline
1. Teaching and Testing
2. Kinds of Testing
3. Approaches to Testing
4. Validity and Reliability
5. Achieving Beneficial Backwash
6. Stages of Test Construction
7. Test Techniques for Testing Overall Ability
8. Testing of Language Skills
9. Testing Grammar and Vocabulary
10. Test Administration
3. An accurate test must be:
1-Valid
• A valid test measures accurately what it is intended to
measure. For example, if we want to test writing, we have
to ask our students to write, not to read.
2-Reliable
• A reliable test provides consistent results no matter how
many times a student takes it. For example, a student obtains
approximately the same score whether s/he takes the test
on a particular day or on the next.
4. Invalidity has 2 origins:
• Test Content
• Test Techniques
For example, to find out how well students can write,
there is no way we can get a really accurate
measure of their ability by means of a
multiple-choice test.
5. Unreliability has 2 origins:
Features of the test
• Unclear instructions
• Ambiguous questions
• Easily-guessed answers
Scoring
• The same composition may
be given different scores by
different markers or by the
same marker on different
occasions.
6. Language Testing
• The need for language tests:
• Testing a language is a structured attempt to measure what
students can do in, or with, a language. It is important, for example, for:
1. accepting students from overseas to study in, for example, British
and American universities;
2. hiring translators or interpreters in different organizations; and
3. getting information about the achievement of groups of learners.
• What is to be done?
• The teaching profession can contribute to the improvement of
testing in two ways:
1. Teachers can write better tests themselves.
2. Teachers can put pressure on others to improve their tests. For
example, a writing test (Test of Written English) was added to
supplement TOEFL (Test of English as a Foreign Language), the test
taken by most non-native speakers of English applying to North
American universities.
7. Testing as problem solving
• No Best Test
A test which proves ideal for one purpose may be useless for
another. That is, objectives must be specified before a test is
designed.
• Tests should …
1. consistently and accurately measure the abilities to be
measured;
2. have a beneficial effect on teaching; and
3. be practical – economical in terms of time and money.
8. Kinds of Tests and Testing
Language Tests
• Design (Method): Paper-and-Pen Tests, Performance Tests
• Purpose: Proficiency Tests, Achievement Tests, Diagnostic Tests, Placement Tests
9. Kinds of Tests: (Method of Testing)
• Paper-and-pen Tests are typically used for the assessment of:
1. Separate components of language (grammar, vocabulary …)
2. Receptive skills (listening and reading)
• Performance Tests assess language skills in the act of communication (i.e.,
productive skills: speaking and writing) where:
1. extended samples of speech/writing are elicited;
2. the samples are judged by trained markers; and
3. common rating procedures are used.
10. Kinds of Tests: (Purpose of Testing)
• Proficiency Tests measure students’ general ability in the language
regardless of any training they have had in that language. Their content is based
on what candidates have to be able to do in the language.
• Achievement Tests establish how far students have achieved the
objectives of a course of study.
• Diagnostic Tests identify the students’ strengths and weaknesses to
ascertain what further teaching is necessary.
• Placement Tests assist placement of students by identifying the stage or
level of a teaching programme most appropriate to their abilities.
11. Achievement Tests:
1. Final Achievement Test:
• Administered at the end of a course of study
• Intended to measure course contents and/or objectives
2. Progress Achievement Test:
• Administered repeatedly during a course of study
• Measures the progress the students are making
towards course objectives
• Increasing scores indicate the progress being made
• Establishes a series of well-defined short-term
objectives on which to test or quiz the students
12. Approaches to Testing
• Direct vs. Indirect Testing
• Discrete Point vs. Integrative Testing
• Norm-referenced vs. Criterion-referenced Testing
• Objective Testing vs. Subjective Testing
13. Direct vs. Indirect Testing
• Direct Testing:
• It requires the test taker to perform precisely the skill we wish to
measure. For example, if we want to know how well students write
essays, then we ask them to write an essay.
• It is easier to carry out with the productive skills (speaking and writing). With
reading and listening, students must also be given a task through which they
demonstrate that they have read or listened successfully.
• It is relatively easy to create the conditions that elicit the skills to be measured.
• It has a helpful backwash effect because practice for the test involves
practice of the skills to be measured.
• Indirect Testing:
• It attempts to measure the sub-skills which underlie a main skill, for
example, vocabulary and grammatical structures as sub-skills of writing.
• It is risky, as mastery of the underlying sub-skills does not always
lead to mastery of the main skill.
• For example, Lado (1961) proposed testing pronunciation with a paper-and-pencil test in which
the candidate has to identify pairs of words which rhyme with each other.
14. Discrete Point vs. Integrative Testing
• Discrete Point Testing:
• It entails testing one element at a time, item by item.
• For example, a series of items each testing a particular grammatical
structure.
• Integrative Testing:
• It requires the test taker to combine many language elements
for the completion of a task.
• For example, writing a composition, taking notes during a lecture, taking
a dictation, etc.
15. Norm-referenced vs. Criterion-referenced Testing
• Norm-referenced Testing:
• places a student in a percentage category (for example, the top 10% of those who took the test);
• relates one candidate’s performance to that of other
candidates; and
• expects scores to spread along a bell-shaped curve.
• Criterion-referenced Testing:
• sets meaningful standards for students to measure their
progress – these standards do not change with different groups
of students;
• classifies students according to what they can actually do with
the language; and
• motivates students to perform “up-to-standard” rather than
trying to be “better” than other students.
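The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the function names, the two groups of scores and the cut score of 65 are hypothetical and do not come from the course material.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: percentage of the group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: compare the score with a fixed standard."""
    return score >= cut_score


# Hypothetical data: the same candidate score judged against two groups.
strong_group = [60, 70, 80, 90]
weak_group = [30, 40, 50, 60]

print(percentile_rank(70, strong_group))  # 25.0
print(percentile_rank(70, weak_group))    # 100.0
print(meets_criterion(70, cut_score=65))  # True, whichever group is tested
```

The norm-referenced result changes with the group, while the criterion-referenced result depends only on the fixed standard.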
16. Objective Testing vs. Subjective Testing
• Objective Testing:
• No judgement is required on the part of the scorer.
• For example, multiple choice, fill-in-the-blank, true or false,
matching, etc.
• Subjective Testing:
• Judgement is required on the part of the scorer.
• There are different degrees of subjectivity in scoring.
• Complexity increases subjectivity; for example, the scoring of a
composition is more subjective than that of short-answer responses.
• The less subjective the scoring, the greater the agreement will be
between two different scorers and between the scores given by
one person scoring the same test paper on different occasions.
17. Types of Validity
• Content Validity:
• A test has content validity only if it includes a representative
sample of the language skills, structures, vocabulary, etc. that it is
intended to test.
• A comparison of test specifications and test content is the basis for this type.
• Criterion-related Validity:
• A test has criterion-related validity when its results agree with those provided by an independent and
highly dependable assessment of the candidate’s ability. This can be
concurrent or predictive. The former applies when the test and the criterion are administered at about the same time,
while the latter refers to the prediction of candidates’ future performance.
• Construct Validity:
• A construct refers to an underlying trait or ability hypothesized in language
learning theory. It is an important consideration in indirect testing of main
skills or the testing of sub-skills like guessing the meaning of unknown words
as a sub-skill of reading.
• Face Validity:
• A test has face validity if it looks as if it measures what it is supposed to
measure.
18. Types of Reliability
• Test Reliability:
• A student’s score on a test will be approximately the same no matter
how many times s/he takes it.
• Scorer Reliability:
• When the test is objective, the scoring requires no judgment on the
part of the scorer, and the scores should always be the same.
• When the test is subjective, the scoring requires judgment on the part
of the scorer, and the scores may not be the same on different
occasions.
• With perfect scorer reliability, a scorer would give the same score on two different occasions, and
this would be the same as the score given by another scorer on either occasion.
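One common way to quantify scorer reliability is to correlate the scores that two markers give to the same scripts. The sketch below assumes the Pearson correlation coefficient is used for this purpose. The marker scores are invented for illustration and the function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores given by two markers to the same five compositions.
marker_a = [12, 15, 9, 18, 14]
marker_b = [11, 16, 10, 17, 13]

# A coefficient close to 1 suggests the two markers rank the scripts alike.
print(round(pearson(marker_a, marker_b), 2))
```

The same function could be applied to one marker's scores on two occasions, or to candidates' scores on two administrations of the same test.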
19. How to Make Tests More Reliable!
• Test for enough independent samples of behavior and allow for as many fresh starts as
possible.
• Do not allow test takers too much freedom. Restrict and specify their range of possible
answers.
• Write unambiguous items.
• Provide clear and explicit instructions.
• Ensure that tests are well laid out and perfectly legible.
• Make sure candidates are familiar with format and test-taking procedures.
• Provide uniform and non-distracting conditions of administration.
• Use items that permit scoring which is as objective as possible.
• Make comparisons between candidates as direct as possible.
• Provide a detailed scoring key.
• Train scorers.
• Agree on acceptable responses and appropriate scores at the outset of scoring.
• Identify test takers by number, not name.
• Employ multiple, independent scoring.
20. Achieving Beneficial Backwash
• Test abilities whose development we want to encourage.
• Sample widely and unpredictably.
• Use direct testing.
• Make testing criterion-referenced.
• Base achievement tests on objectives.
• Ensure the test is known and understood by both teachers and
students.
• Provide assistance to teachers.
• Count the cost.
21. Stages of Test Construction
Statement of the Problem
Providing a Solution to the Problem
22. Statement of the Problem
Be clear about what one wants to know and why! The
following questions have to be answered:
• What kind of test is most appropriate?
• What is the precise purpose?
• What abilities are to be tested?
• How detailed must the results be?
• How accurate must the results be?
• How important is backwash?
• What constraints are set by unavailability of expertise,
facilities, time (for construction, administration, and
scoring)?
23. Providing a Solution to the Problem
• Once the problem is clear, then steps can be taken to solve it.
• Efforts should be made to gather information on similar tests
designed for similar situations. If possible, samples of these
tests should be obtained. As each testing situation is unique,
these tests should not be copied, but rather used to suggest
possibilities.
24. 1. Writing Specifications for the Test
• The first form that the solution takes is a set of specifications for
the test. They include:
Test Specifications
1. Content: Operations, Types of Text, Addressees, Topics
2. Format and Timing
3. Criterial Levels of Performance
4. Scoring Procedures
25. 1. Content refers not to the content of a single, particular version of the test,
but to the entire potential content of any number of versions. The content
should be specified regarding:
• Operations: the tasks students will have to be able to carry out (e.g. for reading:
skim, scan, guess, etc.).
• Types of Text: a writing test might include letters, forms, academic essays, etc.
• Addressees: the people the test-taker is expected to be able to speak or write to; or
the people for whom reading and listening are primarily intended (for example,
native-speaker university students).
• Topics: topics should be selected according to their suitability for the test takers and
the type of test.
2. Format and Timing should specify test structure and item types/elicitation
procedures, with examples. It should state what weighting is to be allocated to
each component. It should say how many passages will be presented or required
and how many items will be in each component.
3. Criterial Levels of Performance: The required levels of performance for
different levels of success should be specified. For example, to demonstrate
mastery, 80 % of the items must be responded to correctly. It may entail a
complex rubric including the following: accuracy, appropriacy, range of
expression, flexibility, size of utterances.
4. Scoring Procedures: These are most relevant when scoring is subjective.
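The 80 % mastery example under Criterial Levels of Performance can be expressed as a simple check. This is a minimal sketch: the function name and the sample figures of 40 and 39 correct answers out of 50 are hypothetical.

```python
def reaches_mastery(correct, total, criterion_percent=80):
    """Return True if the share of correct items meets the criterial level.

    Integer arithmetic avoids rounding problems at the boundary.
    """
    return correct * 100 >= criterion_percent * total


print(reaches_mastery(40, 50))  # True: exactly 80 % of the items correct
print(reaches_mastery(39, 50))  # False: 78 % falls below the criterion
```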
26. 2. Writing the Test
Sampling
• Choose widely from the whole area of content. Successive versions of the test should
sample widely and unpredictably.
Item Writing and Moderation
• Some items will have to be rejected; others will have to be reworked.
• The best way to do this is through teamwork.
• Item writers must be open to, and ready to accept, criticism. Critical questions
must be asked:
• Is the task perfectly clear?
• Is there more than one possible correct answer?
• Do test takers have enough time to perform the tasks?
Writing and Moderation of Scoring Key
• When there is only one correct response, this is quite straightforward.
• When there are alternative acceptable responses, which may be awarded different
scores, or where partial credit may be given for incomplete responses, greater care
is needed.
27. 3. Pretesting
• The aim should be to administer the test first to a group as
similar as possible to the one for which it is really intended.
• Problems in administration and scoring are noted.
• The reliability coefficients of the whole test and of its
components are calculated, and individual items are
analyzed.
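The calculations mentioned above can be sketched as follows. The slide does not say which methods are used, so this example assumes two common ones: a facility value per item (the proportion of candidates answering it correctly) and a split-half reliability estimate stepped up with the Spearman-Brown formula. The response data and function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def facility_values(responses):
    """Proportion of candidates answering each item correctly."""
    n_candidates = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_candidates
            for i in range(n_items)]


def split_half_reliability(responses):
    """Correlate odd- and even-numbered item totals, then apply Spearman-Brown."""
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson(odd_totals, even_totals)
    return 2 * r_half / (1 + r_half)


# Hypothetical pretest: rows are candidates, columns are items (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

print(facility_values(responses))  # [0.8, 0.6, 0.6, 0.6, 0.4, 0.4]
print(round(split_half_reliability(responses), 2))
```

Items with facility values very close to 0 or 1 tell us little about differences between candidates and are natural candidates for rewriting.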
28. Test Techniques for Testing Overall Ability
• Test Techniques are means of eliciting behavior from test
takers which informs us about their language abilities.
• We need test techniques which:
• elicit behavior that is a valid and reliable indicator of the ability in
which we are interested;
• elicit behavior which can be reliably scored;
• are economical; and
• have a positive backwash effect.
29. TOEFL IBT
• TOEFL iBT stands for Test of English as a Foreign Language,
Internet-based Test.
• It measures the ability of nonnative speakers of English to use and
understand English as it is spoken, written, and heard in college and
university settings.
• This test emphasizes integrated skills and provides better information
about students’ ability to communicate in an academic setting and their
readiness for academic coursework.
• Introduced in 2005, it replaced the paper-based (TOEFL PBT) and computer-based (TOEFL CBT) versions.
30. IELTS
• IELTS stands for International English Language Testing System.
• It measures ability to communicate in English across all four language
skills (listening, reading, writing and speaking) for people who intend
to study or work where English is the language of communication.
• Since 1989, it has been jointly managed by the British Council, IELTS Australia and the
University of Cambridge.
• Settings that use this test: English-medium universities, colleges,
professional organizations, Immigration Canada (proof of English
language ability)
31. STEP
• STEP stands for Standardized Test of English Proficiency.
• Based on growing international needs for the English language, several
academic and non-academic institutions have approached the National
Center for Assessment in Higher Education calling for the
development of an English test that could measure the proficiency of
their applicants.
• It is designed to be an objective and unbiased test. It is made up of:
1. Reading Comprehension (RC – 40%),
2. Structure (ST – 30%),
3. Listening Comprehension (LC – 20%), and
4. Compositional Analysis (CA – 10%)
• STEP has 100 questions distributed among these four components.
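If the percentages above are read as shares of the 100 questions (an assumption, since the slide does not state this explicitly), the number of questions per component follows directly:

```python
# Component weights in percent, as listed on the slide.
weights = {
    "Reading Comprehension": 40,
    "Structure": 30,
    "Listening Comprehension": 20,
    "Compositional Analysis": 10,
}
total_questions = 100

# Assumption: each weight is the component's share of the question count.
questions = {name: total_questions * pct // 100 for name, pct in weights.items()}

for name, count in questions.items():
    print(f"{name}: {count} questions")
```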
32. Test Techniques for Testing Overall Ability
Multiple Choice
• Multiple Choice
• Advantages
• Scoring is reliable and can be done rapidly and economically.
• Many more items can be included in a given period of time than
would otherwise be possible, which makes the test more
reliable.
• Disadvantages
• Tests only recognition knowledge
• Guessing may have a considerable but unknowable effect on
test scores
• Technique severely restricts what can be tested
• It is very difficult to write successful items
• Backwash may be harmful
• Cheating may be facilitated.
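The effect of guessing can be made concrete with a little arithmetic. The sketch below shows the score a candidate could expect from blind guessing alone, together with the conventional correction-for-guessing formula (right answers minus wrong answers divided by the number of options less one). The formula is a standard one, not something taken from the slides, and the figures are hypothetical.

```python
def expected_chance_score(n_items, n_options):
    """Average score obtained by guessing blindly on every item."""
    return n_items / n_options


def corrected_score(right, wrong, n_options):
    """Standard correction for guessing: right - wrong / (options - 1)."""
    return right - wrong / (n_options - 1)


# Hypothetical 100-item test with four options per item.
print(expected_chance_score(100, 4))  # 25.0
print(corrected_score(70, 30, 4))     # 60.0
```

The correction assumes that every wrong answer results from a blind guess, which is rarely true in practice. This is one reason the effect of guessing on an individual's score remains unknowable.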
33. Test Techniques for Testing Overall Ability
Multiple Choice
• Multiple Choice items take many forms, but their basic
structure is as follows:
There is a stem:
Enid has been here ………………… half an hour.
and a number of options, one of which is correct, the
others being distractors:
A. during
B. for
C. while
D. since
34. Test Techniques for Testing Overall Ability
Cloze (Fill in the Blanks)
• Cloze
• It involves deleting a number of words in a passage,
leaving blanks, and requiring the person taking the test to
replace the original words.
• It can be used with a tape-recorded oral passage to
indirectly test oral ability.
• Clear instructions should be provided and students should
be encouraged to read through the passage first.
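A cloze passage can be produced mechanically. The sketch below assumes the traditional fixed-ratio procedure, in which every nth word is deleted. The slide itself only says that "a number of words" are deleted, so the deletion rate and the function name are illustrative choices.

```python
def make_cloze(passage, n=7):
    """Replace every nth word with a blank; return the gapped text and the key."""
    gapped, answers = [], []
    for position, word in enumerate(passage.split(), start=1):
        if position % n == 0:
            answers.append(word)   # keep the original word for the scoring key
            gapped.append("_____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers


text, key = make_cloze("one two three four five six", n=3)
print(text)  # one two _____ four five _____
print(key)   # ['three', 'six']
```

This simple version splits on whitespace, so punctuation attached to a deleted word disappears with it. A real test would normally also leave the opening sentence intact to give candidates some context.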
35. Test Techniques for Testing Overall Ability
The C-Test
• A variety of cloze
• Instead of whole words, it is the second half of every
second word that is deleted.
• Advantages over the cloze test are
1. Only exact scoring is necessary
2. Shorter (and so more) passages are possible
• In comparison to a cloze, a C-Test of 100 items takes
little space and not nearly so much time to complete.
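The C-Test deletion rule can also be sketched in code. This example assumes the usual convention of mutilating every second word, starting with the second word of the passage. Keeping the longer first half of odd-length words and leaving one-letter words intact are choices made here for illustration, and the function name is hypothetical.

```python
def make_c_test(passage):
    """Blank out the second half of every second word; return text and key."""
    mutilated, answers = [], []
    for position, word in enumerate(passage.split(), start=1):
        if position % 2 == 0 and len(word) > 1:
            keep = (len(word) + 1) // 2  # first half, rounded up (assumption)
            mutilated.append(word[:keep] + "_" * (len(word) - keep))
            answers.append(word)         # only the exact word is accepted
        else:
            mutilated.append(word)
    return " ".join(mutilated), answers


text, key = make_c_test("The weather today is very cold")
print(text)  # The weat___ today i_ very co__
print(key)   # ['weather', 'is', 'cold']
```

Because the first half of each word is given, there is normally only one acceptable restoration, which is why exact scoring is sufficient.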
36. Test Techniques for Testing Overall Ability
Dictation
• Dictation tests:
• predict overall ability and have the advantage of
involving listening ability;
• are easy to create and administer.
• However, they are:
• not easy to score, and
• time-consuming.
• With poorer students, scoring becomes tedious.
• Partial dictation may be considered a better alternative
since it is easier for both the test taker and the scorer.