Designing a Language Test

Steps in designing a language test:
1. Determining
2. Planning
3. Writing
4. Preparing
5. Reviewing
6. Pre-testing
7. Validating
1. Determining 
Be clear about the following:
• The objective of the test (what will it measure?)
• The need for the test (what advantages will it have?)
• The test population (who will take it?)
• The content (what will the test cover?)
• The style of administration (how will it be given?)
• The item format (will it be forced-choice? multiple-choice?)
• The inclusion of alternate forms (are they necessary for this test?)
• The training requirements (which professionals are allowed to give the test?)
2. Planning 
• Prepare a table of specifications for the test (an illustrative example follows this list).
• The table will include information on:
  • content
  • format and timing
  • criteria
  • levels of performance
  • scoring procedures
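For illustration only (not from the original slides), a table of specifications based on the example test outline given later in this deck might look like the following; the timings and mark allocations are hypothetical:

Content area           | Item format                         | No. of items | Time   | Marks
Vocabulary             | matching; words used in sentences   | 5 + 5        | 10 min | 10
Grammar                | error detection                     | 10           | 10 min | 10
Reading comprehension  | short answer (2 passages)           | 8            | 15 min | 16
Writing                | response to a two-paragraph article | 1            | 25 min | 20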
3. Writing 
A good test item writer should:
• be experienced in test construction
• know the subject matter well
• know and understand the students being tested
• be thoroughly familiar with test formats
• be able to use language clearly and economically
• be willing to devote time and energy to the task
4. Preparing 
Factors in selecting the appropriate format:
• Purpose of the test
• Time available to prepare and score the test
• The number of students to be tested
• Physical facilities available for reproducing the test
• Skill in writing the different types of items
5. Reviewing 
Principles for reviewing test items:
• The test should not be reviewed immediately after its construction, but after a considerable interval.
• Other teachers or testers should review it.
• For a language test, it is preferable to have native speakers review it, if available.
6. Pre-testing 
• The tester should administer the newly developed test to a group of examinees similar to the target group; the purpose is to analyse every individual item as well as the test as a whole.
• Numerical data (test results) should be collected to check the efficiency of each item; this should include item facility and item discrimination.
7. Validating 
• Item facility (IF), also called item difficulty (or easiness): the extent to which an item is easy or difficult for the proposed group of test-takers.
• Item discrimination (ID): the extent to which an item differentiates between high- and low-ability test-takers.
7. Validating (continued)
• To measure the facility (easiness) of an item, the following formula is used:

  IF = Σc / N

  where Σc is the number of correct responses and N is the total number of candidates.
• The result of this equation ranges from 0 to 1.
• An item with a facility index of 0 is too difficult; an item with a facility index of 1 is too easy.
• The ideal item has a value of 0.5, and the acceptable range for item facility is 0.37 to 0.63: an item below 0.37 is difficult, and one above 0.63 is easy.
• Thus, tests which are too easy or too difficult for a given sample population often show low reliability.
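The slides give the facility formula but no discrimination formula, so the short Python sketch below is an illustration only: item_facility implements IF = Σc / N as defined above, while item_discrimination uses one common convention (proportion correct in the top 27% of scorers minus proportion correct in the bottom 27%), which is an assumption rather than something taken from the slides; the function names and data are hypothetical.

def item_facility(responses):
    """IF = Σc / N: number of correct responses over total candidates."""
    return sum(responses) / len(responses)


def item_discrimination(item_responses, total_scores, group_fraction=0.27):
    """One common ID index (assumption, not from the slides): proportion
    correct in the top-scoring group minus proportion correct in the
    bottom-scoring group, using the top and bottom 27% by total score."""
    n = len(total_scores)
    k = max(1, int(n * group_fraction))
    # Rank candidates by total test score, highest first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high = [item_responses[i] for i in order[:k]]   # top group
    low = [item_responses[i] for i in order[-k:]]   # bottom group
    return sum(high) / k - sum(low) / k


# Hypothetical data: 10 candidates, 1 = correct / 0 = incorrect on one item,
# plus each candidate's total score on the whole test.
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
totals = [48, 45, 44, 40, 38, 35, 30, 28, 25, 20]

print(item_facility(item))                 # 0.5, inside the 0.37-0.63 range
print(item_discrimination(item, totals))   # 1.0, separates high from low scorers

With these hypothetical responses, the item's facility of 0.5 sits exactly at the ideal value described above, and the positive discrimination value indicates that stronger candidates answered it correctly more often than weaker ones.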
Test specs serve as a blueprint of the test, covering:
• a description of its content
• item types (methods, such as multiple-choice, cloze, etc.)
• tasks (e.g. written essay, reading a short passage, etc.)
• skills to be included
• how the test will be scored
• how results will be reported to students
According to Brown (2005), a test specification should include the following:
1. Outline of the test
2. Skills to be included
3. Item types and tasks
1. Outline of the test (example) 
Section A. Vocabulary 
Part 1 (5 items): match words and definitions 
Part 2 (5 items): use the words in a sentence 
Section B. Grammar 
(10 sentences): error detection (underline or circle the error) 
Section C. Reading comprehension 
(2 one-paragraph passages): four short-answer items for each 
Section D. Writing 
Respond to a two-paragraph article on Malaysian culture 
2. Skills to be included 
• Sometimes, due to time constraints, a 60-minute test can only assess three or four language skills, e.g. listening, reading, writing and grammar.
• Other skills, such as speaking, are assessed separately at another time, since more time is needed when the teacher assesses the students one by one.
3. Item Types and Tasks 
• There are a limited number of modes of eliciting responses (i.e. prompting) and of responding on tests of any kind.
• Consider: the test prompt can be oral (the student listens) or written (the student reads), and the student can respond orally or in writing.
3. Item Types and Tasks (Elicitation mode)

Oral (student listens):
• word, pair of words
• sentence(s), question
• directions
• monologue, speech
• pre-recorded conversation
• interactive (live) dialogue

Written (student reads):
• word, set of words
• sentence(s), question
• directions
• paragraph
• essay, excerpt
• short story, book
3. Item Types and Tasks (Response mode)

Oral:
• repeat
• read aloud
• yes / no
• short response
• describe
• role play
• monologue (speech)
• interactive dialogue

Written:
• mark multiple-choice option
• fill in the blank
• spell a word
• define a term (with a phrase)
• short answer (2-3 sentences)
• essay
3. Item Types and Tasks (example)

Speaking (5 minutes per person, on the previous day)
Format: oral interview
Task: teacher asks questions of students

Listening (10 minutes)
Format: teacher makes an audiotape in advance, with one other voice on it
Task: a. 5 minimal-pair items, MCQ
      b. 5 interpretation items, MCQ

Reading (10 minutes)
Format: cloze test items (10 total) in a story line
Task: fill in the blanks

Writing (10 minutes)
Format: prompt for a topic: why I like / do not like football
Task: write a short opinion paragraph
• Bloom's Taxonomy (1956) is a systematic way of describing how a learner's performance develops from simple to complex levels in the affective, psychomotor and cognitive domains of learning.
• The original taxonomy provided carefully developed definitions for each of the six major categories in the cognitive domain; it was revised in 2001.
(Figure: the revised taxonomy by Anderson and Krathwohl)
• The SOLO taxonomy (Biggs & Collis, 1982), which stands for Structure of the Observed Learning Outcome, is a systematic way of describing how a learner's performance develops from simple to complex levels in their learning.
• There are five stages: Prestructural, Unistructural and Multistructural, which form a quantitative phase, and Relational and Extended Abstract, which form a qualitative phase.
• Learning becomes more complex as students advance through the stages.
• SOLO is a means of classifying learning outcomes in terms of their complexity, enabling teachers to assess students' work in terms of its quality rather than how many bits of this and that they got right.
• At first we pick up only one or a few aspects of the task (unistructural), then several aspects, but they are unrelated (multistructural); then we learn how to integrate them into a whole (relational); and finally we are able to generalise that whole to as-yet untaught applications (extended abstract).
• The SOLO taxonomy maps the complexity of a student's work by linking it to one of five phases: little or no understanding (Prestructural), through a simple and then more developed grasp of the topic (Unistructural and Multistructural), to the ability to link the ideas and elements of a task together (Relational), and finally (Extended Abstract) to understanding the topic for themselves, possibly going beyond the initial scope of the task (Biggs & Collis, 1982; Hattie & Brown, 2004).
• In their later research into multimodal learning, Biggs & Collis noted that there was an 'increase in the structural complexity of their (the students') responses' (1991: 64).
• Aim of the test: measure the objectives prescribed by the blueprint and meet quality standards.
• Range of topics to be tested: measure the test-takers' ability or proficiency in applying the knowledge and principles of the topics they have learnt.
• Range of skills to be tested: measure higher levels of cognitive processing.
• Test format: follow a consistent design so that the questioning process itself does not add unnecessary difficulty to answering the questions.
• Level of difficulty: plan the number of questions at a level of difficulty and discrimination that best distinguishes mastery from non-mastery performance.
• Internal and cultural considerations (bias): refrain from using slang, geographic references, historical references or dates (holidays) that may not be understood by an international examinee.
