Designing a Language Test

Steps in designing a language test:
1. Determining
2. Planning
3. Writing
4. Preparing
5. Reviewing
6. Pre-testing
7. Validating
1. Determining 
Be clear about the following:
• The objective of the test (what will it measure?)
• The need for the test (what advantages will it have?)
• The test population (who will take it?)
• The content (what will the test cover?)
• The style of administration (how will it be given?)
• The item format (will it be forced-choice? multiple-choice?)
• The inclusion of alternate forms (are they necessary for this test?)
• The training requirements (which professionals are allowed to give the test?)
2. Planning 
• Prepare a table of specifications for the test (an illustrative example follows this list).
• The table will include information on:
  • content
  • format and timing
  • criteria
  • levels of performance
  • scoring procedures
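For illustration only (not from the original slides), a table of specifications based on the example test outline given later in this deck might look like the following; the timings and mark allocations are hypothetical:

Content area           | Item format                         | No. of items | Time   | Marks
Vocabulary             | matching; words used in sentences   | 5 + 5        | 10 min | 10
Grammar                | error detection                     | 10           | 10 min | 10
Reading comprehension  | short answer (2 passages)           | 8            | 15 min | 16
Writing                | response to a two-paragraph article | 1            | 25 min | 20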
3. Writing 
A good test item writer should:
• be experienced in test construction
• know the subject matter well
• know and understand the students being tested
• be thoroughly familiar with test formats
• be able to use language clearly and economically
• be willing to devote time and energy to the task
4. Preparing 
Factors in selecting the appropriate format:
• Purpose of the test
• Time available to prepare and score the test
• The number of students to be tested
• Physical facilities available for reproducing the test
• Skill in writing the different types of items
5. Reviewing 
Principles for reviewing test items:
• The test should not be reviewed immediately after its construction, but after a considerable interval.
• Other teachers or testers should review it.
• For a language test, it is preferable to have native speakers review it, if available.
6. Pre-testing 
• The tester should administer the newly developed test to a group of examinees similar to the target group; the purpose is to analyse every individual item as well as the test as a whole.
• Numerical data (test results) should be collected to check the efficiency of each item; this should include item facility and item discrimination.
7. Validating 
• Item facility (IF), also called item difficulty (or easiness): the extent to which an item is easy or difficult for the proposed group of test-takers.
• Item discrimination (ID): the extent to which an item differentiates between high- and low-ability test-takers.
7. Validating (continued)
• To measure the facility (easiness) of an item, the following formula is used:

  IF = Σc / N

  where Σc is the number of correct responses and N is the total number of candidates.
• The result of this equation ranges from 0 to 1.
• An item with a facility index of 0 is too difficult; an item with a facility index of 1 is too easy.
• The ideal item has a value of 0.5, and the acceptable range for item facility is 0.37 to 0.63: an item below 0.37 is difficult, and one above 0.63 is easy.
• Thus, tests which are too easy or too difficult for a given sample population often show low reliability.
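The slides give the facility formula but no discrimination formula, so the short Python sketch below is an illustration only: item_facility implements IF = Σc / N as defined above, while item_discrimination uses one common convention (proportion correct in the top 27% of scorers minus proportion correct in the bottom 27%), which is an assumption rather than something taken from the slides; the function names and data are hypothetical.

def item_facility(responses):
    """IF = Σc / N: number of correct responses over total candidates."""
    return sum(responses) / len(responses)


def item_discrimination(item_responses, total_scores, group_fraction=0.27):
    """One common ID index (assumption, not from the slides): proportion
    correct in the top-scoring group minus proportion correct in the
    bottom-scoring group, using the top and bottom 27% by total score."""
    n = len(total_scores)
    k = max(1, int(n * group_fraction))
    # Rank candidates by total test score, highest first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high = [item_responses[i] for i in order[:k]]   # top group
    low = [item_responses[i] for i in order[-k:]]   # bottom group
    return sum(high) / k - sum(low) / k


# Hypothetical data: 10 candidates, 1 = correct / 0 = incorrect on one item,
# plus each candidate's total score on the whole test.
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
totals = [48, 45, 44, 40, 38, 35, 30, 28, 25, 20]

print(item_facility(item))                 # 0.5, inside the 0.37-0.63 range
print(item_discrimination(item, totals))   # 1.0, separates high from low scorers

With these hypothetical responses, the item's facility of 0.5 sits exactly at the ideal value described above, and the positive discrimination value indicates that stronger candidates answered it correctly more often than weaker ones.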
Test specs serve as a blueprint of the test, covering:
• a description of its content
• item types (methods, such as multiple-choice, cloze, etc.)
• tasks (e.g. written essay, reading a short passage, etc.)
• skills to be included
• how the test will be scored
• how results will be reported to students
According to Brown (2005), a test specification should include the following:
1. Outline of the test
2. Skills to be included
3. Item types and tasks
1. Outline of the test (example) 
Section A. Vocabulary 
Part 1 (5 items): match words and definitions 
Part 2 (5 items): use the words in a sentence 
Section B. Grammar 
(10 sentences): error detection (underline or circle the error) 
Section C. Reading comprehension 
(2 one-paragraph passages): four short-answer items for each 
Section D. Writing 
Respond to a two-paragraph article on Malaysian culture 
2. Skills to be included 
• Sometimes, due to time constraints, a 60-minute test can only assess three or four language skills, e.g. listening, reading, writing and grammar.
• Other skills, such as speaking, are assessed separately at another time, since more time is needed when the teacher assesses the students one by one.
3. Item Types and Tasks 
• There are a limited number of modes of eliciting responses (i.e. prompting) and of responding on tests of any kind.
• Consider: the test prompt can be oral (the student listens) or written (the student reads), and the student can respond orally or in writing.
3. Item Types and Tasks (Elicitation mode)

Oral (student listens):
• word, pair of words
• sentence(s), question
• directions
• monologue, speech
• pre-recorded conversation
• interactive (live) dialogue

Written (student reads):
• word, set of words
• sentence(s), question
• directions
• paragraph
• essay, excerpt
• short story, book
3. Item Types and Tasks (Response mode)

Oral:
• repeat
• read aloud
• yes / no
• short response
• describe
• role play
• monologue (speech)
• interactive dialogue

Written:
• mark multiple-choice option
• fill in the blank
• spell a word
• define a term (with a phrase)
• short answer (2-3 sentences)
• essay
3. Item Types and Tasks (example)

Speaking (5 minutes per person, on the previous day)
Format: oral interview
Task: teacher asks questions of students

Listening (10 minutes)
Format: teacher makes an audiotape in advance, with one other voice on it
Task: a. 5 minimal-pair items, MCQ
      b. 5 interpretation items, MCQ

Reading (10 minutes)
Format: cloze test items (10 total) in a story line
Task: fill in the blanks

Writing (10 minutes)
Format: prompt for a topic: why I like / do not like football
Task: write a short opinion paragraph
• Bloom's Taxonomy (1956) is a systematic way of describing how a learner's performance develops from simple to complex levels in the affective, psychomotor and cognitive domains of learning.
• The original taxonomy provided carefully developed definitions for each of the six major categories in the cognitive domain; it was revised in 2001.
(Figure: the revised taxonomy by Anderson and Krathwohl)
• The SOLO taxonomy (Biggs & Collis, 1982), which stands for Structure of the Observed Learning Outcome, is a systematic way of describing how a learner's performance develops from simple to complex levels in their learning.
• There are five stages: Prestructural, Unistructural and Multistructural, which form a quantitative phase, and Relational and Extended Abstract, which form a qualitative phase.
• Learning becomes more complex as students advance through the stages.
• SOLO is a means of classifying learning outcomes in terms of their complexity, enabling teachers to assess students' work in terms of its quality rather than how many bits of this and that they got right.
• At first we pick up only one or a few aspects of the task (unistructural), then several aspects, but they are unrelated (multistructural); then we learn how to integrate them into a whole (relational); and finally we are able to generalise that whole to as-yet untaught applications (extended abstract).
• The SOLO taxonomy maps the complexity of a student's work by linking it to one of five phases: little or no understanding (Prestructural), through a simple and then more developed grasp of the topic (Unistructural and Multistructural), to the ability to link the ideas and elements of a task together (Relational), and finally (Extended Abstract) to understanding the topic for themselves, possibly going beyond the initial scope of the task (Biggs & Collis, 1982; Hattie & Brown, 2004).
• In their later research into multimodal learning, Biggs & Collis noted that there was an 'increase in the structural complexity of their (the students') responses' (1991: 64).
• Aim of the test: measure the objectives prescribed by the blueprint and meet quality standards.
• Range of topics to be tested: measure the test-takers' ability or proficiency in applying the knowledge and principles of the topics they have learnt.
• Range of skills to be tested: measure higher levels of cognitive processing.
• Test format: follow a consistent design so that the questioning process itself does not add unnecessary difficulty to answering the questions.
• Level of difficulty: plan the number of questions at a level of difficulty and discrimination that best distinguishes mastery from non-mastery performance.
• Internal and cultural considerations (bias): refrain from using slang, geographic references, historical references or dates (holidays) that may not be understood by an international examinee.
