2. 1. Planning and designing the test
2. Defining reading
3. Writing specifications for individual tasks
4. Writing the test
5. Writing good MCQs
3. Test context and purpose specifications
Specifications of the overall test structure
Specifications for individual test tasks
4. Test context and purpose specifications
⇒How will the test be used?
⇒How will results be interpreted?
⇒Why is the test being given in the first place?
5. Purpose(s) of the test
• What decision(s) will you make based on test results?
Construct(s) to be assessed and their definitions
• What are the constructs, and what do you view them as including?
Target language use (TLU) domain and common/important task types
• What language use context(s) outside the test should test results generalize to?
• What should the test tell you about what students can do in those contexts?
Characteristics of test takers
• Who are the people who will take your test, what are they like, and what do they probably know?
Minimum acceptable levels for each of the qualities of usefulness
• What is the lowest level you would be willing to accept for each one, and how do you prioritize them?
Resource plan
• What resources do you need to create and use this test, and what resources are available? (time, space, materials and equipment, personnel)
7. Construct: Elements often included in construct definitions
Listening: Listening for the main idea, listening for major points, listening for specific details, listening for the gist, inferencing, predicting, determining the meaning of unfamiliar vocabulary from context, distinguishing fact from opinion, determining speaker’s intent, notetaking
Reading: Reading for the main idea, reading for major points, reading for specific details, reading for the gist, inferencing, predicting, skimming, scanning, determining the meaning of unfamiliar vocabulary from context, distinguishing fact from opinion, sensitivity to rhetorical organization of the text, sensitivity to cohesion of the text, identifying author purpose or tone, paraphrasing texts
8. ⇒What will students do when they take the test? (task
format)
⇒How many questions or prompts will there be in
each section of the test? (number of tasks)
9. ⇒How will the test be used?
⇒How will results be interpreted?
⇒Why is the test being given in the first place?
10. Steps in the process
1.Review construct definitions.
2.Choose task format(s) for assessing each construct,
or for each aspect of each construct.
3.Decide how many of each task (questions,
passages, prompts, etc.) to use in each section.
11. In groups of 3-4, take a test that you are familiar with
and analyze its purpose, context and overall
structure. It can be an institutional test that your
faculty is running or an international one
such as PET, IELTS or TOEFL.
12. What do we test when we test reading
competence?
14. Focuses on the divisibility of the capabilities a reader is
assumed to need in tackling certain test items
Defining factors that contribute to successful reading
(recalling word meanings, drawing inferences,
recognising a writer’s purpose/attitude/tone, finding
answers to explicit questions and following the
structure of a passage) (Davis, 1968)
Weaknesses: focuses on the outcome/performance
in the test rather than the reading process itself
15. Reading might be subdivided into
competencies which the skilled reader is
believed to have.
Mainly based on informed intuition
rather than empirical research
16. Level of activity (from simpler to more complex) | Readers’ typical cognitive operations in language tests | Size of typical unit
1 Lexis: word matching | Reader identifies same word in question and text | Word
2 Lexis: synonym and word-class matching | Reader uses knowledge of word meaning or word class to identify synonym, antonym or other related word | Word
3 Grammar/syntax | Reader uses grammatical knowledge to disambiguate and identify answer | Clause/sentence
4 Propositional meaning | Reader uses knowledge of lexis and grammar to establish meaning of a sentence | Sentence
5 Inference | Reader goes beyond literal meaning to infer a further significance | Sentence/paragraph/text
6 Building a mental model | Reader uses several features of the text to build a larger mental model | Text
7 Understanding text function | Reader uses genre knowledge to identify text structure and purpose | Text
17. In groups of 3-4, take a test that you are
familiar with and analyze the underlying
construct of the reading component. It can
be an institutional test that your faculty is
running or an international one
such as PET, IELTS or TOEFL.
18. Specifications component: Comments
Purpose and construct: Taken from the overall test specifications, which were based on the context and purpose specifications
Description of the task: Descriptions of questions, prompts, or reading or listening passages (depending on the task format being used), and time allowed to complete the section
Scoring method: Including:
• For questions: right/wrong, or partial credit?
• For limited production tasks: model answer descriptions, and essential response components
• For extended production tasks: considerations the scoring rubric must address, and type of scoring rubric
Sample task: Example of each type of prompt, question, and reading or listening passage
19. Studying the spec.
Selecting text(s)
Writing Items
Piloting test
Analyzing results & revising items
Piloting with a larger population
Test calibration & validation
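The "Analyzing results & revising items" step is often supported by simple item statistics. The sketch below is only an illustration and is not taken from the slides: it assumes dichotomously scored items (1 = correct, 0 = incorrect), and the function names are hypothetical. It computes item facility (the proportion of test takers answering correctly) and a simplified upper-lower discrimination index that splits the group into halves by total score.

```python
def item_facility(item_scores):
    """Proportion of test takers who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def item_discrimination(item_scores, total_scores):
    """Facility in the top half (by total score) minus facility in the bottom half.

    Simplifying assumption: halves are used here; other group sizes are possible.
    """
    ranked = sorted(zip(total_scores, item_scores), key=lambda pair: pair[0], reverse=True)
    half = len(ranked) // 2
    upper = [score for _, score in ranked[:half]]
    lower = [score for _, score in ranked[-half:]]
    return item_facility(upper) - item_facility(lower)


# Hypothetical pilot data for one item and six test takers.
pilot_item = [1, 1, 1, 0, 0, 0]
pilot_totals = [10, 9, 8, 4, 3, 2]
facility = item_facility(pilot_item)
discrimination = item_discrimination(pilot_item, pilot_totals)
```

An item that nearly everyone answers correctly, or one that stronger test takers get wrong more often than weaker ones, would be a candidate for revision before the larger pilot.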
21. Passage must be able to support questions of
the sort called for in the specs.
Length
Topic
Topical specificity
Vocab (variety, frequency/familiarity, density)
Syntax
Genre
Rhetorical mode
25. • Whether items are presented in the L1 or L2
• Whether “false” items need to be corrected
• How long the item stems and options are
• Which vocabulary and grammatical
structures are to be used (or avoided), if any
26. • Whether items are presented in the L1 or L2
• How many distractors there are
• How long the item stems and options should be
• Which vocabulary and grammatical structures are
to be used (or avoided), if any
• Which part(s) of speech are to be used (for one word
answers)
27. • Whether items are presented in the L1 or L2
• Whether responses are accepted in the L1, L2, or both
• Whether linguistic accuracy is included in scoring
criteria (for answers in the L2)
• Which vocabulary and grammatical structures are to
be used (or avoided), if any
28. • How long item stems are
• How long responses should be
• What is the maximum number of pieces of
information to ask for
• Whether polytomous or dichotomous scoring
is to be used; if polytomous, how many points
per item, or per piece of information
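The difference between dichotomous and polytomous scoring can be sketched as follows. This is a minimal illustration, not a prescribed scoring procedure: the function names, the one-point-per-piece default, and the sample pieces of information are all assumptions made for the example.

```python
def score_dichotomous(found, required):
    """1 point only if every required piece of information is present, otherwise 0."""
    return 1 if set(required) <= set(found) else 0


def score_polytomous(found, required, points_per_piece=1):
    """Partial credit: points for each required piece of information that is present."""
    return points_per_piece * len(set(required) & set(found))


# Hypothetical item asking for three pieces of information.
required_pieces = {"time", "place", "speaker"}
response_pieces = {"time", "place"}

dichotomous_score = score_dichotomous(response_pieces, required_pieces)
polytomous_score = score_polytomous(response_pieces, required_pieces)
```

With the same partially correct response, the dichotomous rule gives no credit while the polytomous rule gives two of three points, which is why the specification needs to state which rule applies and how many points each piece of information is worth.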
29. • Whether items are presented in the L1 or L2
• How long the item stem and options are
• Which vocabulary and grammatical
structures are to be used (or avoided), if any
• How many options there are
30. Parts of an MCQ
stem - the text of the question
options - the choices provided after the stem
(these include the key and the distractors)
the key - the correct answer in the list of
options
distractors - the incorrect answers in the list of
options
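The parts of an MCQ can be pictured as a small data structure. The sketch below is one possible representation, not a standard one: the class name, field names, and the sample item are hypothetical. It stores the stem and the options, records which option is the key, and derives the distractors as every option that is not the key.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MCQ:
    stem: str                 # the text of the question
    options: Tuple[str, ...]  # all choices, key and distractors together
    key_index: int            # position of the correct answer in options

    def __post_init__(self):
        if not 0 <= self.key_index < len(self.options):
            raise ValueError("key_index must point at one of the options")

    @property
    def key(self) -> str:
        """The correct answer."""
        return self.options[self.key_index]

    @property
    def distractors(self) -> Tuple[str, ...]:
        """The incorrect answers, in their original order."""
        return tuple(o for i, o in enumerate(self.options) if i != self.key_index)


# Hypothetical item, used only to show the structure.
sample_item = MCQ(
    stem="The capital of France is ______.",
    options=("London", "Paris", "Rome"),
    key_index=1,
)
```

Here `sample_item.key` is "Paris" and `sample_item.distractors` holds the two remaining options.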
32. 1. The message is about a change in schedule for
a(n)______ .
A. business meeting
B. doctor's appointment
C. airline flight
2. The meeting time has been changed from______ .
A. 8 am to 9 am
B. 9 am to 10 am
C. 10 am to 11 am
33. According to the passage, there are now over______ dance schools
in Italy.
A two hundred
B two thousand
C two hundred thousand
D two million
The research center uses the panda blood samples for:
A creating super-pandas
B research and storage
C comparison with bears and cats
D display purposes
34. The options are not all grammatically
consistent with the stem.
The doctor gave the patient some_______.
A medicine
B stethoscope
C surgical
35. The reading passage is about the_______airplane in the world.
A biggest
B first
C oldest
D fastest
The engineers in the passage are trying to make a ______ airplane.
A huge
B fast
C small
D fuel-efficient
36. Orange County:
A. has the largest concentration of
Vietnamese people in California.
B. is one of the locations with the most
Vietnamese residents in California.
C. has the largest concentration of
Vietnamese people in the U.S.
38. The stem should not contain irrelevant
material
A. original stem:
Paul Muldoon, an Irish postmodern poet who uses
experimental and playful language, uses which
poetic genre in "Why Brownlee Left"?
a. sonnet
b. elegy
c. narrative poem
d. dramatic monologue
e. haiku
B. improved stem
Paul Muldoon uses which poetic genre in "Why
Brownlee Left"?
a. sonnet
b. elegy
c. narrative poem
d. dramatic monologue
e. haiku
40. Duplicating material in each of the
options (instead of putting it in the
stem)
A. original stem
Theorists of pluralism have asserted which of the
following?
a. The maintenance of democracy requires a
large middle class.
b. The maintenance of democracy requires
autonomous centres of countervailing
power.
c. The maintenance of democracy requires
the existence of a multiplicity of religious
groups.
d. The maintenance of democracy requires a
predominantly urban population.
e. The maintenance of democracy requires
the separation of governmental powers.
B. improved stem
Theorists of pluralism have asserted that the
maintenance of democracy requires
a. a large middle class
b. autonomous centres of countervailing power
c. existence of a multiplicity of religious
groups
d. a predominantly urban population
e. separation of governmental powers
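Material duplicated at the start of every option can be spotted mechanically. The sketch below is a rough check under stated assumptions: the function name is hypothetical, and it only compares leading words split on whitespace, so it will miss repetition in the middle or at the end of options.

```python
def shared_leading_words(options):
    """Return the run of words that every option starts with (empty string if none)."""
    word_lists = [option.split() for option in options]
    shared = []
    for words_at_position in zip(*word_lists):
        if all(word == words_at_position[0] for word in words_at_position):
            shared.append(words_at_position[0])
        else:
            break
    return " ".join(shared)


original_options = [
    "The maintenance of democracy requires a large middle class.",
    "The maintenance of democracy requires autonomous centres of countervailing power.",
    "The maintenance of democracy requires the existence of a multiplicity of religious groups.",
    "The maintenance of democracy requires a predominantly urban population.",
    "The maintenance of democracy requires the separation of governmental powers.",
]
repeated = shared_leading_words(original_options)
```

For the original options, `repeated` is "The maintenance of democracy requires", which is exactly the material the improved version moves into the stem.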
46. What is the task format? (MCQs,
True/False, gap-fill, cloze, ...)
How many items? (minimum 5)
What skill does each item test?
What is the level of difficulty of each
item? (A1-C1)
Time: 3.00-3.25pm
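The checklist above can be turned into a quick self-check for a drafted task. The sketch below is an illustration only: the function name and data layout are hypothetical, and it assumes that "A1-C1" refers to the CEFR levels A1, A2, B1, B2 and C1.

```python
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1")  # assumed reading of "A1-C1"
MIN_ITEMS = 5


def check_task(task_format, items):
    """Return a list of problems; an empty list means the draft passes these checks.

    items is a list of (skill, level) pairs, one pair per item.
    """
    problems = []
    if not task_format:
        problems.append("task format is missing")
    if len(items) < MIN_ITEMS:
        problems.append(f"only {len(items)} item(s); minimum is {MIN_ITEMS}")
    for number, (skill, level) in enumerate(items, start=1):
        if not skill:
            problems.append(f"item {number}: no skill named")
        if level not in CEFR_LEVELS:
            problems.append(f"item {number}: level {level!r} is outside A1-C1")
    return problems


# Hypothetical drafts.
complete_draft = check_task("MCQ", [("reading for specific details", "B1")] * 5)
weak_draft = check_task("MCQ", [("scanning", "C2")])
```

`complete_draft` comes back empty, while `weak_draft` reports both the shortage of items and the out-of-range level.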
Editor's Notes
Usefulness (Bachman & Palmer, 1996): reliability, construct validity, authenticity, interactiveness, impact and practicality.
Validity is an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.
Messick proposes the following six aspects of validity:
The content aspect of validity includes evidence of the content relevance and representativeness of the test items and of the technical quality of the test;
The substantive aspect of validity refers to theoretical rationales that explain the consistencies observed in test takers' responses;
The structural aspect appraises the fidelity of the scoring structure to the structure of the construct domain that the test measures;
The generalisability aspect examines the extent to which score properties and interpretations generalise from one sample of test takers to other samples, to other testing settings, and to other test items, including the generalisability of the relationships between test scores and criterion scores. This aspect encompasses the notion of test score reliability;
The external aspect includes convergent and discriminant evidence from multitrait-multimethod comparisons, as well as evidence of criterion relevance and of the applied utility of scores; and
The consequential aspect appraises the value implications of score interpretation and the actual or potential consequences of test use, especially with regard to fairness in testing and assessment. (Messick 1995).