EXAMINING READING
NGUYEN HUYEN MINH, M.A.
30-31 DECEMBER 2015
1. Planning and designing the test
2. Defining reading
3. Writing specifications for individual
tasks
4. Writing the test
5. Writing good MCQs
Test context and purpose specifications
Specifications of the overall test structure
Specifications for individual test tasks
Test context and purpose specifications
⇒How will the test be used?
⇒How will results be interpreted?
⇒Why is the test being given in the first place?
Purpose(s) of the test | What decision(s) will you make based on test results?
Construct(s) to be assessed and their definitions | What are the constructs, and what do you view them as including?
Target language use (TLU) domain and common/important task types | What language use context(s) outside the test should test results generalize to? What should the test tell you about what students can do in those contexts?
Characteristics of test takers | Who are the people who will take your test, what are they like, and what do they probably know?
Minimum acceptable levels for each of the qualities of usefulness | What is the lowest level you would be willing to accept for each one, and how do you prioritize them?
Resource plan | What resources do you need to create and use this test, and what resources are available? (time, space, materials and equipment, personnel)
Usefulness (Bachman & Palmer, 1996):
reliability, authenticity, construct validity,
impact and practicality.
Construct | Elements often included in construct definitions
Listening | Listening for the main idea, listening for major points, listening for specific details, listening for the gist, inferencing, predicting, determining the meaning of unfamiliar vocabulary from context, distinguishing fact from opinion, determining speaker’s intent, notetaking
Reading | Reading for the main idea, reading for major points, reading for specific details, reading for the gist, inferencing, predicting, skimming, scanning, determining the meaning of unfamiliar vocabulary from context, distinguishing fact from opinion, sensitivity to rhetorical organization of the text, sensitivity to cohesion of the text, identifying author purpose or tone, paraphrasing texts
⇒What will students do when they take the test? (task
format)
⇒How many questions or prompts will there be in
each section of the test? (number of tasks)
⇒How will the test be used?
⇒How will results be interpreted?
⇒Why is the test being given in the first place?
Steps in the process
1.Review construct definitions.
2.Choose task format(s) for assessing each construct,
or for each aspect of each construct.
3.Decide how many of each task (questions,
passages, prompts, etc.) to use in each section.
In groups of 3-4, take a test that you are familiar with
and analyze its purpose, context and overall
structure. It can be an institutional test that your
faculty is running or it can be an international one
such as PET, IELTS or TOEFL.
What do we test when we test reading
competence?
A factorial approach
A subskill approach
A cognitive process approach
Focuses on the divisibility of capabilities a reader is
assumed to need in tackling certain test items
Defining factors that contribute to successful reading
(recalling word meanings, drawing inferences,
recognising a writer’s purpose/attitude/tone, finding
answers to explicit questions and following the
structure of a passage) (Davis, 1968)
Weaknesses: focuses on the outcome/performance
in the test rather than the reading process itself
Reading might be subdivided into
competencies which the skilled reader is
believed to have.
Mainly based on informed intuition
rather than empirical research
Level of activity (from simpler to more complex) | Readers’ typical cognitive operations in language tests | Size of typical unit
1 Lexis: word matching | Reader identifies same word in question and text | Word
2 Lexis: synonym and word-class matching | Reader uses knowledge of word meaning or word class to identify synonym, antonym or other related word | Word
3 Grammar/syntax | Reader uses grammatical knowledge to disambiguate and identify answer | Clause/sentence
4 Propositional meaning | Reader uses knowledge of lexis and grammar to establish meaning of a sentence | Sentence
5 Inference | Reader goes beyond literal meaning to infer a further significance | Sentence/paragraph/text
6 Building a mental model | Reader uses several features of the text to build a larger mental model | Text
7 Understanding text function | Reader uses genre knowledge to identify text structure and purpose | Text
In groups of 3-4, take a test that you are
familiar with and analyze the underlying
construct of the reading component. It can
be an institutional test that your faculty is
running or it can be an international one
such as PET, IELTS or TOEFL.
Specifications component | Comments
Purpose and construct | Taken from the overall test specifications, which were based on the context and purpose specifications
Description of the task | Descriptions of questions, prompts, or reading or listening passages (depending on the task format being used), and time allowed to complete the section
Scoring method | Including:
• For questions: right/wrong, or partial credit?
• For limited production tasks: model answer descriptions, and essential response components
• For extended production tasks: considerations the scoring rubric must address, and type of scoring rubric
Sample task | Example of each type of prompt, question, and reading or listening passage
Studying the spec.
Selecting text(s)
Writing Items
Piloting test
Analyzing results & revising items
Piloting with a larger population
Test calibration & validation
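The "Analyzing results & revising items" step can be illustrated with a small item analysis. What follows is only a sketch under stated assumptions: the slides do not say which statistics to use, so item facility (proportion correct) and an upper-lower discrimination index are assumed here as common classroom choices. The function names and the one-third group size are hypothetical.

```python
def item_facility(item_scores: list[int]) -> float:
    """Proportion of test takers who answered the item correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores: list[int],
                         total_scores: list[float],
                         fraction: float = 1 / 3) -> float:
    """Upper-lower discrimination: facility in the top group minus the bottom group.

    Groups are formed by ranking test takers on their total test score.
    The group size (one third by default) is an assumption, not a rule from the slides.
    """
    n = len(item_scores)
    k = max(1, round(n * fraction))
    # Indices of test takers ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low


# Hypothetical pilot data: six test takers, one item.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 4, 3, 2]
facility = item_facility(item)
discrimination = discrimination_index(item, totals)
```

Items with very high or very low facility, or with discrimination near zero or negative, would be candidates for revision before piloting with a larger population.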
Passage must be able to support questions of
the sort called for in the specs.
Length
Topic
Topical specificity
Vocab (variety, frequency/familiarity, density)
Syntax
Genre
Rhetorical mode
Coh-metrix www.cohmetrix.com
Coh-Metrix Text Easability Assessor
http://tea.cohmetrix.com
http://www.readabilityformulas.com
https://readability-score.com/
Readability: Means (Standard Deviations) for readability formulas and text levels
Variables | Beginner | Intermediate | Advanced
Flesch-Kincaid Grade Level | 8.472 (1.613) | 9.656 (1.703) | 10.207 (1.612)
Flesch Reading Ease Score | 63.978 (8.354) | 58.806 (8.601) | 55.506 (9.203)
Coh-Metrix L2 Reading Index | 19.951 (4.151) | 16.076 (5.312) | 12.897 (5.198)
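For a quick check of a candidate passage, the two Flesch measures can be approximated in a few lines. This is a rough sketch, not a substitute for the tools listed above: the formulas are the standard published ones, but the syllable counter is a crude vowel-group heuristic assumed here for illustration, so its scores will differ somewhat from those of dedicated software. The Coh-Metrix L2 Reading Index is not reproduced. Function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "flesch_reading_ease":
            206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "flesch_kincaid_grade":
            0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


scores = readability("The cat sat on the mat. The dog ran.")
```

Higher Reading Ease means an easier text, while a higher Grade Level means a harder one, which matches the direction of the means in the table.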
Selected response: True/False;
Matching; MCQs; gap-fill, cloze
Constructed response: short
answer; free writing
• Whether items are presented in the L1 or L2
• Whether “false” items need to be corrected
• How long the item stems and options are
• Which vocabulary and grammatical
structures are to be used (or avoided), if any
• Whether items are presented in the L1 or L2
• How many distractors there are
• How long the item stems and options should be
• Which vocabulary and grammatical structures are
to be used (or avoided), if any
• Which part(s) of speech are to be used (for one word
answers)
• Whether items are presented in the L1 or L2
• Whether responses are accepted in the L1, L2, or
both
• Whether linguistic accuracy is included in scoring
criteria (for answers in the L2)
• Which vocabulary and grammatical structures are to
be used (or avoided), if any
• How long item stems are
• How long responses should be
• What is the maximum number of pieces of
information to ask for
• Whether polytomous or dichotomous scoring
is to be used; if polytomous, how many points
per item, or per piece of information
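The difference between dichotomous and polytomous scoring can be shown on a short-answer item that asks for several pieces of information. This is a minimal sketch, assuming each expected piece of information has already been identified in the response by a rater. The function names and the example labels are hypothetical.

```python
def score_dichotomous(found: set[str], required: set[str]) -> int:
    """All-or-nothing: 1 point only if every required piece is present."""
    return 1 if required <= found else 0


def score_polytomous(found: set[str], required: set[str],
                     points_per_piece: int = 1) -> int:
    """Partial credit: points for each required piece that is present."""
    return points_per_piece * len(required & found)


# Hypothetical item asking for three pieces of information.
required = {"date", "place", "price"}
# Pieces a rater identified in one response (one is irrelevant).
found = {"date", "price", "colour"}

dichotomous = score_dichotomous(found, required)   # no credit, one piece missing
polytomous = score_polytomous(found, required)     # credit for two pieces
```

Whichever method is chosen, the specifications should state it, along with the maximum number of pieces of information an item may ask for.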
• Whether items are presented in the L1 or L2
• How long the item stem and options are
• Which vocabulary and grammatical
structures are to be used (or avoided), if any
• How many options there are
Parts of an MCQ
stem - the text of the question
options - the choices provided after the stem
(these include the key and the distractors)
the key - the correct answer in the list of
options
distractors - the incorrect answers in the list of
options
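The parts of an MCQ map naturally onto a small data structure, which can be handy when storing items in an item bank. The sketch below is illustrative only: the class name, its fields and the two checks are assumptions, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass
class MCQ:
    stem: str            # the text of the question
    options: list[str]   # all choices, including the key and the distractors
    key: str             # the correct answer

    @property
    def distractors(self) -> list[str]:
        """The incorrect answers: every option except the key."""
        return [o for o in self.options if o != self.key]

    def problems(self) -> list[str]:
        """Basic structural checks only; they say nothing about item quality."""
        issues = []
        if self.options.count(self.key) != 1:
            issues.append("key must appear exactly once among the options")
        if len({o.strip().lower() for o in self.options}) != len(self.options):
            issues.append("options are not all distinct")
        return issues


item = MCQ(
    stem="The doctor gave the patient some _______.",
    options=["medicine", "stethoscope", "surgical"],
    key="medicine",
)
```

Checks like these catch clerical slips such as a missing key. Judging whether distractors are plausible or grammatically consistent with the stem still needs a human reviewer.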
What’s wrong here?
1. The message is about a change in schedule for
a(n)______ .
A. business meeting
B. doctors appointment
C. airline flight
2. The meeting time has been changed from______ .
A. 8 am to 9 am
B. 9 am to 10 am
C. 10 am to 11 am
According to the passage, there are now over______ dance schools
in Italy.
A two hundred
B two thousand
C two hundred thousand
D two million
The research center uses the panda blood samples for:
A creating super-pandas
B research and storage
C comparison with bears and cats
D display purposes
The options are not all grammatically
consistent with the stem.
The doctor gave the patient some_______.
A medicine
B stethoscope
C surgical
The reading passage is about the_______airplane in the world.
A biggest
B first
C oldest
D fastest
The engineers in the passage are trying to make a ______ airplane.
A huge
B fast
C small
D fuel-efficient
Orange County:
A. has the largest concentration of
Vietnamese people in California.
B. is one of the locations with the most
Vietnamese residents in California.
C. has the largest concentration of
Vietnamese people in the U.S.
The stem is not meaningful by itself
The stem should not contain irrelevant
material
A. original stem:
Paul Muldoon, an Irish postmodern poet who uses
experimental and playful language, uses which
poetic genre in "Why Brownlee Left"?
a. sonnet
b. elegy
c. narrative poem
d. dramatic monologue
e. haiku
B. improved stem
Paul Muldoon uses which poetic genre in "Why
Brownlee Left"?
a. sonnet
b. elegy
c. narrative poem
d. dramatic monologue
e. haiku
Duplicating material in each of the
options (instead of putting it in the
stem)
A. original stem
Theorists of pluralism have asserted which of the
following?
a. The maintenance of democracy requires a
large middle class.
b. The maintenance of democracy requires
autonomous centres of countervailing
power.
c. The maintenance of democracy requires
the existence of a multiplicity of religious
groups.
d. The maintenance of democracy requires a
predominantly urban population.
e. The maintenance of democracy requires
the separation of governmental powers.
B. improved stem
Theorists of pluralism have asserted that the
maintenance of democracy requires
a. a large middle class
b. autonomous centres of countervailing power
c. existence of a multiplicity of religious
groups
d. a predominantly urban population
e. separation of governmental powers
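Duplicated material of this kind can be spotted mechanically by looking for words that every option starts with. The helper below is a minimal sketch with a hypothetical name. It only detects shared leading words, so it would miss repetition in the middle or at the end of options.

```python
def shared_leading_words(options: list[str]) -> str:
    """Return the run of words that every option begins with (case-insensitive)."""
    split_options = [option.split() for option in options]
    shared = []
    # zip stops at the shortest option, so no index can run past its end.
    for words in zip(*split_options):
        if len({w.lower() for w in words}) == 1:
            shared.append(words[0])
        else:
            break
    return " ".join(shared)


original = [
    "The maintenance of democracy requires a large middle class.",
    "The maintenance of democracy requires autonomous centres of countervailing power.",
    "The maintenance of democracy requires the existence of a multiplicity of religious groups.",
    "The maintenance of democracy requires a predominantly urban population.",
    "The maintenance of democracy requires the separation of governmental powers.",
]
repeated = shared_leading_words(original)
```

A non-empty result suggests text that could be moved into the stem, as in the improved version of the item.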
plausible distractors
mutually exclusive
homogenous in content, and format if
possible
free from clues about which response
is correct
all of the above” and “none of the
A framework for conceptualising
reading test validity by Weir (2005)
1. Response methods
2. Weighting
3. Knowledge of criteria
4. Order of items
5. Channel of presentation
6. Text length
7. Time constraint
1. Task Input & Output
2. Lexical resources
3. Structural resources
4. Discourse mode
5. Content knowledge
6. Cultural knowledge
7. Reader-writer relationship
What is the task format? (MCQs,
True/False, gap-fill, cloze, ...)
How many items? (minimum 5)
What skill does each item test?
What is the level of difficulty of each
item? (A1-C1)
Time: 3.00-3.25pm

Editor's Notes

  • #6 Usefulness (Bachman & Palmer, 1996): reliability, authenticity, construct validity, impact and practicality. Validity is an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. Messick sets out six aspects of validity: the content aspect of validity includes evidence of the content relevance and representativeness of the test items and of the technical quality of the test; the substantive aspect of validity refers to theoretical rationales that explain the consistencies observed in test takers' responses; the structural aspect appraises the fidelity of the scoring structure to the structure of the construct domain that the test measures; the generalisability aspect examines the extent to which score properties and interpretations generalise from one sample of test takers to other samples, to other testing settings, and to other test items, including the generalisability of correlations between test scores and criterion scores (this aspect subsumes the notion of test score reliability); the external aspect includes convergent and discriminant evidence from multitrait-multimethod comparisons, as well as evidence of criterion relevance and of the utility of score use; and the consequential aspect appraises the value implications of scores and the actual or potential consequences of test use, especially with regard to fairness in testing and assessment (Messick 1995).