1. The notion of sub-skills in reading
comprehension tests:
An EAP example.
Lumley, T. (1993)
Cindy
2012/11/7
2. Outline
I. Perceptions of reading subskills in ESL
II. The relationship between reading subskills and test items
III. The study
IV. Results: Difficulty of subskills
V. Rasch IRT analysis and reading subskills
VI. Summary of findings
VII. Discussion
VIII. Conclusion
3. I. Perceptions of
reading subskills in ESL
The divisibility of reading comprehension
into discrete subskills (e.g., Bloom, 1956;
Gray, 1960; Davis, 1968; Munby, 1978).
1. Reading subskills in ESL syllabus design
Munby’s (1978) framework for specifying
ESP syllabus content, including its
extensive list of language microskills, has
been strongly criticized.
needs analysis
4. I. Perceptions of
reading subskills in ESL (Cont.)
2. Subskills in test construction
Carroll’s (1980) identification of language skills:
from Munby’s taxonomy of 54 language skills, 11
are listed as suitable for testing.
Hughes’ (1989) identification of two levels of
subskills…
(1) Macroskills: understanding the ideas in the text
(info., gist, argument)
(2) Microskills: recognizing and interpreting the
more local linguistic features of the text (referents,
word meanings, discourse indicators)
5. I. Perceptions of
reading subskills in ESL (Cont.)
What is still unclear is HOW teachers
-- identify microskills, and
-- can be involved in constructing tests,
and what sort of reliability and validity might
be attached to Ts’ judgments.
6. II. The relationship between
reading subskills and test items
In Alderson and Lukmani’s (1989) study, Ts….
(1) showed relatively little agreement about the
subskills tested by a range of reading
comprehension test items;
(2) disagreed over the order of cognitive abilities
demanded by the same item;
(3) students were classified as lower/higher achievers
based on the test as a whole, yet the lower achievers
sometimes scored higher on items requiring higher-
order cognitive skills
Cognitive levels were unrelated to levels of
linguistic proficiency.
7. II. The relationship between
reading subskills and test items
(Cont.)
Some items showed low discrimination.
It was not known why the judges made the
choices they did with regard to the skills
tested by test items.
There is a need to make explicit the
interpretations of the subskill descriptions.
8. III. The study
Research questions:
1) Does a group of 5 experienced EAP Ts
perceive a common hierarchy of difficulty
among the subskills?
2) Is it possible for the same group of Ts to
reach agreement upon subskills tested by
individual test items in a test of reading
comprehension?
9. 1. The test
EAP
Non-English-speaking background Ss
A university reading test with 2 texts,
total length 1500 words
58 items
Item types: short answer, multiple choice,
matching, T/F, completing a flow-chart,
labeling a map
10. 2. The subjects
3 groups of NNS (n = 158)
1) Overseas Ss (n = 90)
2) Ss from a language center (n = 50)
3) Ss preparing for a postgraduate qualification in
business administration (n = 18)
?? Unequal numbers of participants in each group
11. 3. Test analysis
Rasch analysis, using the QUEST program,
showed which items were misfitting (mean square
values above the acceptable limit of 1.3).
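The paper relies on the QUEST program for this step. As a rough illustration of what such an analysis involves (not QUEST's or Lumley's actual procedure; the estimation routine and the simulated data below are simplified assumptions of mine), the Python sketch fits a Rasch model to a dichotomous response matrix and flags items whose infit mean square exceeds 1.3:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rasch_fit(responses, n_iter=25):
    """Tiny joint estimation of Rasch person abilities and item difficulties
    (Newton-type updates; an illustration only, not QUEST's algorithm)."""
    n_persons, n_items = responses.shape
    theta = np.zeros(n_persons)              # person abilities (logits)
    beta = np.zeros(n_items)                 # item difficulties (logits)
    for _ in range(n_iter):
        p = sigmoid(theta[:, None] - beta[None, :])   # P(correct) for every person-item pair
        var = p * (1.0 - p)
        resid = responses - p
        theta += resid.sum(axis=1) / var.sum(axis=1)  # move each person toward their score
        beta -= resid.sum(axis=0) / var.sum(axis=0)   # harder items are answered correctly less often
        beta -= beta.mean()                           # anchor the scale: mean item difficulty = 0
        theta = np.clip(theta, -6.0, 6.0)             # guard against perfect/zero scores
        beta = np.clip(beta, -6.0, 6.0)
    return theta, beta

def infit_mean_square(responses, theta, beta):
    """Information-weighted (infit) mean square per item; values above ~1.3 flag misfit."""
    p = sigmoid(theta[:, None] - beta[None, :])
    var = p * (1.0 - p)
    return ((responses - p) ** 2).sum(axis=0) / var.sum(axis=0)

# Simulated 0/1 responses sized like the study (158 test-takers, 58 items)
rng = np.random.default_rng(0)
true_theta = rng.normal(0.0, 1.0, 158)
true_beta = np.linspace(-2.0, 2.0, 58)
data = (rng.random((158, 58)) < sigmoid(true_theta[:, None] - true_beta[None, :])).astype(float)

theta_hat, beta_hat = rasch_fit(data)
infit = infit_mean_square(data, theta_hat, beta_hat)
print("Items with infit mean square above 1.3:", np.where(infit > 1.3)[0])
```

Because success in the Rasch model depends only on the difference between ability and difficulty, persons and items end up on the same logit scale, which the later slides exploit.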
12. 4. Procedure
To establish a common interpretation of
subskill descriptions and criteria
A post hoc content analysis
13. 5. Existing lists of subskills
Munby’s (1978) framework
19 reading microskills were examined
14. 6. Final selection of test items for
analysis
22 items, spanning a range of:
-- difficulty (logit values: -1.875 to 1.875)
-- discrimination (classical analysis, in the range 0.55 to 0.97)
-- facility (0.89 to 0.25)
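The facility and discrimination figures above come from a classical item analysis. Below is a minimal sketch of how such indices are commonly computed; the corrected item-total (rest-of-test) correlation used as the discrimination index is an assumption, since the slide does not say which index the study used.

```python
import numpy as np

def classical_item_stats(responses):
    """Classical item analysis for a 0/1 response matrix (persons x items).
    facility       = proportion of correct answers per item
    discrimination = correlation of each item with the rest-of-test score
                     (an assumed index; other indices are also common)."""
    responses = np.asarray(responses, dtype=float)
    facility = responses.mean(axis=0)
    totals = responses.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    return facility, discrimination

# Usage, e.g. with the simulated matrix from the Rasch sketch above:
# facility, discrimination = classical_item_stats(data)
```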
15. 7. Development of the list of subskills
To develop the wording of the
subskill descriptions for those analyzing
the test
Final version: 22 items
16. 8. The raters
qualified ESL/EFL teachers
at least five years’ experience
MA degree in applied linguistics
involved in language test construction
the group included the two test developers
completed the reading test before the
rating session
17. 9. The rating session
Procedure
1) Raters judged the perceived difficulty of each subskill
on a 4-point scale (A-D, with A representing the easiest).
2) They rated the selected items on the same scale.
3) For an initial item, each selected the single, highest-level
skill from the list of subskills.
4) Each person thus allocated a subskill to the item.
To establish a level of agreement about the procedure
and the interpretation of the subskill descriptions, this
process was repeated for 3 more items of varying
difficulty.
5) Subskills were then matched to the remaining items
by each group member.
18. 9. The rating session (Cont.)
1) The importance of determining the
focus of each subskill
• Subskill 4. Explaining a fact with:
4.1 a single clause
4.2 multiple clauses
• Subskill 6. Analysis of the elements within a process,
to examine methodically their causal/sequential
relationship
The difference between these two subskills was initially unclear.
19. 9. The rating session (Cont.)
2) The concept of one or more subskills
was necessary but not sufficient for
answering Qs
• Subskill 1 ‘Dealing with relatively uncommon
vocabulary: matching of words/phrases referred to in
text with given equivalent meanings’
-- impossible to describe or measure it
20. 9. The rating session (Cont.)
3) Some subskills would occur at several or
all levels
• Subskill 5: ‘Selecting a phrase as summarizing the main
topic of a text’
4) Two subskills listed as important in
reading comprehension, skimming and
scanning, were needed repeatedly
throughout the test, but could not be
identified as central to particular items.
21. 9. The rating session (Cont.)
5) The wording of some subskills was altered,
and a 9th subskill was added to the existing list;
no level of perceived difficulty was identified for it.
6) Potential confusion remained between
subskill 9, ‘understanding grammatical
and semantic reference’ and
subskill 3 ‘identification of information in
the text.’
22. IV. Results: Difficulty of subskills
[Table: the range of difficulty ratings given for each subskill; bold represents cases where 80% or greater agreement was reached]
For 11 of the 14 subskills there is substantial
agreement about the inherent level of difficulty.
23. No guidelines were given to the group
as to how the raters should interpret the
levels A to D.
24. IV. Results: Reading subskills matched to each item
For item 6, although the skill required was seen as
subskill 2, agreement was not reached as to whether
this was subskill 2.1 or 2.2.
The 5 raters were able to reach almost complete agreement
on which skill was required to answer each of the 22 items.
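With 5 raters, the "80% or greater agreement" criterion used in these results means at least 4 of the 5 chose the same category. Below is a small sketch of how such agreement percentages could be tallied; the ratings are hypothetical, not the study's data.

```python
from collections import Counter

# Hypothetical subskill choices by the 5 raters for a few items
ratings = {
    "item 6":  ["2.1", "2.2", "2.1", "2.1", "2.2"],   # agreement on subskill 2, split over 2.1 vs 2.2
    "item 12": ["3", "3", "3", "3", "3"],
    "item 20": ["5", "5", "5", "9", "5"],
}

for item, choices in ratings.items():
    subskill, count = Counter(choices).most_common(1)[0]   # modal choice and how many raters picked it
    agreement = count / len(choices)
    verdict = "meets 80% criterion" if agreement >= 0.8 else "below 80% criterion"
    print(f"{item}: modal subskill {subskill}, agreement {agreement:.0%} ({verdict})")
```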
25. V. Rasch IRT analysis and reading
subskills
1. Use of IRT in tests of reading comprehension
Rasch analysis in language testing is unique in
mapping student ability and item difficulty on
the same scale.
Resulting analyses of the strengths and weaknesses
of both individuals and groups have the
potential to provide useful guidance for
teachers in planning their teaching,
e.g., the Tests of Reading Comprehension for
children in primary schools (TORCH), whose scale
represents particular reading subskills.
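Putting abilities and difficulties on one logit scale is usually visualised as an item-person ("Wright") map. The sketch below prints a rough text version from the estimates produced by the Rasch sketch earlier; it is my illustration of the idea, not output from TORCH or QUEST.

```python
import numpy as np

def text_wright_map(theta, beta, lo=-3.0, hi=3.0, step=0.5):
    """Print persons (one X each) and item numbers side by side on the same logit scale."""
    edges = np.arange(lo, hi + step, step)[::-1]          # from the top of the scale down
    for top, bottom in zip(edges[:-1], edges[1:]):
        persons = int(np.sum((theta >= bottom) & (theta < top)))
        items = " ".join(str(j + 1) for j in range(len(beta)) if bottom <= beta[j] < top)
        print(f"{bottom:5.1f} | {'X' * persons:<40} | {items}")

# text_wright_map(theta_hat, beta_hat)   # estimates from the Rasch sketch above
```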
26. V. Rasch IRT analysis and reading
subskills (Cont.)
2. IRT analysis as validation of teacher perception
Q: Do items identified by the group of
teachers as requiring the same subskills
occur at roughly the same level of difficulty
as each other, according to the Rasch
analysis?
29. VI. Summary of findings
1) Teachers have a high level of agreement
about the subskills tested by particular test items,
and they also share common perceptions
about the relative difficulty of the subskills.
2) There is a significant correlation between teachers’
perceptions of the difficulty of each subskill
and the logit values obtained from the IRT
analysis for items identified as testing the same
skills. The subskills seem to fit into broad
bands of increasing difficulty.
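Finding 2) amounts to correlating the teachers' ordinal difficulty ratings (A-D) for each subskill with the logit difficulties of the items assigned to that subskill. A hedged sketch of that comparison is below: the numbers are hypothetical, and Spearman's rank correlation is my choice for the ordinal A-D scale; the paper's exact statistic and values are not reproduced here.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: consensus difficulty rating per subskill (A=1 ... D=4)
# and the mean Rasch logit of the items the raters assigned to that subskill
perceived_difficulty = np.array([1, 1, 2, 2, 3, 3, 3, 4, 4])
mean_item_logit = np.array([-1.6, -1.2, -0.5, -0.1, 0.3, 0.6, 0.4, 1.1, 1.7])

rho, p_value = spearmanr(perceived_difficulty, mean_item_logit)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```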
30. VII. Discussion
Not practicable to examine all items in the
reading test
Use larger, complete sets of test items and
subskill descriptions
Limited generalizability because of the
relationship between question difficulty,
subskills and text properties.
The test should be composed of reading texts
similar to those commonly encountered in the
final year or two of high school.
31. Uncontrollable test-taking process
Introspective studies of test-takers’ behavior could
establish whether or not the results of this study
are supported by the test-takers themselves
The influence of test method facets (Bachman, 1990):
To what extent do the item type and formulation of
the Q affect reader performance, and to what
extent is performance determined by the text itself?
Employing various testing methods with the same
texts, or parallel methods with texts of different
types and levels.
32. VII. Discussion (Cont.)
Figure 1
?? Does this mean one skill had to be fully acquired
before the next could be mastered?
-- Rather, mastery of linguistic skills of increasing
difficulty emerges gradually as ability increases
(Griffin & Nix, 1991)
?? How widely might the bands in Figure 1 extend?
33. VIII. Conclusion
RQ 1) Does a group of 5 experienced EAP
Ts perceive a common hierarchy of
difficulty among the subskills?
After a brief discussion of the use of Rasch
IRT in the analysis of reading comprehension
test items, the Ts’ consensus regarding
subskill difficulty level is compared to the
Rasch analysis of item difficulty, and the
significant correlation found gives some
empirical validation to the Ts’ judgments.
34. VIII. Conclusion (Cont.)
RQ 2) Is it possible for the same group of
Ts to reach agreement upon subskills
tested by individual test items in a test of
reading comprehension?
A high level of concordance between
raters’ perceptions
35. Implications
The value of using Ts’ judgments in
examining test content, and of a test-development
procedure involving mapping skills onto test content.
The judgments Ts make about linguistic
matters in test design and content validity
also have significance for teaching.
36. Reflection
The diagnostic value of a subskill analysis
of test performance: the identification of any
subskill as inadequately developed in a group of
Ss could signal to a T a useful area of
work as a focus for teaching.