Sub-skills in reading comprehension tests

  1. The notion of sub-skills in reading comprehension tests: An EAP example. Lumley, T. (1993). Cindy, 2012/11/7
  2. Outline: I. Perceptions of reading subskills in ESL; II. The relationship between reading subskills and test items; III. The study; IV. Results: Difficulty of subskills; V. Rasch IRT analysis and reading subskills; VI. Summary of findings; VII. Discussion; VIII. Conclusion
  3. 3. I. Perceptions of reading subskills in ESLThe divisibility of reading comprehension into discrete subskills (e.g., Bloom, 1956; Gary, 1960; Davis, 1968; Munby, 1978).1. Reading subskills in syllabus for ESLMunby’s (1978) framework for specifying ESP syllabus content, including its extensive list of language microskills, has been strongly criticized. need analysis
  4. I. Perceptions of reading subskills in ESL (cont.). 2. Subskills in test construction: Carroll (1980) drew on Munby's taxonomy of 54 language skills, listing 11 skills as suitable for testing. Hughes (1989) identified two levels of subskills: (1) macroskills: understanding the ideas in the text (information, gist, argument); (2) microskills: recognizing and interpreting the more linguistic features of the text (referents, word meanings, discourse indicators).
  5. 5. I. Perceptions of reading subskills in ESL (Cont.)What is still unclear is HOW teachers-- identify microskills-- be involved in constructing tests, and What sort of reliability and validity might be attached to Ts’ judgments.
  6. II. The relationship between reading subskills and test items. In Alderson and Lukmani's (1989) study, teachers (1) showed relatively little agreement about the subskills tested by a range of reading comprehension test items; (2) disagreed over the order of cognitive abilities demanded by the same item. (3) When students were classified as lower or higher achievers on the test, lower achievers performed relatively better on items requiring higher-order cognitive skills: cognitive levels were unrelated to levels of linguistic proficiency.
  7. II. The relationship between reading subskills and test items (cont.). Problems: low discrimination; not knowing why the judges made the choices they did with regard to the skills tested by test items. Hence the need to make explicit the interpretations of the subskills described.
  8. 8. III. The studyResearch questions:1) Does a group of 5 experienced EAP Ts perceive a common hierarchy of difficulty among the subskills?2) Is it possible for the same group of Ts to reach agreement upon subskills tested by individual test items in a test of reading comprehension?
  9. 9. 1. The test EAPNon-English-speaking background SsA university reading test with 2 texts, total length 1500 words58 itemsItem types: short answer, multiple choice, matching, T/F, completing a flow-chart, labeling a map
  10. 10. 2. The subjects3 groups of NNS (n = 158)1) Oversea Ss (n = 90)2) Ss from a language center (n = 50)3) Preparing postgraduate qualification in business administration (n = 18)?? Unequal number of participants in each group
  11. 3. Test analysis: Rasch analysis, using the program QUEST, showed some items misfitting (mean square values above the acceptable limit of 1.3).
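A minimal sketch of the misfit check described above, assuming the 1.3 limit refers to the infit (information-weighted) mean square and treating student abilities and item difficulties as already estimated. QUEST estimates both from the response data, so this only illustrates the fit statistic, not QUEST's procedure. All function names and the response data are hypothetical.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model; theta and b are in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def infit_mean_square(responses: list[int], abilities: list[float], b: float) -> float:
    """Infit mean square for one item: summed squared residuals over summed model variance."""
    squared_residuals = 0.0
    model_variance = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_p(theta, b)
        squared_residuals += (x - p) ** 2
        model_variance += p * (1.0 - p)
    return squared_residuals / model_variance


def flag_misfits(items: dict[str, tuple[list[int], float]],
                 abilities: list[float], limit: float = 1.3) -> list[str]:
    """Names of items whose infit mean square exceeds the limit (1.3, as on the slide)."""
    return [name for name, (responses, b) in items.items()
            if infit_mean_square(responses, abilities, b) > limit]


# Hypothetical data: five students with known abilities, two items of difficulty 0.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
items = {
    "well_behaved": ([0, 0, 1, 1, 1], 0.0),  # abler students succeed
    "reversed": ([1, 1, 0, 0, 0], 0.0),      # abler students fail: misfit
}
print(flag_misfits(items, abilities))  # ['reversed']
```

The reversed item's infit is about 3.4, well above 1.3, because the students the model expects to succeed are the ones who fail.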
  12. 12. 4. Procedure To establish a common interpretation of subskills descriptions and criteriaA post hoc content analysis
  13. 5. Existing lists of subskills: Munby's (1978) framework; 19 reading microskills were examined.
  14. 6. Final selection of test items for analysis: 22 items, spanning a range of difficulty (logit values: -1.875 to 1.875), discrimination levels (classical analysis, range 0.55 to 0.97) and facility levels (0.89 to 0.25).
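The two classical statistics mentioned above can be sketched as follows. Facility is the proportion of candidates answering an item correctly. The slide does not say which discrimination index was used, so this sketch assumes an uncorrected point-biserial correlation between the item score and the total test score. The score matrix and function names are hypothetical.

```python
from statistics import mean


def facility(item_scores: list[int]) -> float:
    """Classical facility: proportion of candidates answering the item correctly."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores: list[int], totals: list[int]) -> float:
    """Pearson correlation between a 0/1 item score and the total score (item not removed)."""
    mi, mt = mean(item_scores), mean(totals)
    cov = sum((i - mi) * (t - mt) for i, t in zip(item_scores, totals))
    var_i = sum((i - mi) ** 2 for i in item_scores)
    var_t = sum((t - mt) ** 2 for t in totals)
    return cov / (var_i * var_t) ** 0.5


# Hypothetical 4-student x 3-item score matrix (rows = students, 1 = correct).
matrix = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in matrix]
for j in range(3):
    column = [row[j] for row in matrix]
    print(f"item {j + 1}: facility={facility(column):.2f}, "
          f"discrimination={point_biserial(column, totals):.2f}")
```

A high facility value means an easy item, which is why the slide's facility range runs in the opposite direction to the logit range.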
  15. 7. Development of the list of subskills: to develop the wording of the subskill descriptions for those analyzing the test. Final version: 22 items.
  16. 8. The raters: qualified ESL/EFL teachers with at least five years' experience and an MA degree in applied linguistics, involved in language test construction; the group included the two test developers; all raters completed the reading test before the rating session.
  17. 17. 9. The rating session Procedure1) Perceived difficulty on a 4-point scale (A-D; with A, representing the easiest)2) They rated the selected items on the same scale.3) Selected the single, highest-level skill from the lists of subskills4) Each person allocated a subskill to the item. To establish a level of agreement about the procedure and the interpretation of the subskill description, this process was repeated for 3 more items, of varying of difficulty.5) Subskills were then matched to the remaining items by each group member.
  18. 9. The rating session (cont.). 1) The importance of determining the focus of each subskill, e.g., Subskill 4, 'Explaining a fact with: 4.1 a single clause; 4.2 multiple clauses', versus Subskill 6, 'Analysis of the elements within a process, to examine methodically their causal/sequential relationship'. The difference between them was initially unclear.
  19. 9. The rating session (cont.). 2) One or more subskills could be necessary but not sufficient for answering a question, e.g., Subskill 1, 'Dealing with relatively uncommon vocabulary: matching of words/phrases referred to in text with given equivalent meanings': its contribution was impossible to describe or measure.
  20. 9. The rating session (cont.). 3) Some subskills would occur at several or all levels, e.g., Subskill 5, 'Selecting a phrase as summarizing the main topic of a text'. 4) Two subskills listed as important in reading comprehension, skimming and scanning, were needed repeatedly throughout the test but could not be identified as central to particular items.
  21. 9. The rating session (cont.). 5) The wording of some subskills was altered and a 9th subskill was added to the existing list; no level of perceived difficulty was identified for it. 6) Potential confusion remained between subskill 9, 'understanding grammatical and semantic reference', and subskill 3, 'identification of information in the text'.
  22. IV. Results: Difficulty of subskills. [Table: the range of ratings given; bold represents cases of 80% or greater agreement.] For 11 of the 14 subskills, there is seen to be substantial agreement about the inherent level of difficulty.
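A minimal sketch of the 80% criterion, assuming "agreement" means the share of the five raters who gave the most common rating, so that 4 of 5 raters corresponds to 80%. The subskill labels and ratings are hypothetical, not the study's data.

```python
from collections import Counter


def modal_agreement(ratings: list[str]) -> float:
    """Share of raters who gave the most common rating."""
    top_count = Counter(ratings).most_common(1)[0][1]
    return top_count / len(ratings)


def high_agreement(ratings_by_subskill: dict[str, list[str]],
                   threshold: float = 0.8) -> list[str]:
    """Subskills whose modal agreement reaches the threshold (the 'bold' cases)."""
    return [name for name, ratings in ratings_by_subskill.items()
            if modal_agreement(ratings) >= threshold]


# Hypothetical ratings from five raters on the A (easiest) to D scale.
ratings = {
    "subskill 1": ["A", "A", "A", "A", "B"],  # 4 of 5 agree: 80%
    "subskill 2": ["B", "B", "B", "B", "B"],  # unanimous
    "subskill 3": ["B", "C", "C", "D", "B"],  # split: 40%
}
print(high_agreement(ratings))  # ['subskill 1', 'subskill 2']
```

With only five raters the percentage moves in 20-point steps, so "80% or greater" effectively means at most one dissenting rater.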
  23. No guidelines were given to the group as to how the raters should interpret the levels A to D.
  24. IV. Results: Reading subskills matched to each item. For item 6, although the skill required was seen as subskill 2, agreement was not reached as to whether this was subskill 2.1 or 2.2. The 5 raters were able to reach almost complete agreement on which skill was needed to answer each of the 22 items.
  25. V. Rasch IRT analysis and reading subskills. 1. Use of IRT in tests of reading comprehension: Rasch analysis in language testing is unique in mapping student ability and item difficulty on the same scale. The resulting analyses of the strengths and weaknesses of both individuals and groups have the potential to provide useful guidance for teachers in planning their teaching, e.g., the scale of the Tests of Reading Comprehension (TORCH) for children in primary schools, which represents particular reading subskills.
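To make "the same scale" concrete: in the dichotomous Rasch model (the standard textbook formulation, not reproduced from the slides), the probability that a student of ability $\theta$ answers an item of difficulty $b$ correctly depends only on the difference $\theta - b$, with both measured in logits:

$$
P(X = 1 \mid \theta, b) = \frac{\exp(\theta - b)}{1 + \exp(\theta - b)},
\qquad
\ln\frac{P}{1 - P} = \theta - b .
$$

So a student whose ability equals an item's difficulty has a probability of $0.5$ of success, and a student 1 logit above the item has $e/(1 + e) \approx 0.73$. This is why students and items can be placed on one map: a logit value means the same thing whether it describes a person or an item.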
  26. V. Rasch IRT analysis and reading subskills (cont.). 2. IRT analysis as validation of teacher perception. Q: Do items identified by the group of teachers as requiring the same subskill occur at roughly the same level of difficulty as each other, according to the Rasch analysis?
  27. Figure 1. Item logit values and skill difficulty levels
  28. VI. Summary of findings. 1) Teachers have a high level of agreement about the subskills tested by particular test items, and they also share common perceptions about the relative difficulty of the subskills. 2) There is a significant correlation between teachers' perceptions of the difficulty of each subskill and the logit values obtained from the IRT analysis for items identified as testing the same skills. The subskills seem to fit into broad bands of increasing difficulty.
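The slides report a significant correlation but not which coefficient was used. Because the A to D ratings are ordinal, one reasonable choice is Spearman's rank correlation between each subskill's perceived difficulty and the mean logit of the items assigned to it. The sketch below assumes that choice, maps A to D onto 1 to 4, and uses hypothetical data and function names. It computes only the coefficient, not its significance.

```python
from statistics import mean

LEVELS = {"A": 1, "B": 2, "C": 3, "D": 4}  # A = easiest, as on the rating scale


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical: consensus difficulty per subskill and mean logit of its items.
consensus = ["A", "A", "B", "C", "D"]
mean_logits = [-1.5, -0.8, 0.1, 0.9, 1.6]
rho = spearman([LEVELS[c] for c in consensus], mean_logits)
print(round(rho, 3))  # 0.975
```

Averaging tied ranks matters here: with only four rating levels, several subskills will usually share a level.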
  29. 29. VII. Discussion Not practicable to examine all items in the reading test Use larger, complete sets of test items and subskill description Limited generalizability because of the relationship between question difficulty, subskills and text properties. The test should be composed of reading texts similar to those commonly encountered in the final year or two of high school.
  30. Uncontrollable test-taking process: introspective studies of test-takers' behavior could establish whether or not the results of this study are supported by the test-takers themselves. The influence of test method facets (Bachman, 1990): to what extent do the item type and the formulation of the question affect reader performance, and to what extent is performance determined by the text itself? Possible approaches: employing various testing methods with the same texts, or parallel methods with texts of different types and levels.
  31. 31. VII. DiscussionFigure 1 ?? One skill had to be fully acquired before the next could be mastered -- Gradually emerging mastery of linguistic skills of increasing difficulty, the ability increases (Griffin & Nix, 1991) ?? How widely the bands in Figure 1 may extend
  32. VIII. Conclusion. RQ 1) Does a group of 5 experienced EAP teachers perceive a common hierarchy of difficulty among the subskills? After a brief discussion of the use of Rasch IRT in the analysis of reading comprehension test items, the teachers' consensus regarding subskill difficulty level is compared with the Rasch analysis of item difficulty, and the significant correlation found gives some empirical validation to this consensus.
  33. VIII. Conclusion (cont.). RQ 2) Is it possible for the same group of teachers to reach agreement on the subskills tested by individual test items in a test of reading comprehension? Yes: there was a high level of concordance between raters' perceptions.
  34. Implications: the value of using teachers' judgments in examining test content, and of a test-development procedure that involves mapping skills from test content. The judgments teachers make about linguistic matters in test design and content validity also have significance for teaching.
  35. 35. ReflectionThe diagnostic value of a subskill analysis of test performance, the information yielded by the identification of any subskill as inadequately developed in a group of Ss could signal to a T a useful area of work as a focus for teaching.
