Assessment grounding

• by Standards, Frameworks and Pace. Review assessment schedule, background info
• A standard multiple-choice test item consists of two basic parts: a problem (stem) and a list of suggested solutions (alternatives). The stem may be in the form of either a question or an incomplete statement, and the list of alternatives contains one correct or best alternative (answer) and a number of incorrect or inferior alternatives (distractors). The purpose of the distractors is to appear as plausible solutions to the problem for those students who have not achieved the objective being measured by the test item. Conversely, the distractors must appear as implausible solutions for those students who have achieved the objective. Only the answer should appear plausible to these students.
• Although they are less susceptible to guessing than are true-false test items, multiple-choice items are still affected to a certain extent. This guessing factor reduces the reliability of multiple-choice item scores somewhat, but increasing the number of items on the test offsets this reduction in reliability. The following table illustrates this principle. For example, if your test includes a section with only two multiple-choice items of 4 alternatives each (a, b, c, d), you can expect 1 out of 16 of your students to correctly answer both items by guessing blindly. On the other hand, if a section has 15 multiple-choice items of 4 alternatives each, you can expect only 1 out of 8,670 of your students to score 70% or more on that section by guessing blindly.
• The stem is the foundation of the item. After reading the stem, the student should know exactly what the problem is and what he or she is expected to do to solve it. If the student has to infer what the problem is, the item will likely measure the student’s ability to draw inferences from vague descriptions rather than his or her achievement of a course objective.
• Rather than repeating redundant words or phrases in each of the alternatives, place such material in the stem to decrease the reading burden and more clearly define the problem in the stem. Notice how the underlined words are repeated in each of the alternatives in the poor example above. This problem is fixed in the better example, where the stem has been reworded to include the words common to all of the alternatives.
• The stem of the poor example above is excessively long for the problem it is presenting. The stem of the better example has been reworded to exclude most of the irrelevant material, and is less than half as long. Research. Several studies have indicated that including irrelevant material in the item stem decreases both the reliability and the validity of the resulting test scores (Haladyna & Downing, 1989b).
• Negatively-worded items are those in which the student is instructed to identify the exception, the incorrect answer, or the least correct answer. Such items are frequently used, because they are relatively easy to construct. The teacher writing the item need only come up with one distractor, rather than the two to four required for a positively-worded item. Positive items, however, are more appropriate to use for measuring the attainment of most educational objectives. For most educational objectives, a student’s achievement is more effectively measured by having him or her identify a correct answer rather than an incorrect answer. Just because the student knows an incorrect answer does not necessarily imply that he or she knows the correct answer. For this reason, items of the negative variety are not recommended for general use.
• The alternatives in the poor example above are rather wordy, and may require more than one reading before the student understands them clearly. In the better example, the alternatives have been streamlined to increase clarity without losing accuracy.
• Alternatives that overlap create undesirable situations. Some of the overlapping alternatives may be easily identified as distractors. On the other hand, if the overlap includes the intended answer, there may be more than one alternative that can be successfully defended as being the answer. In the poor example above, alternatives a and d overlap, as do alternatives b and c. In the better example, the alternatives have been rewritten to be mutually exclusive.
• If the alternatives consist of a potpourri of statements related to the stem but unrelated to each other, the student’s task becomes unnecessarily confusing. Alternatives that are parallel in content help the item present a clear-cut problem more capable of measuring the attainment of a specific objective. The poor example contains alternatives testing knowledge of state agriculture, physical features, flags, and nicknames. If the student misses the item, it does not tell the teacher in which of the four areas the student is weak. In the better example, all of the alternatives refer to state agriculture, so if the student misses the item, it tells the teacher that the student has a weakness in that area.
• The word “an” in the stem of the poor example above serves as a clue to the correct answer, “adjective,” because the other alternatives begin with consonants. The problem has been corrected in the better example by placing the appropriate article, “an” or “a,” in each alternative.
• In the poor example above, the answer fits better grammatically with the stem than do the distractors. This problem has been solved in the better example by rewording the alternatives. Research. Several studies have found that grammatical clues make items easier (Haladyna & Downing, 1989b).
• The answer in the poor example above stands out because it does not include the identical wording underlined in each of the distractors. The answer is less obvious in the better example because the distractors have been reworded to be more parallel with the answer.
• Notice how the answer stands out in the poor example above. Both the answer and one of the distractors have been reworded in the better example to make the alternative lengths more uniform. Research. Numerous studies have indicated that items are easier when the answer is noticeably longer than the distractors than when all of the alternatives are similar in length (Haladyna & Downing, 1989b).
• The answer in the poor example above is a familiar definition straight out of the textbook, and the distractors are in the teacher’s own words.
• In the poor example above, the underlined word in each of the distractors is a specific determiner. These words have been removed from the better example by rewording both the stem and the distractors.
• In the poor example above, the underlined word “journal” appears in both the stem and the answer. This clue has been removed from the better example by replacing the answer with another valid answer that does not include the keyword. Research. Several studies have reported that items are easier when a keyword in the stem is also included in the answer (Haladyna & Downing, 1989b).
• The implausible distractors in the poor example have been replaced by more plausible distractors in the better example. Plausible distractors may be created in several ways, a few of which are listed below:
  • Use common student misconceptions as distractors. The incorrect answers supplied by students to a short-answer version of the same item are a good source of material to use in constructing distractors for a multiple-choice item.
  • Develop your own distractors, using words that “ring a bell” or that “sound official.” Your distractors should be plausible enough to keep the student who has not achieved the objective from detecting them, but not so subtle that they mislead the student who has achieved the objective.
• These two alternatives are frequently used when the teacher writing the item has trouble coming up with a sufficient number of distractors. Such teachers emphasize quantity of distractors over quality. Unfortunately, the use of either of these alternatives tends to reduce the effectiveness of the item, as illustrated in the following table: Research. While research on the use of “all of the above” is not conclusive, the use of “none of the above” has been found in several studies to decrease item discrimination and test score reliability (Haladyna & Downing, 1989b).
• Such ambiguity is particularly a problem with items of the best answer variety, where more than one alternative may be correct, but only one alternative should be clearly best. If competent authorities cannot agree on which alternative is clearly best, the item should either be revised or discarded. While the answer to the poor example above is a matter of debate, the underlined phrase added to the better example clarifies the problem considerably and rules out all of the alternatives except the answer.
• The easiest method of randomizing the answer position is to arrange the alternatives in some logical order. The following table gives examples of three logical orders. The best order to use for a particular item depends on the nature of the item’s alternatives.
• Research. Although very little research has been done on this guideline, one study has reported that simplifying the vocabulary makes the items about 10% easier (Cassels & Johnstone, 1984).
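Several of the clues described in the notes above (an answer that is noticeably longer than the distractors, specific determiners in distractors, a stem keyword repeated in the answer, and the alternatives “all of the above” and “none of the above”) can be screened for mechanically before an item is submitted. The following is a rough Python sketch, not part of the original guidelines: the function name, the length threshold, the five-letter keyword cutoff, and the sample item are all hypothetical, and a clean result does not mean the item is free of clues.

```python
import re

# Words named in the guidelines as specific determiners.
SPECIFIC_DETERMINERS = {"never", "always", "only"}
# Hypothetical threshold: flag an answer 1.5x longer than the average distractor.
LENGTH_RATIO = 1.5


def words(text):
    """Lowercase word list for a stem or alternative."""
    return re.findall(r"[a-z']+", text.lower())


def find_clues(stem, alternatives, answer_index):
    """Return warnings about clues that may give the answer away."""
    found = []
    answer = alternatives[answer_index]
    distractors = [a for i, a in enumerate(alternatives) if i != answer_index]

    # Length clue: the answer is noticeably longer than the distractors.
    mean_len = sum(len(d) for d in distractors) / len(distractors)
    if len(answer) > LENGTH_RATIO * mean_len:
        found.append("answer is noticeably longer than the distractors")

    # Specific determiners used to make a distractor false.
    for d in distractors:
        hits = SPECIFIC_DETERMINERS & set(words(d))
        if hits:
            found.append(f"specific determiner {sorted(hits)} in distractor {d!r}")

    # Keyword clue: a longer stem word that appears in the answer only.
    # (Crude heuristic: "longer" means more than four letters.)
    stem_words = {w for w in words(stem) if len(w) > 4}
    distractor_words = {w for d in distractors for w in words(d)}
    shared = (stem_words & set(words(answer))) - distractor_words
    if shared:
        found.append(f"stem keyword(s) {sorted(shared)} repeated in the answer")

    # "All of the above" / "none of the above".
    for a in alternatives:
        if a.strip().lower().rstrip(".") in {"all of the above", "none of the above"}:
            found.append(f"avoid the alternative {a!r}")
    return found


# Hypothetical poorly written item, invented for illustration only.
stem = "Which document would a bookkeeper use to journal a credit sale?"
alternatives = [
    "Always the ledger",
    "The sales journal kept by the accounting department",
    "A receipt",
    "None of the above",
]
found = find_clues(stem, alternatives, answer_index=1)
for warning in found:
    print(warning)
```

A check like this only catches surface patterns; grammatical consistency, textbook phrasing, and distractor plausibility still need a human reviewer.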
HSD2 Assessment Question Creation
The Objective: Participants will be able to construct reliable and valid test items.
The Task: Create question items for CBM 2
Items include Multiple Choice, Constructed Response, ECR and SCR prompts.
Write multiple items for each framework or evidence outcome to be assessed (giving priority to RED and BLUE frameworks or evidence outcomes).
What to Assess?
All content that is identified in the assessment column available for CBM 2.
1/3 of questions from Jan. 4 – Feb. 4, 2/3 of questions from Feb. 7 – May 13
Give priority to RED and BLUE frameworks and/or evidence outcomes.
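As a quick planning aid, the one-third / two-thirds split above can be turned into question counts for a test of a given length. This is a small Python sketch; the function name and the rounding choice are assumptions, not district policy.

```python
from fractions import Fraction


def split_questions(total, first_share=Fraction(1, 3)):
    """Split a question count between the two instructional windows.

    Returns (questions from Jan. 4 - Feb. 4, questions from Feb. 7 - May 13).
    Rounding to the nearest whole question is an assumption.
    """
    first = round(total * first_share)
    return first, total - first


print(split_questions(30))  # (10, 20)
print(split_questions(25))  # (8, 17)
```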
Assessment Question Type

Grade Level/Content | Assessment Question Type | Assessment
K-12 Specials & Electives (Visual Arts, Performing Arts, PE) | Multiple Choice | CBM 2
K-12 English/Language Arts (Reading) | Multiple Choice – frameworks that are not passage specific | CBM 2
K-12 English/Language Arts (Writing) | Multiple Choice & SCR (K-3)/ECR (4-12) Prompt | CBM 2
K-12 Math | Multiple Choice & CR | CBM 2
K-5 Science | Multiple Choice | CBM 2
6-12 Science | Multiple Choice & CR | CBM 2
6-12 Social Studies | Multiple Choice & SCR | End of Course Exam
Developed from Brigham Young University Testing Services and The Department of Instructional Science, “How to Prepare Better Multiple-Choice Test Items: Guidelines for Teachers.”
Anatomy of a Multiple-Choice Item
Reliability
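The guessing figures quoted in the notes (1 in 16 for a two-item section, 1 in 8,670 for scoring 70% or more on a fifteen-item section) follow from the binomial distribution. Below is a minimal Python sketch, assuming every guess is independent and each of the four alternatives is equally likely to be picked. The function name is illustrative and does not come from the source.

```python
from math import comb


def p_at_least(n_items, n_alternatives, min_correct):
    """Probability of getting at least min_correct items right by blind guessing."""
    p = 1 / n_alternatives
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Two items with four alternatives each: both answered correctly by guessing.
both = p_at_least(2, 4, 2)
print(f"1 in {1 / both:.0f}")      # 1 in 16

# Fifteen items: 70% of 15 is 10.5, so at least 11 correct answers are needed.
seventy = p_at_least(15, 4, 11)
print(f"1 in {1 / seventy:.0f}")   # 1 in 8670
```

The same function can be used to see how quickly the chance of a passing score by guessing falls as items are added to a section.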
Difficulty of Construction
Good multiple-choice test items are generally more difficult and time-consuming to write than other types of test items. Coming up with plausible distractors requires a certain amount of skill. This skill, however, may be increased through study, practice, and experience.
Guidelines for Constructing Multiple-Choice Items
1. Construct each item to assess a single written lesson objective.
Items that are not written with a specific objective in mind often end up measuring lower-level objectives exclusively, or covering trivial material that is of little educational worth.
2. Base each item on a specific problem stated clearly in the stem using proper grammar, punctuation, and spelling.
As illustrated in the following examples, the stem may consist of either a direct question or an incomplete sentence, whichever presents the problem more clearly and concisely.
3. Include as much of the item as possible in the stem, but do not include irrelevant material.
Excess material in the stem that is not essential to answering the problem increases the reading burden and adds to student confusion over what he or she is being asked to do.
4. State the stem in positive form (in general).
5. Word the alternatives clearly and concisely. Clear wording reduces student confusion, and concise wording reduces the reading burden placed on the student.
6. Keep the alternatives mutually exclusive.
7. Keep the alternatives homogeneous in content.
8.1 Keep the grammar of each alternative consistent with the stem. Students often assume that inconsistent grammar is the sign of a distractor, and they are generally right.
8.2 Keep the alternatives parallel in form. If the answer is worded in a certain way and the distractors are worded differently, the student may take notice and respond accordingly.
8.3 Keep the alternatives similar in length. An alternative noticeably longer or shorter than the others is frequently assumed to be the answer, and not without good reason.
8.4 Avoid textbook, verbatim phrasing. If the answer has been lifted word-for-word from the pages of the textbook, the students may recognize the phrasing and choose correctly out of familiarity rather than achievement.
8.5 Avoid the use of specific determiners. When words such as never, always, and only are included in distractors in order to make them false, they serve as flags to the alert student.
8.6 Avoid including keywords in the alternatives. When a word or phrase in the stem is also found in one of the alternatives, it tips the student off that the alternative is probably the answer.
8.7 Use plausible distractors. For the student who does not possess the ability being measured by the item, the distractors should look as plausible as the answer. Unrealistic or humorous distractors are nonfunctional and increase the student’s chance of guessing the correct answer.
9. Avoid the alternatives “all of the above” and “none of the above” (in general).
10. Include one and only one correct or clearly best answer in each item. When more than one of the alternatives can be successfully defended as being the answer, responding to an item becomes a frustrating game of determining what the teacher had in mind when he or she wrote the item.
11. Present the answer in each of the alternative positions approximately an equal number of times, in a random order.
12. Avoid using unnecessarily difficult vocabulary. If the vocabulary is somewhat difficult, the item will likely measure reading ability in addition to the achievement of the objective for which the item was written. As a result, poor readers who have achieved the objective may receive scores indicating that they have not. Use difficult and technical vocabulary only when essential for measuring the objective.
13. Avoid Question Bias – Be aware of questions that can: a) be construed as offensive to particular groups of individuals, b) portray groups of individuals unfavorably, or in a stereotypical fashion, c) be advantageous to one group, and/or disadvantageous to another, or d) be unfamiliar to certain groups of individuals.
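Guideline 11 can be checked once a draft test exists. The notes suggest arranging each item's alternatives in a logical order as the easiest way to randomize the answer position. The Python sketch below orders the alternatives numerically or alphabetically and then tallies how often the answer lands in each position. The items and function names are hypothetical, and sorting is only one possible logical order.

```python
from collections import Counter


def order_alternatives(alternatives):
    """Arrange alternatives in a logical order (numeric or alphabetical)."""
    return sorted(alternatives)


def answer_letter(ordered, answer, letters="abcd"):
    """Letter of the position where the answer ends up."""
    return letters[ordered.index(answer)]


def position_share(answer_key, letters="abcd"):
    """Fraction of items whose answer falls in each position."""
    counts = Counter(answer_key)
    return {letter: counts[letter] / len(answer_key) for letter in letters}


# Hypothetical items: (alternatives as drafted, answer).
items = [
    ([12, 7, 21, 15], 15),
    (["noun", "verb", "adjective", "adverb"], "adverb"),
    ([1865, 1776, 1812, 1914], 1776),
    (["Utah", "Idaho", "Nevada", "Oregon"], "Utah"),
]

answer_key = [answer_letter(order_alternatives(alts), ans) for alts, ans in items]
print(answer_key)                  # ['c', 'b', 'a', 'd']
print(position_share(answer_key))  # each position holds 0.25 of the answers
```

On a real test the shares will not be exactly equal. The aim is only that no position is used much more often than the others.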
Guidelines For Writing Content SCRs
Write questions/prompts that can be answered in one paragraph.
Target Content, Details, and Vocabulary sections of the scoring rubric need to be explicit. (see example on the next slide)
Example of Content Scoring Rubric
Constructed Response Questions
Each question needs to be aligned to a framework or evidence outcome from the district curriculum map.
Give priority to RED and BLUE frameworks and/or evidence outcomes.
Each question may contain multiple parts, but be worth no more than nine points total.
Answer must clearly detail how each point is earned.
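The two rules above, a cap of nine points per question and an answer that details how each point is earned, can be checked automatically on a draft scoring guide. The Python sketch below uses a hypothetical data layout and a made-up two-part question. It is not the district's scoring guide format.

```python
MAX_POINTS = 9  # stated cap: no more than nine points per question


def check_scoring_guide(parts):
    """Check a draft scoring guide.

    parts is a list of (part label, criteria), where criteria is a list of
    (points, description of how those points are earned).
    Returns (total points, list of problems found).
    """
    problems = []
    total = 0
    for label, criteria in parts:
        for points, description in criteria:
            total += points
            if not description.strip():
                problems.append(f"part {label}: {points} point(s) with no description")
    if total > MAX_POINTS:
        problems.append(f"total of {total} points exceeds {MAX_POINTS}")
    return total, problems


# Hypothetical two-part question, invented for illustration.
guide = [
    ("A", [(2, "States the correct slope"), (1, "Shows the rise-over-run work")]),
    ("B", [(3, "Writes the equation in slope-intercept form"),
           (2, "Graphs the line accurately")]),
]
total, problems = check_scoring_guide(guide)
print(total, problems)  # 8 []
```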
Example of CR Scoring Guide
When to Add Images
Images should only be added when required by the question.
Images should be black and white and of high resolution (300 dpi).
Images should not be from a copyrighted source.
Images are not to be used for mere "decoration."
How To Add Images To Questions You Submit
You Are Ready to Submit