# Item analysis


Table of Contents

- Major Uses of Item Analysis
- Item Analysis Reports
- Item Analysis Response Patterns
- Basic Item Analysis Statistics
- Interpretation of Basic Statistics
- Other Item Statistics
- Summary Data
- Report Options
- Item Analysis Guidelines

### Major Uses of Item Analysis

Item analysis can be a powerful technique available to instructors for the guidance and improvement of instruction. For this to be so, the items to be analyzed must be valid measures of instructional objectives. Further, the items must be diagnostic: knowledge of which incorrect options students select must be a clue to the nature of the misunderstanding, and thus prescriptive of appropriate remediation. In addition, instructors who construct their own examinations may greatly improve the effectiveness of test items and the validity of test scores if they select and rewrite their items on the basis of item performance data. Such data are available to instructors who have their examination answer sheets scored at the Computer Laboratory Scoring Office.

### Item Analysis Reports

As the answer sheets are scored, records are written which contain each student's score and his or her response to each item on the test. These records are then processed and an item analysis report file is generated. An instructor may obtain test score distributions and a list of students' scores, in alphabetic order, in student number order, in percentile rank order, and/or in order of percentage of total points. Instructors receive their item analysis reports as e-mail attachments. The item analysis report is contained in the file IRPT####.RPT, where the four digits indicate the instructor's GRADER III file. A sample of an individual long-form item analysis listing is shown below.

Item 10 of 125. The correct option is 5.

| Group      | 1        | 2        | 3        | 4      | 5        | Omit   | Error  | Total      |
|------------|----------|----------|----------|--------|----------|--------|--------|------------|
| Upper 27%  | 2 (7%)   | 8 (27%)  | 0 (0%)   | 1 (3%) | 19 (63%) | 0 (0%) | 0 (0%) | 30 (100%)  |
| Middle 46% | 3 (6%)   | 20 (38%) | 3 (6%)   | 3 (6%) | 23 (44%) | 0 (0%) | 0 (0%) | 52 (100%)  |
| Lower 27%  | 6 (20%)  | 5 (17%)  | 8 (27%)  | 2 (7%) | 9 (30%)  | 0 (0%) | 0 (0%) | 30 (101%)  |
| Total      | 11 (10%) | 33 (29%) | 11 (11%) | 6 (5%) | 51 (46%) | 0 (0%) | 0 (0%) | 112 (100%) |

### Item Analysis Response Patterns

Each item is identified by number and the correct option is indicated. The group of students taking the test is divided into upper, middle and lower groups on the basis of students' scores on the test. This division is essential if information is to be provided concerning the operation of distracters (incorrect options) and if an easily interpretable index of discrimination is to be computed. It has long been accepted that optimal item discrimination is obtained when the upper and lower groups each contain twenty-seven percent of the total group. The number of students who selected each option or omitted the item is shown for each of the upper, middle, lower and total groups. The number of students who marked more than one option to the item is indicated under the "Error" heading. The percentage of each group who selected each of the options, omitted the item, or erred is also listed. Note that the total percentage for each group may be other than 100%, since the percentages are rounded to the nearest whole number before totaling.

The sample item listed above appears to be performing well. About two-thirds of the upper group but only one-third of the lower group answered the item correctly. Ideally, the students who answered the item incorrectly should select each incorrect response in roughly equal proportions, rather than concentrating on a single incorrect option. Option two seems to be the most attractive incorrect option, especially to the upper and middle groups. It is most undesirable for a greater proportion of the upper group than of the lower group to select an incorrect option; the item writer should examine such an option for possible ambiguity. For the sample item above, option four was selected by only five percent of the total group. An attempt might be made to make this option more attractive.
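The grouping and tabulation described above are straightforward to automate. Below is a minimal sketch (not the Scoring Office's actual program) that splits students into upper 27%, middle 46%, and lower 27% groups by total score and tabulates the response pattern for one item; the function name and data layout are my own assumptions.

```python
from collections import Counter

def response_pattern(students, n_options=5):
    """Tabulate option counts for upper 27%, middle 46%, and lower 27% groups.

    `students` is a list of (total_score, chosen_option) pairs for one item;
    `chosen_option` is an int 1..n_options, or 0 for an omit.
    """
    ranked = sorted(students, key=lambda s: s[0], reverse=True)
    n = len(ranked)
    cut = round(0.27 * n)                      # conventional 27% tail size
    groups = {
        "upper":  ranked[:cut],
        "middle": ranked[cut:n - cut],
        "lower":  ranked[n - cut:],
    }
    table = {}
    for name, members in groups.items():
        counts = Counter(opt for _, opt in members)
        # index 0 is the omit count, indices 1..n_options are the options
        table[name] = [counts.get(opt, 0) for opt in range(n_options + 1)]
    return table
```

Percentages and the error column are omitted here for brevity, but both follow directly from the same counts.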
Item analysis provides the item writer with a record of student reaction to items. It gives little information about the appropriateness of an item for a course of instruction; the appropriateness, or content validity, of an item must be determined by comparing the content of the item with the instructional objectives.

### Basic Item Analysis Statistics

A number of item statistics are reported which aid in evaluating the effectiveness of an item. The first of these is the index of difficulty, which is the proportion of the total group who got the item wrong. Thus a high index indicates a difficult item and a low index indicates an easy item. Some item analysts prefer an index of difficulty which is the proportion of the total group who got an item right; this index may be obtained by marking the PROPORTION RIGHT option on the item analysis header sheet. Whichever index is selected is shown as the INDEX OF DIFFICULTY on the item analysis printout. For classroom achievement tests, most test constructors desire items with indices of difficulty no lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of 60.

The INDEX OF DISCRIMINATION is the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right. This index is dependent upon the difficulty of an item. It may reach a maximum value of 100 for an item with an index of difficulty of 50, that is, when 100% of the upper group and none of the lower group answer the item correctly. For items of less than or greater than 50 difficulty, the index of discrimination has a maximum value of less than 100. The Interpreting the Index of Discrimination document contains a more detailed discussion of the index of discrimination.
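As a sketch, the two indices (and the difficulty-dependent ceiling on discrimination just mentioned) can be computed as follows. The 0-100 scaling and rounding follow the report's conventions; the function names are my own.

```python
def difficulty_index(total_correct, n_students):
    """Index of difficulty: percent of the total group who got the item wrong."""
    return round(100 * (1 - total_correct / n_students))

def discrimination_index(upper_correct, lower_correct, group_size):
    """Index of discrimination: upper-group percent right minus
    lower-group percent right (groups are the same size)."""
    return round(100 * (upper_correct - lower_correct) / group_size)

def max_discrimination(difficulty):
    """Highest possible discrimination index at a given difficulty:
    100 at difficulty 50, shrinking toward either extreme."""
    return 2 * min(difficulty, 100 - difficulty)
```

Using the sample item above (19 of 30 upper-group and 9 of 30 lower-group students correct, 51 of 112 overall), this gives a difficulty of 54 and a discrimination of 33.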
### Interpretation of Basic Statistics

To aid in interpreting the index of discrimination, the maximum discrimination value and the discriminating efficiency are given for each item. The maximum discrimination is the highest possible index of discrimination for an item at a given level of difficulty. For example, an item answered correctly by 60% of the group would have an index of difficulty of 40 and a maximum discrimination of 80; this would occur when 100% of the upper group and 20% of the lower group answered the item correctly. The discriminating efficiency is the index of discrimination divided by the maximum discrimination. For example, an item with an index of discrimination of 40 and a maximum discrimination of 50 would have a discriminating efficiency of 80. This may be interpreted to mean that the item is discriminating at 80% of the potential of an item of its difficulty. For a more detailed discussion of the maximum discrimination and discriminating efficiency concepts, see the Interpreting the Index of Discrimination document.

### Other Item Statistics

Some test analysts may desire more complex item statistics. Two correlations which are commonly used as indicators of item discrimination are shown on the item analysis report. The first is the biserial correlation, which is the correlation between a student's performance on an item (right or wrong) and his or her total score on the test. This correlation assumes that the distribution of test scores is normal and that there is a normal distribution underlying the right/wrong dichotomy. The biserial correlation has the characteristic, disconcerting to some, of having maximum values greater than unity. There is no exact test for the statistical significance of the biserial correlation coefficient.

The point biserial correlation is also a correlation between student performance on an item (right or wrong) and test score. It assumes that the test score distribution is normal and that the division on item performance is a natural dichotomy. The possible range of values for the point biserial correlation is +1 to -1. The Student's t test for the statistical significance of the point biserial correlation is given on the item analysis report. Enter a table of Student's t values with N - 2 degrees of freedom at the desired percentile point; N, in this case, is the total number of students appearing in the item analysis.

The mean scores for students who got an item right and for those who got it wrong are also shown. These values are used in computing the biserial and point biserial coefficients of correlation and are not generally used as item analysis statistics.

Generally, item statistics will be somewhat unstable for small groups of students. Perhaps fifty students might be considered a minimum number if item statistics are to be stable. Note that for a group of fifty students, the upper and lower groups would contain only thirteen students each. The stability of item analysis results will improve as the group of students is increased to one hundred or more. An item analysis for very small groups must not be considered a stable indication of the performance of a set of items.

### Summary Data

The item analysis data are summarized on the last page of the item analysis report. The distribution of item difficulty indices is a tabulation showing the number and percentage of items whose difficulties are in each of ten categories, ranging from a very easy category (00-10) to a very difficult category (91-100). The distribution of discrimination indices is tabulated in the same manner, except that a category is included for negatively discriminating items.
The mean item difficulty is determined by adding all of the item difficulty indices and dividing the total by the number of items. The mean item discrimination is determined in a similar manner. Test reliability, estimated by Kuder-Richardson formula number 20, is given. If the test is speeded, that is, if some of the students did not have time to consider each test item, the reliability estimate may be spuriously high. The final test statistic is the standard error of measurement. This statistic is a common device for interpreting the absolute accuracy of the test scores. The size of the standard error of measurement depends on the standard deviation of the test scores as well as on the estimated reliability of the test.

Occasionally, a test writer may wish to omit certain items from the analysis although these items were included in the test as it was administered. Such items may be omitted by leaving them blank on the test key. The response patterns for omitted items will be shown, but the keyed options will be listed as OMIT and the statistics for these items will be omitted from the Summary Data.

### Report Options

A number of report options are available for item analysis data. The long-form item analysis report contains three items per page. A standard-form item analysis report is also available, in which the data on each item are summarized on one line. A sample report is shown below. Each option cell gives the percentages for the upper 27%, middle, and lower 27% groups, in that order.

ITEM ANALYSIS: Test 4482, 125 Items, 112 Students

| Item | Key | 1       | 2        | 3       | 4        | 5     | Omit  | Error | Diff | Disc |
|------|-----|---------|----------|---------|----------|-------|-------|-------|------|------|
| 1    | 4   | 7-23-57 | 0-4-7    | 28-8-36 | 64-62-0  | 0-0-0 | 0-0-0 | 0-0-0 | 54   | 64   |
| 2    | 2   | 7-12-7  | 64-42-29 | 14-4-21 | 14-42-36 | 0-0-0 | 0-0-0 | 0-0-0 | 56   | 35   |

The standard form shows the item number, the key (number of the correct option), the percentage of the upper, middle, and lower groups who selected each option, omitted the item, or erred, the index of difficulty, and the index of discrimination.
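The two test-level statistics reported in the Summary Data, KR-20 reliability and the standard error of measurement, can be sketched as below. This is an illustrative implementation with assumed function names and data layout, not the scoring program's own code.

```python
import math

def kr20(item_scores):
    """Kuder-Richardson formula 20 reliability estimate.

    `item_scores[s][i]` is 1 if student s answered item i correctly, else 0.
    """
    n_students = len(item_scores)
    k = len(item_scores[0])                      # number of items
    totals = [sum(row) for row in item_scores]   # each student's total score
    mean = sum(totals) / n_students
    var = sum((t - mean) ** 2 for t in totals) / n_students
    # sum of p*q over items, where p is the proportion answering correctly
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n_students
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var)

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in raw-score units."""
    return sd * math.sqrt(1 - reliability)
```

As the text notes, the SEM grows with the score standard deviation and shrinks as the reliability estimate rises.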
For example, in item 1 above, option 4 was the correct answer, and it was selected by 64% of the upper group, 62% of the middle group and 0% of the lower group. The index of difficulty, based on the total group, was 54 and the index of discrimination was 64.

### Item Analysis Guidelines

Item analysis is a completely futile process unless the results help instructors improve their classroom practices and item writers improve their tests. Let us suggest a number of points of departure in the application of item analysis data.

1. Item analysis gives necessary but not sufficient information concerning the appropriateness of an item as a measure of intended outcomes of instruction. An item may perform beautifully with respect to item analysis statistics and yet be quite irrelevant to the instruction whose results it was intended to measure. A most common error is to teach for behavioral objectives such as analysis of data or situations, ability to discover trends, ability to infer meaning, etc., and then to construct an objective test measuring mainly recognition of facts. Clearly, the objectives of instruction must be kept in mind when selecting test items.
2. An item must be of appropriate difficulty for the students to whom it is administered. If possible, items should have indices of difficulty no less than 20 and no greater than 80.
[Figure 2 omitted. Illustration by GGS Information Services, Cengage Learning, Gale.]

Item 1 discriminates well: the proportion correct in the high performing group (p = .92) is well above that in the low performing group (p = .40), resulting in an index of .52 (i.e., .92 - .40 = .52). Next, Item 2 is not difficult enough, with a discriminability index of only .04, meaning this particular item was not useful in discriminating between the high and low scoring individuals. Finally, Item 3 is in need of revision or discarding, as it discriminates negatively: low performing group members actually obtained the correct keyed answer more often than high performing group members.

Another way to determine the discriminability of an item is to compute the correlation coefficient between performance on the item and performance on the test as a whole, that is, the tendency of students selecting the correct answer to have high overall scores. This coefficient is reported as the item discrimination coefficient, or the point-biserial correlation between item score (usually scored right or wrong) and total test score. This coefficient should be positive, indicating that students answering correctly tend to have higher overall scores and students answering incorrectly tend to have lower overall scores. The higher its magnitude, the better the item discriminates. The point-biserial correlation can be computed with the procedures outlined in Figure 2, and it is evaluated similarly to the extreme-group discrimination index: if the resulting value is negative or low, the item should be revised or discarded. The closer the value is to 1.0, the stronger the item's discrimination power; the closer the value is to 0, the weaker the power. Items that are very easy and answered correctly by the majority of respondents will have poor point-biserial correlations.

[Figure 3 omitted. Illustration by GGS Information Services, Cengage Learning, Gale.]
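Since Figure 2 itself is not reproduced here, the following is a minimal sketch of the point-biserial computation under common conventions (population standard deviation, right/wrong item scoring); the function name and the inclusion of the Student's t value are my own choices.

```python
import math

def point_biserial(item_right, scores):
    """Point-biserial correlation between item score (1 = right, 0 = wrong)
    and total test score, with Student's t on N - 2 degrees of freedom."""
    n = len(scores)
    right = [s for r, s in zip(item_right, scores) if r]
    wrong = [s for r, s in zip(item_right, scores) if not r]
    p, q = len(right) / n, len(wrong) / n
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    mean_right = sum(right) / len(right)
    mean_wrong = sum(wrong) / len(wrong)
    r_pb = (mean_right - mean_wrong) / sd * math.sqrt(p * q)
    t = r_pb * math.sqrt((n - 2) / (1 - r_pb ** 2))
    return r_pb, t
```

A positive r_pb means students answering correctly tend to score higher overall, exactly as described above.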
### The Item Characteristic Curve

A third parameter used to conduct item analysis is the item characteristic curve (ICC). This is a graphical depiction of the characteristics of a particular item or, taken collectively, of the entire test. In an item characteristic curve the total test score is represented on the horizontal axis, and the proportion of test takers passing the item within each range of test scores is scaled along the vertical axis.
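The plotted points of an ICC are just proportions passing within total-score bands, which can be sketched as follows; the function name, the default of five bands, and the equal-width banding are assumptions for illustration.

```python
def icc_points(scores, item_right, n_bands=5):
    """Proportion passing the item within each total-score band: the points
    through which an item characteristic curve is drawn.

    `scores` are total test scores; `item_right[i]` is 1 if that student
    answered the item correctly, else 0.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bands or 1        # avoid zero width if all scores equal
    bands = [[] for _ in range(n_bands)]
    for s, r in zip(scores, item_right):
        idx = min(int((s - lo) / width), n_bands - 1)
        bands[idx].append(r)
    # None marks a band containing no students
    return [sum(b) / len(b) if b else None for b in bands]
```

A steadily rising sequence of proportions corresponds to the well-behaved Line C discussed below; a flat sequence corresponds to Line A.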
For Figure 3, three separate item characteristic curves are shown. Line A is a flat curve and indicates that test takers at all score levels were equally likely to get the item correct; the item is therefore not a useful discriminating item. Line B demonstrates a troublesome item, as it gradually rises and then drops for those scoring highest on the overall test. Though this is unusual, it can sometimes result from those who studied most having ruled out the answer that was keyed as correct. Finally, Line C shows the item characteristic curve for a good test item: the gradual and consistent positive slope shows that the proportion of people passing the item gradually increases as test scores increase. Though it is not depicted here, if an ICC were seen in the shape of a backward S, negative item discrimination would be evident, meaning that those who scored lowest were most likely to endorse the correct response on the item.

### Eight Simple Steps to Item Analysis

1. Score each answer sheet and write the score total on the corner (you obviously have to do this anyway).
2. Sort the pile into rank order from top to bottom score (1 minute, 30 seconds tops).
3. For a normal class of 30 students, divide the class in half:
   - same number in top and bottom group;
   - if there is an odd number, toss the middle paper (put it aside).
4. Take the "top" pile and count the number of students who responded to each alternative:
   - the fast way is simply to sort the papers into piles for "A", "B", "C", "D" (or true/false, or type of error for short-answer and fill-in-the-blank items), OR set it up on a spreadsheet if you're familiar with computers;
   - repeat for the lower group;
   - this is the time-consuming part, but it's not that bad; you can do it while watching TV, because you're just sorting piles.

ITEM ANALYSIS FORM: TEACHER CONSTRUCTED TESTS (CLASS SIZE = 30)

| Item | Group | A | *B | C | D | O | Difference | D | Total | Difficulty |
|------|-------|---|----|---|---|---|------------|---|-------|------------|
| 1.   | Upper | 0 | 4  | 1 | 1 |   |            |   |       |            |
|      | Lower |   | 2  |   |   |   |            |   |       |            |

*=Keyed Answer
THREE POSSIBLE SHORT CUTS AT STEP 4

(A) If you have a large sample of around 100 or more, you can cut down the sample you work with:

- take the top 27% (27 out of 100) and the bottom 27%, so you are only dealing with 54 papers, not all 100;
- put the middle 46 aside for the moment;
- the larger the sample, the more accurate the results, but you have to trade that off against labour; using the top 1/3 or so is probably good enough by the time you get to 100 (27% is the magic figure statisticians tell us to use);
- I'd use halves at 30, but you could just use a sample of the top 10 and bottom 10 if you're pressed for time; it means a single student changes the stats by 10%, trading off speed for accuracy, but I'd rather have you doing ten and ten than nothing.

(B) Second shortcut, if you have access to a photocopier (budgets permitting):

- photocopy the answer sheets and cut off the identifying information (you can't use this if the handwriting is distinctive);
- colour code the high and low groups with a dab of marker pen;
- distribute the copies randomly to the students in your class, so they don't know whose answer sheet they have;
- get them to raise their hands: for #6, how many have "A" on a blue sheet? how many have "B"? how many "C"? then, for #6, how many have "A" on a red sheet, and so on;
- I have some reservations, because students can screw you up if they don't take it seriously;
- another version of this is to hire the kid who cuts your lawn to do the counting, provided you've removed all identifying information. (I actually did this for a bunch of teachers at one high school in Edmonton when I was in university, for pocket money.)

(C) Third shortcut, IF you can't use a separate answer sheet: it is sometimes faster to type the responses than to sort papers.

SAMPLE OF TYPING FORMAT FOR ITEM ANALYSIS

| ITEM # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|--------|---|---|---|---|---|---|---|---|---|----|
| KEY    | T | F | T | F | T | A | D | C | A | B  |
| Jane   | T | T | T | F | T | A | D | C | A | D  |
| John   | T | T | T | F | F | A | D | D | A | C  |
| Kay    | F | F | T | F | T | A | D | C | A | B  |

Type the name, then T or F, or A, B, C, D: all typed with the left hand on the typewriter, leaving the right hand free to turn pages (from Sax). IF you have a computer program (there are some kicking around), it will give you all the stats you need, plus bunches more you don't, automatically after this stage.
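If you do type the responses, a few lines of code will do the counting for you. This is a sketch under the assumption that each student's responses are typed as one string the same length as the key; the function name and return layout are my own.

```python
from collections import Counter

def tally_typed(key, rows):
    """Count responses to each item, and the number right per item.

    `key` is the answer key string (e.g. "TFTFTADCAB");
    `rows` holds one typed response string per student.
    """
    counts = [Counter() for _ in key]          # per-item response tallies
    for row in rows:
        for i, resp in enumerate(row):
            counts[i][resp] += 1
    # number of students matching the keyed answer on each item
    n_right = [sum(1 for row in rows if row[i] == k) for i, k in enumerate(key)]
    return counts, n_right
```

Run it once on the top group's rows and once on the bottom group's rows, and you have all the counts the item analysis form asks for.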
(Overhead: sample item analysis for a class of 30, page 1, in text.)

5. Subtract the number of students in the lower group who got the question right from the number of upper-group students who got it right. (It is quite possible to get a negative number.)
6. Divide the difference by the number of students in the upper or lower group (in this case, divide by 15). This gives you the "discrimination index" (D).
7. Total the number who got it right.
8. If you have a large class and were only using the 1/3 samples for the top and bottom groups, you now have to count the number of the middle group who got each question right (not each alternative this time, just right answers).
9. Sample form for a class size of 100. (For a class of 30 split into upper and lower halves, there is no middle column.)
10. Divide the total by the total number of students: difficulty = the proportion who got it right (p).

The completed form for item 1 looks like this:

ITEM ANALYSIS FORM: TEACHER CONSTRUCTED TESTS (CLASS SIZE = 30)

| Item | Group | A | *B | C | D | O | Difference | D   | Total | Difficulty |
|------|-------|---|----|---|---|---|------------|-----|-------|------------|
| 1.   | Upper | 0 | 4  | 1 | 1 |   | 2          | .13 | 6     | .20        |
|      | Lower |   | 2  |   |   |   |            |     |       |            |

*=Keyed Answer

11. You will NOTE the complete lack of complicated statistics: counting, adding, dividing, with no tricky formulas required. We are not going to worry about corrected point biserials and the like; that is one of the advantages of using a fixed number of alternatives.

### Interpreting Item Analysis

Let's look at what we have and see what we can see; 90% of item analysis is just common sense.

1. Potential miskey
2. Identifying ambiguous items
3. Equal distribution to all alternatives
4. Alternatives are not working
5. Distracter too attractive
6. Question not discriminating
7. Negative discrimination
8. Too easy
9. Omit
10. & 11. Relationship between the D index and difficulty (p)

### Item Analysis of Computer Printouts

1. What do we see looking at this first one? [Potential miskey]

| Item | Option | Upper | Lower | Difference | D   | Total | Difficulty |
|------|--------|-------|-------|------------|-----|-------|------------|
| 1.   | *A     | 1     | 4     | -3         | -.2 | 5     | .17        |
|      | B      | 1     | 3     |            |     |       |            |
|      | C      | 10    | 5     |            |     |       |            |
|      | D      | 3     | 3     |            |     |       |            |
|      | O      | 0     | 0     |            |     |       |            |

*=Keyed Answer; O means omit or no answer

- In item 1, more high-group students chose C than A, even though A is supposedly the correct answer.
- More low-group students chose A than high-group students, so we got negative discrimination; only 17% of the class got it right.
- Most likely you just wrote the wrong answer key down; this is an easy and very common mistake to make.
- Better you find out now than when the kids complain after you hand the test back. OR WORSE, they don't complain, and teach themselves your miskey as the "correct" answer.
- So check it out and rescore that question on all the papers before handing them back. Rescoring with C as the key makes it 10 versus 5: Difference = 5; D = .33; Total = 15; difficulty = .50. A nice item.
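The miskey check above is mechanical enough to automate. Here is a sketch (function name and data layout assumed, not from any printout program) that flags an item when the keyed option discriminates negatively while another option is the upper group's favourite:

```python
def potential_miskey(upper, lower, key):
    """Return the suspect option if the keyed answer discriminates negatively
    and a different option drew the most upper-group students, else None.

    `upper` and `lower` map option letters to counts for each group.
    """
    if upper[key] - lower[key] >= 0:
        return None                      # keyed answer discriminates positively
    favourite = max(upper, key=lambda opt: upper[opt])
    return favourite if favourite != key else None
```

With the counts from the printout above (`upper = {"A": 1, "B": 1, "C": 10, "D": 3}`, `lower = {"A": 4, "B": 3, "C": 5, "D": 3}`, key "A"), it flags "C" as the likely intended key.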