Item and Distracter Analysis

Notes

  • Use before a test is administered helps ensure that the test is a reliable and valid measure. Use after the test has been administered makes it possible to build a bank of high-quality test items for use during future years, and allows the teacher to remove errant questions that adversely affect the quality of classroom measures in the current year.
  • Knowledge of which incorrect options students select can be a clue to the nature of the misunderstanding, and thus prescriptive of appropriate remediation.
  • Item difficulty: each item should have been correctly answered by a little over half of the class.
  • p = 0.88
  • Most scores are clustered on the high side; unusually low scores are spread out over a wide area of the distribution.
  • Most children will score very high on these measures. Children with learning problems will be easily identified by their pattern of lower scores.
  • This test would be very difficult for most applicants.
  • Most common correction-for-guessing formula: score = right − wrong ÷ (n − 1), where n is the number of alternatives for an item.
  • The more difficult the item, the likelier students are to guess at the answer.
  • p = 0.88
  • Interpretation depends on the teacher’s grading philosophy or goal for testing. If a well-constructed test designed with items of this difficulty level is administered, student scores should distribute as a normal (Gaussian) curve, with a mean score equal to the average difficulty level of the test items. If the goal is to maximize differences between students, the optimal difficulty index is 0.50.
  • This is a problem for teachers who assign grades according to a rigid scale (e.g., the 60–69% range would be a D); for this reason teachers often design test items to have difficulty indexes over 0.70.
  • A is chosen by only 2 students and may be too obviously wrong. C is working well. B has drawn far too many students and should be examined to determine what caused this choice. The teacher should consider reteaching the topic and explaining why B is wrong. A and B need to be rewritten for the next edition of the test.
  • Valid: students who have higher scores know more about the subject than students with lower scores.
  • Split the group at the median into a top 50% and a bottom 50%. If several students are at the median, allocate them randomly to one or the other group to get balanced groups.
  • p = 0.70; D = 0.90 − 0.50 = 0.40
  • p = 0.50; D = −0.20
  • Scores in the middle are not used. p is the percentage of students who got the item right; D is the difference between the number of pupils in the upper and lower groups who got the item right.
  • To answer this, a pre-test and a post-test must be given and the results compared. An item-by-item comparison can be made by means of an item-response chart.

Transcript

  • 1. Sue Quirante EDRE146
  • 2. Report Outline
    • Item Analysis
      • Item Difficulty Index
        • Diagnostic Testing
        • Out-of-Level Testing
        • Distracter Analysis
      • Item Discrimination Level/Index
        • Hogan (2007)
        • Point-Biserial Correlation
        • Gronlund & Linn (1990)
      • Criterion-Referenced Test Analysis
  • 3. Item Analysis: the effort to improve individual questions after they have been used; the process of examining answers to questions in order to assess the quality of individual test items and of the test itself.
  • 4. Item Analysis: the effort to improve individual questions after they have been used; the process of examining answers to questions in order to assess the quality of those items and of the test.
  • 5. Item Analysis
    • items to be analyzed must be valid measures of instructional objectives
    • items must be diagnostic
    • selecting and rewriting items on the basis of item performance data improves effectiveness of items and improves validity of scores
  • 6. Item Analysis
    • items to be analyzed must be valid measures of instructional objectives
    • items must be diagnostic
    • selecting and rewriting items on the basis of item performance data improves effectiveness of items and improves validity of scores
  • 7. Item Analysis
    • items to be analyzed must be valid measures of instructional objectives
    • items must be diagnostic
    • selecting and rewriting items on the basis of item performance data improves effectiveness of items and improves validity of scores
  • 8. Purpose
    • improves items used again in later tests
    • eliminates ambiguous or misleading items in a single test administration
    • increases instructors' skills in test construction
    • identifies specific areas of course content which need greater emphasis or clarity
  • 9. Purpose
    • improves items used again in later tests
    • eliminates ambiguous or misleading items in a single test administration
    • increases instructors' skills in test construction
    • identifies specific areas of course content which need greater emphasis or clarity
  • 10. Purpose
    • improves items used again in later tests
    • eliminates ambiguous or misleading items in a single test administration
    • increases instructors' skills in test construction
    • identifies specific areas of course content which need greater emphasis or clarity
  • 11. Purpose
    • improves items used again in later tests
    • eliminates ambiguous or misleading items in a single test administration
    • increases instructors' skills in test construction
    • identifies specific areas of course content which need greater emphasis or clarity
  • 12. Before Item Analysis
    • Editorial Review
      • 1st: a few hours after the first draft was written
      • 2nd: involve one or more other teachers, especially those with content knowledge of the field
  • 13. Before Item Analysis
    • Editorial Review
      • 1st: a few hours after the first draft was written
      • 2nd: involve one or more other teachers, especially those with content knowledge of the field
  • 14. Item Analysis
    • 1. Item Difficulty Index
    • > percentage of students who answered a test item correctly
    • > also called the p-value
  • 15. Difficulty Index
    • Occasionally everyone knows the answer
      • An unusually high level of success may be due to:
      • previous teacher
      • knowledge from home; child’s background
      • excellent teaching
      • a poorly constructed, easily guessed item
  • 16. Difficulty Index
    • Low scores
      • Is it the student’s fault for “not trying”?
      • Motivation level
      • Ability of teacher to get a point across
      • Construction of the test item
  • 17. Difficulty Index: p = (number who answered correctly) ÷ (total number taking the test), where p is the difficulty index.
  • 18. Difficulty Index: 22 students get the correct answer; 25 students take the test. p = ?
  • 19. Difficulty Index: p = 22 ÷ 25 = 0.88
  • 20. Difficulty Index: p = 0.88; 88% of the students got the item right (a high difficulty index).
  • 21. Difficulty Index: p = 0.88
    • item was too easy
    • students were well taught
  • 22. Difficulty Index
      • Sample Problem:
      • In a Math test administered by Mr. Reyes, seven students answered word problem #1 correctly. A total of twenty-five students took the test.
      • What is the difficulty index for word problem #1?
  • 23. Difficulty Index
      • p = 0.28
  • 24. Difficulty Index
      • p = 0.28
      • low difficulty index
      • the difficulty index is considered low when p < 0.30
  • 25. Difficulty Index
      • p = 0.28
      • students didn’t understand the concept being tested
      • item could be badly constructed
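
The difficulty-index arithmetic above is simple enough to script. A minimal sketch in Python, reusing the two worked examples (22 of 25 correct, and Mr. Reyes's 7 of 25); the function name and comments are illustrative, not part of the original slides:

    def difficulty_index(num_correct, num_examinees):
        """p = (number who answered the item correctly) / (number who took the test)."""
        return num_correct / num_examinees

    print(difficulty_index(22, 25))  # 0.88 -> high difficulty index (an easy item)
    print(difficulty_index(7, 25))   # 0.28 -> low difficulty index (p < 0.30, a hard item)
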
  • 26. Distribution with Negative Skew picture from http://billkosloskymd.typepad.com
  • 27. Distribution with Negative Skew picture from http://billkosloskymd.typepad.com
      • p > 0.70
      • Useful in identifying students who are experiencing difficulty in learning the material.
  • 28. Distribution with Negative Skew picture from http://billkosloskymd.typepad.com
      • Diagnostic Testing
      • Used to identify learning problems experienced by a child.
  • 29. Distribution with Negative Skew picture from http://billkosloskymd.typepad.com
      • Diagnostic Testing
      • Made up of easy test items that cover core skill areas of a subject.
  • 30. Distribution with Positive Skew picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png
  • 31. Distribution with Positive Skew picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png
      • Out-of-Level Testing
      • Used to select the very best top students for special programs.
  • 32. Distribution with Positive Skew picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png
      • Out-of-Level Testing
      • The optimal level of item difficulty equals the selection ratio.
  • 33. Out-of-Level Testing picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png
      • selection ratio = (number that will be selected) ÷ (number of applicants)
  • 34. Out-of-Level Testing picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png
      • Sample Problem:
      • A university summer program for junior high school students limits admission to 40 slots. 200 students are nominated by their high schools.
      • What should be the average difficulty level of the program’s admission test?
  • 35. Out-of-Level Testing picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png
      • 40 admission slots
      • 200 test takers
      • The test should have an average difficulty level of 0.20 (low difficulty index).
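
A short sketch of the selection-ratio rule for out-of-level testing, using the admission example above; the function name is an assumption made for illustration:

    def selection_ratio(slots, applicants):
        """Target average item difficulty for a selection test: slots / applicants."""
        return slots / applicants

    print(selection_ratio(40, 200))  # 0.20 -> the admission test should average p = 0.20
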
  • 36. Out-of-Level Testing picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png The average score would be only 20% plus a bit more for the guessing factor. The distribution of scores would show a positive skew. The best students would be evident at the upper end of the distribution.
  • 37. Out-of-Level Testing picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png The average score would be only 20% plus a bit more for the guessing factor. The distribution of scores would show a positive skew. The best students would be evident at the upper end of the distribution.
  • 38. Out-of-Level Testing picture from http://www.ken-szulczyk.com/lessons/statistics/asymmetric_distribution_01.png The average score would be only 20% plus a bit more for the guessing factor. The distribution of scores would show a positive skew. The best students would be evident at the upper end of the distribution.
  • 39. Difficulty Index
      • Critical Factor:
      • Guessing
        • Likelihood of guessing the correct answer for a multiple choice question is a function of the number of answer alternatives
  • 40. Difficulty Index
      • Critical Factor:
      • Guessing
        • The chance of guessing correctly out of four alternatives is 1:4 or 25%
        • Items with a lower difficulty index (p < 0.30) have a higher proportion of students answering correctly by guessing
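
The speaker notes give the most common correction-for-guessing formula: score = right − wrong ÷ (n − 1), where n is the number of alternatives per item. A sketch of that formula; the numbers in the example are hypothetical:

    def corrected_score(num_right, num_wrong, n_alternatives):
        """Correction for guessing: right - wrong / (n - 1)."""
        return num_right - num_wrong / (n_alternatives - 1)

    # Hypothetical example: 30 right and 12 wrong on a four-alternative multiple-choice test
    print(corrected_score(30, 12, 4))  # 26.0
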
  • 41. Item Analysis with Constructed-Response or Supply-Type Items (e.g., essays)
  • 42. Difficulty Index: p = (mean score of the class) ÷ (maximum possible score), where p is the difficulty index.
  • 43. Difficulty Index
      • Sample Problem:
      • The mean score of a class on an essay is 16.5 out of a total maximum score of 20.
      • What is the difficulty index of the essay?
  • 44. Difficulty Index
      • Sample Problem:
      • p = 16.5 ÷ 20 = 0.825
  • 45. Optimum Difficulty (corrected for guessing)
    • True-False: 0.75
    • MCQ, 3 alternatives: 0.67
    • MCQ, 4 alternatives: 0.625
    • MCQ, 5 alternatives: 0.60
    • Essay test: 0.50
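
The optimum-difficulty values above are consistent with taking the midpoint between the chance-guessing level (1/k for k alternatives, 0 for an essay) and a perfect score of 1.0. A sketch under that assumption, which also reproduces the supply-type difficulty index from slide 44; the midpoint rule is an inference from the table, not stated on the slide:

    def optimum_difficulty(chance_level):
        """Assumed rule: midpoint between the chance score and 1.0."""
        return (1.0 + chance_level) / 2.0

    def essay_difficulty_index(mean_score, max_score):
        """p for supply-type items = class mean / maximum possible score (slide 42)."""
        return mean_score / max_score

    print(round(optimum_difficulty(1 / 2), 3))  # 0.75   true-false
    print(round(optimum_difficulty(1 / 3), 3))  # 0.667  MCQ, 3 alternatives (0.67 in the table)
    print(round(optimum_difficulty(1 / 4), 3))  # 0.625  MCQ, 4 alternatives
    print(round(optimum_difficulty(1 / 5), 3))  # 0.6    MCQ, 5 alternatives
    print(round(optimum_difficulty(0.0), 3))    # 0.5    essay (no guessing)
    print(essay_difficulty_index(16.5, 20))     # 0.825  (slide 44)
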
  • 46. Problem When using a rigid grading scale, over half of the students will fail or get a D. picture from http://savingphilippinepupilsandparents.blogspot.com/
  • 47. Distracter Analysis: when a multiple choice item has a low difficulty index (p < 0.30), examine the item's distracters.
  • 48. Distracter Analysis: when a multiple choice item has a low difficulty index (p < 0.30), examine the item's distracters.
  • 49. Anatomy of a multiple choice item
    • Item stem: How many inches are in a foot?
    • 12 (keyed or correct option)
    • 20 (distracter or foil)
    • 24 (distracter or foil)
    • 100 (distracter or foil)
    • Together, the answer choices are called options, alternatives, or choices.
  • 50. Distracter Analysis (p = 15/50 = 0.30)
    • Choice A: selected by 2 of 50 students
    • Choice B: selected by 26 of 50 students
    • Choice C: selected by 7 of 50 students
    • Choice D: selected by 15 of 50 students
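
A sketch that tabulates the distracter counts from slide 50 and flags options that attract very few or very many students, in the spirit of the speaker notes ("A may be too obviously wrong", "B has drawn far too many"). D is taken as the keyed option because p = 15/50, and the flagging thresholds are illustrative assumptions:

    counts = {"A": 2, "B": 26, "C": 7, "D": 15}   # slide 50; D taken as the keyed option
    total = sum(counts.values())                  # 50 students

    p = counts["D"] / total
    print(f"difficulty index p = {p:.2f}")        # 0.30 -> examine the distracters

    for choice, n in counts.items():
        share = n / total
        note = ""
        if choice != "D" and share < 0.05:
            note = " <- chosen by very few; may be too obviously wrong"
        elif choice != "D" and share > p:
            note = " <- drew more students than the key; examine and consider reteaching"
        print(f"choice {choice}: {n:3d} of {total} ({share:.0%}){note}")
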
  • 51. Item Analysis
    • 2. Item Discrimination Level/Index
    • > extent to which success on an item corresponds to success on the whole test
    • > D
  • 52. Assumptions
    • We assume that the total score on the test is valid.
    • We also assume that each item separates students who know more from students who know less.
    • The discrimination index tells us the extent that this is true.
  • 53. Item Discrimination
    • Methods:
    • a) Hogan (2007)
    • b) point-biserial correlation ( rpb)
    • c) Gronlund & Linn (1990)
  • 54. Hogan (2007)
  • 55. Hogan’s Method (N = 40; Median = 35; the test contained 50 items; * indicates the correct option; entries in %)
    Item 5    A     B*    C     D
    High      0     90    10    0
    Low       10    50    30    10
    Total     5     70    20    5
  • 56. Sample Problem (N = 40; Median = 35; the test contained 50 items; * indicates the correct option; entries in %)
    Item 5    A*    B     C     D
    High      40    60    0     0
    Low       60    30    0     10
    Total     50    45    0     5
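
A sketch of the high-group versus low-group comparison behind Hogan's method, using the percentages from the slide 56 sample problem and assuming equal-sized groups from the median split. The keyed option A is chosen by fewer students in the high group than in the low group, i.e. negative discrimination; the speaker notes give p = 0.50 and a difference of −0.20 for this item.

    # Percentage of each group choosing each option (slide 56); A is the keyed option.
    high = {"A": 40, "B": 60, "C": 0, "D": 0}
    low = {"A": 60, "B": 30, "C": 0, "D": 10}
    keyed = "A"

    p = (high[keyed] + low[keyed]) / 2 / 100   # overall proportion correct (equal groups)
    diff = (high[keyed] - low[keyed]) / 100    # high-minus-low difference on the key
    print(f"p = {p:.2f}, high-low difference on the key = {diff:+.2f}")  # 0.50, -0.20

    for option in high:
        print(f"option {option}: high {high[option]:3d}%  low {low[option]:3d}%  "
              f"difference {high[option] - low[option]:+4d}%")
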
  • 57. Interpretation
    • Positive discrimination occurs if more students in the upper group than in the lower group get the item right.
    • This indicates that the item is discriminating in the same direction as the total test score.
  • 58. Interpretation
    • Positive discrimination occurs if more students in the upper group than in the lower group get the item right.
    • This indicates that the item is discriminating in the same direction as the total test score.
  • 59. Point-Biserial Correlation
    • Used to correlate item scores with the scores of the whole test
    • A special case of the Pearson Product Moment Correlation, where one variable is binary (right vs. wrong), and the other is continuous (total raw test score)
  • 60. Point-Biserial Correlation
    • Used to correlate item scores with the scores of the whole test
    • A special case of the Pearson Product Moment Correlation, where one variable is binary (right vs. wrong), and the other is continuous (total raw test score)
  • 61. Point-Biserial Correlation
  • 62. Point-Biserial Correlation
    • mean raw score of all students who got the item right
    • mean raw score of all students who got the item wrong
    • standard deviation of the raw scores
    • p = the proportion of students who got the right answer
  • 63. Point-Biserial Correlation
    • A negative point-biserial correlation means that the students who did well on the test missed that item, while those students who did poorly on the test got the item right.
    • This item should be rewritten.
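
A sketch of the point-biserial correlation built from the quantities listed on slide 62: the mean raw score of students who got the item right, the mean of those who got it wrong, the standard deviation of all raw scores, and p. The data below are hypothetical. With the population standard deviation, this formula agrees with the ordinary Pearson correlation between the 0/1 item score and the total score, which the last line checks with numpy:

    import numpy as np

    # Hypothetical data: 0/1 score on one item and total raw test score, per student
    item = np.array([1, 1, 1, 0, 1, 0, 0, 1, 1, 0])
    total = np.array([42, 38, 45, 25, 40, 30, 22, 36, 44, 28])

    p = item.mean()                      # proportion of students who got the item right
    m_right = total[item == 1].mean()    # mean raw score of students who got the item right
    m_wrong = total[item == 0].mean()    # mean raw score of students who got the item wrong
    s = total.std()                      # standard deviation of the raw scores (population form)

    r_pb = (m_right - m_wrong) / s * np.sqrt(p * (1 - p))
    print(round(r_pb, 3))
    print(round(np.corrcoef(item, total)[0, 1], 3))  # same value via Pearson correlation
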
  • 64. Software Support
    • Calculates r pb online:
    • http://faculty.vassar.edu/lowry/pbcorr.html
    • Date accessed: 1 August 2011
    • Software index for free reliability software:
    • http://www.rasch.org/software.htm
  • 65. Gronlund & Linn (1990) For norm-referenced tests:
  • 66. Gronlund & Linn (1990)
    • Item discriminating power
      • degree to which a test item discriminates between pupils with high and low scores
      • D = (RU − RL) ÷ (½ T)
      • RU is the number of pupils in the upper group who got the item right
      • RL is the number of pupils in the lower group who got the item right
      • T is the total number of pupils included in the analysis
  • 67. Gronlund & Linn (1990)
    • Item discriminating power
      • degree to which a test item discriminates between pupils with high and low scores
      • D = (RU − RL) ÷ (½ T)
      • RU is the number of pupils in the upper group who got the item right
      • RL is the number of pupils in the lower group who got the item right
      • T is the total number of pupils included in the analysis
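
A sketch of the Gronlund & Linn discrimination index; the counts in the example are hypothetical (an upper group with 9 of 10 right and a lower group with 5 of 10 right, which reproduces the 0.90 − 0.50 = 0.40 pattern used in the sample answer further below):

    def discrimination_index(right_upper, right_lower, total_pupils):
        """D = (RU - RL) / (T / 2), with RU and RL the numbers of pupils in the
        upper and lower groups who got the item right, T the total analysed."""
        return (right_upper - right_lower) / (total_pupils / 2)

    print(discrimination_index(9, 5, 20))   # 0.4   (0.90 - 0.50)
    print(discrimination_index(10, 0, 20))  # 1.0   maximum positive discrimination
    print(discrimination_index(6, 6, 20))   # 0.0   no discriminating power
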
  • 68. Interpreting Values
    • D=0.60 indicates average discriminating power.
    • D=1.00 has maximum positive discriminating power where all pupils in the upper group get the item right while all pupils in the lower group get the item wrong.
    • D=0.00 has no discriminating power where an equal number of pupils in the upper and lower groups get the item right.
  • 69. Interpreting Values
    • D=0.60 indicates average discriminating power.
    • D=1.00 has maximum positive discriminating power where all pupils in the upper group get the item right while all pupils in the lower group get the item wrong.
    • D=0.00 has no discriminating power where an equal number of pupils in the upper and lower groups get the item right.
  • 70. Interpreting Values
    • D=0.60 indicates average discriminating power.
    • D=1.00 has maximum positive discriminating power where all pupils in the upper group get the item right while all pupils in the lower group get the item wrong.
    • D=0.00 has no discriminating power where an equal number of pupils in the upper and lower groups get the item right.
  • 71. Sample Problem (* indicates the correct option; 10 pupils in each group). Find the p and D.
    Item 5    A     B*    C     D
    Upper     0     10    0     0
    Lower     2     4     1     3
  • 72. Answers: p = 0.70; D = 0.90 − 0.50 = 0.40
  • 73. Analysis of Criterion-Referenced Mastery Items
  • 74. Crucial Question: To what extent did the test items measure the effects of instruction?
  • 75. Item Response Chart (+ means correct, - means incorrect; B = pretest, A = posttest)
    Item:     1       2       3       4       5
    Pupil    B  A    B  A    B  A    B  A    B  A
    Jim      -  +    +  +    -  -    +  -    -  +
    Dora     -  +    +  +    -  -    +  -    +  +
    Lois     -  +    +  +    -  -    +  -    -  +
    Diego    -  +    +  +    -  -    +  -    -  +
  • 76. Sensitivity to Instructional Effects (S)
    • S = (RA − RB) ÷ T
    • RA is the number of pupils who got the item right after instruction
    • RB is the number of pupils who got the item right before instruction
    • T is the total number of pupils who tried the item both times
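
A sketch that applies the sensitivity formula to the item-response chart on slide 75; the pre/post values are transcribed from that chart, and the data layout is only one convenient choice:

    # Pre-test (B) and post-test (A) results from the item-response chart (slide 75).
    # True = correct (+), False = incorrect (-); five items per pupil.
    pretest = {
        "Jim": [False, True, False, True, False],
        "Dora": [False, True, False, True, True],
        "Lois": [False, True, False, True, False],
        "Diego": [False, True, False, True, False],
    }
    posttest = {
        "Jim": [True, True, False, False, True],
        "Dora": [True, True, False, False, True],
        "Lois": [True, True, False, False, True],
        "Diego": [True, True, False, False, True],
    }

    T = len(pretest)  # pupils who tried the item both times
    for item in range(5):
        r_before = sum(pretest[pupil][item] for pupil in pretest)
        r_after = sum(posttest[pupil][item] for pupil in posttest)
        print(f"item {item + 1}: S = {(r_after - r_before) / T:+.2f}")
    # item 1: +1.00, item 2: +0.00, item 3: +0.00, item 4: -1.00, item 5: +0.75
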
  • 77. Sensitivity to Instructional Effects (S)
    • S = 1
    • The ideal item value is 1.00. Effective items fall between 0.00 and 1.00.
    • The higher the positive value, the more sensitive the item is to instructional effects.
    • Items with zero and negative values do not reflect the intended effects of instruction.
  • 78. Sensitivity to Instructional Effects (S)
    • S = 1
    • The ideal item value is 1.00. Effective items fall between 0.00 and 1.00.
    • The higher the positive value, the more sensitive the item is to instructional effects.
    • Items with zero and negative values do not reflect the intended effects of instruction.
  • 79. Sensitivity to Instructional Effects (S)
    • S = 1
    • The ideal item value is 1.00. Effective items fall between 0.00 and 1.00.
    • The higher the positive value, the more sensitive the item is to instructional effects.
    • Items with zero and negative values do not reflect the intended effects of instruction.
  • 80. References
    • Gronlund, N.E. & Linn, R.L. (1990). Measurement and Evaluation in Teaching (6th ed.). USA: Macmillan Publishing Company.
    • Hogan, Thomas P. (2007). Educational Assessment: A Practical Introduction. USA: John Wiley & Sons, Inc.
    • Michigan State University Board of Trustees. (2009). Introduction to Item Analysis. Retrieved from http://scoring.msu.edu/itanhand.html
    • Wright, Robert J. (2008). Educational Assessment: Tests and Measurements in the Age of Accountability. Calif: Sage Publications, Inc.