Exam and Item Development
How to develop a quality exam by constructing sound exam items, using higher taxonomy of items, and reviewing item analysis

Speaker Notes
  • Candidates who meet or exceed the standard of accepted ability, based on their performance on the examination, will be certified. We want to accurately measure the candidate's ability in the field of practice based on the knowledge and skills represented in the exam.
  • Content areas, task/skill areas, and item taxonomy are three considerations in developing an exam. The classification of items helps the examination developer monitor the distribution of items across content and task domains as well as cognitive skill levels. Each item in the examination database is assigned a classification code, which is also used in item selection.
    Dimension I: Content Areas. General subjects that represent the disciplines of field experience and imply expertise, for example physics, imaging, safety, and physiology. Content reflects the major subject category of the item, and the content classification is used in selecting items to ensure that the entire content domain is covered. It is developed, in part, from the job/task analysis written by the experts in the field of practice and administered to practitioners in the field.
    Dimension II: Tasks or Skill Areas. Actions necessary for effective and efficient job performance, for example: apply knowledge of..., select appropriate..., prepare appropriate..., develop, establish, calculate, analyze, identify, synthesize, evaluate, implement, perform, and use instrumentation. Task is the skill performed (e.g., diagnosis or treatment). By structuring items to reflect different tasks, a greater variety of items is generated.
    Taxonomy relates to the cognitive skill component in criterion-referenced testing: the level of mental process the candidate uses to determine the correct response to an item. The construction of the stem and responses, the use of visual materials, and the process and content of the item all contribute to its taxonomy classification. Three taxonomy levels must be considered in item writing: recall, interpretation, and problem solving.
    Exam validity and reliability. Reliability: is the examination consistent and dependable? Internal consistency will generally be high because items are written by specialists in the field. Validity: does the test measure what it is supposed to measure? The content guideline, or exam blueprint, is the basis for content validity. Are we achieving our goals? The goal of the exam development process is to accurately measure the candidate's ability in the field of practice; an examination is created to measure the ability of the candidate based upon the knowledge and skills represented in each test question.
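The reliability question above ("is the examination consistent and dependable?") is commonly quantified with an internal-consistency index. The following is a minimal sketch, not part of the original deck, of one standard index for dichotomously scored items, Kuder-Richardson 20 (KR-20); the function name and input layout are assumptions for illustration.

    def kr20(scores):
        """KR-20 internal consistency. scores[c][i] is 1 if candidate c
        answered item i correctly, else 0. Assumes var_t > 0 and k > 1."""
        n = len(scores)       # number of candidates
        k = len(scores[0])    # number of items
        totals = [sum(row) for row in scores]
        mean_t = sum(totals) / n
        var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
        # Sum of p*q over items, where p is the proportion answering correctly.
        pq = 0.0
        for i in range(k):
            p = sum(row[i] for row in scores) / n
            pq += p * (1 - p)
        return (k / (k - 1)) * (1 - pq / var_t)

Values closer to 1.0 indicate a more internally consistent exam, in line with the note that internal consistency is generally high when items are written by specialists.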
  • Recall items test the candidate's ability to recognize or recall a specific fact or concept. Interpretation items require the candidate to interpret information using recall knowledge; they are presented in the form of diagnostic images, laboratory data, or patient history, and often ask for a decision such as a diagnosis or a prognosis. Problem-solving items require the candidate to use a base of knowledge to interpret data and then solve a problem or make a decision; complications may also be added to the situation that need a remedy. Problem-solving items often ask for the best management or treatment options for the patient.
  • Item writing is an arduous task requiring not only mastery of the subject matter, but also an understanding of the examination population and strong verbal communication skills. The review process ensures that the item adheres to appropriate technical and/or scientific principles (STANDARDS, 1985). Items are selected by a group of experts for inclusion in the examination databases. The responsibilities of an item writer include, but are not limited to: developing new items on a continuing basis, as assigned; reviewing and selecting items for inclusion in the written examination; monitoring the content, task, and cognitive skill distributions of items; monitoring the content quality and difficulty of each item and avoiding duplicate items on the same knowledge or skill; providing expert input into the criterion standard against which candidates are measured; and reviewing the performance of each item to ascertain the quality of its content and structure. The goal is to maintain a pool of examination items appropriate to measure the knowledge and skills necessary for safe and effective performance in the field of practice.
  • A multiple choice item is designed for objective measurement and contains a STEM and four RESPONSES, one of which is the best answer. The multiple choice item is unique in that the standard by which the best answer is selected is contained in the stem. The best answer does NOT have to be the one and only indisputably correct response to the item, as long as the subject matter experts agree it is the best answer of those presented. The form is flexible, so items may be based on images, situations, laboratory results, etc. The following sections outline techniques for writing and evaluating multiple choice items by considering first the stem and then the responses.
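Since each item carries a classification code across the two dimensions plus a taxonomy level (see the notes above), an item bank entry might be modeled roughly as follows. This is a hypothetical sketch; the field names are illustrative assumptions, not the deck's actual database schema.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class Item:
        stem: str                             # question, incomplete statement, or situation
        responses: Tuple[str, str, str, str]  # one best answer plus three distractors
        key: int                              # index of the best answer within responses
        content_area: str                     # Dimension I, e.g. "Physics" or "Safety"
        task: str                             # Dimension II, e.g. "Identify" or "Calculate"
        taxonomy: str                         # "recall", "interpretation", or "problem solving"

Carrying the classification on each entry is what lets item selection cover the full content domain and balance taxonomy levels, as the notes describe.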
  • Stem: the stem of a multiple choice item may ask a question, give an incomplete statement, state an issue, describe a situation, or any combination of the above.
  • Responses: 1.) the "BEST" answer is the response the author and other experts consider the most appropriate; 2.) the "DISTRACTORS" are logical misconceptions of the best answer.
  • The content of the stem should focus on a central theme or problem, using clear and precise language, without excessive length that can confuse or distract candidates. The stem may ask a straightforward question, present a scenario, or describe data or laboratory results. The question or issue presented in the stem should be relevant to the knowledge and skill level of the population being evaluated. Each multiple choice item should have four mutually exclusive responses.
  • Sentence structure in the stem should be grammatically accurate and logically related to the responses. The stem should present all relevant information to ensure clarity and understanding. Although the multiple choice format is brief, it must include sufficient information to make an interpretation, answer the question, or solve the problem. Avoid superfluous information, but be certain that all necessary details are included. Also avoid personal pronouns such as "you," which are inappropriate and potentially confusing. As a general principle, the stem should be stated in a POSITIVE form: negative statements are not characteristic of normal thought processes and may place the candidate who is attempting to decipher the item at a disadvantage.
  • The plausibility of the responses is the first consideration. The best answer should be the one agreed upon by the experts; the other three distractors should nonetheless seem plausible to candidates who have partial, incomplete, or inappropriate knowledge. The distractors may therefore be considered logical misconceptions of the best answer. The responses should be parallel in content, length, and category of information, and the grammatical structure of each response should be a logical conclusion to the situation, question, or statement presented in the stem. When writing distractors, avoid superlatives such as "always" and "never"; such words lead candidates away from the response because they tend to be associated with suspect or exaggerated statements. Avoid repetitive language within the responses: words repeated in every response may be moved into the stem, so the candidate has less to read and is less likely to be confused by the structure. The length of each response should be approximately the same. Item writers tend to make the best answer the longest answer, and testwise candidates may key on this and answer correctly because of the format of the response. Each distractor should be mutually exclusive, not overlapping. For instance, if a series of percentage ranges is used for the responses, each range must be unique to its response:
    a. 10-20
    b. 30-50
    c. 55-60
    d. 65-75
    e. 76-100
    If responses overlap, the candidate may be unable to determine the best answer, not for lack of knowledge, but because the answer is incorporated into more than one response; the candidate may also be able to argue that more than one response is correct due to the overlap. Avoid using "none of the above" as a response. It does not test what the candidate knows, only that he or she can recognize that the correct answer is not present. For example: What is the capital of Texas? 1. Kansas City 2. Pasadena 3. New York 4. None of the above. (The candidate confidently selects "none of the above" because he thinks he knows that the capital of Texas is Lubbock.) Likewise avoid "all of the above" as a response. It is essentially an overlapping response, because it requires the candidate to consider the responses in combination: knowing that two are correct leads a clever candidate to "all of the above" without knowing the importance or correctness of the remaining responses.
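The mutual-exclusivity rule for numeric ranges lends itself to a mechanical check. A minimal sketch, assuming each response is given as a (low, high) pair; ranges_overlap is a hypothetical helper, not from the deck:

    def ranges_overlap(options):
        """Return True if any two (low, high) ranges share a value."""
        ordered = sorted(options)
        # Compare each range's upper bound with the next range's lower bound.
        return any(prev_high >= low
                   for (_, prev_high), (low, _) in zip(ordered, ordered[1:]))

    # The non-overlapping example from the text:
    print(ranges_overlap([(10, 20), (30, 50), (55, 60), (65, 75), (76, 100)]))  # False
    # An overlapping set of the kind the guideline warns against:
    print(ranges_overlap([(10, 30), (25, 50), (55, 60), (65, 75)]))             # True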
  • Item evaluation can be performed for items that have been previously tested and for which statistics have been received. The purpose of item evaluation is to identify items that are not measuring as expected. Items that fail to perform properly increase the error of the exam and therefore do not contribute to the precision of the pass/fail decision made about candidates. After items appear on a test, they are subjected to statistical as well as content analysis. The statistical analysis gives the subject matter experts clues to how well the content of the item yielded useful information about candidate ability. The purpose of deleting items from an examination is always to create more precise and fair examinations. Any item that performs poorly is flagged for possible deletion. Items may perform poorly for many reasons, most of them related to the initial construction of the item stem and responses; proper development of stems and responses leads to a higher probability that the item will perform successfully.
    Traditional item analysis. In the process of item review, the item statistics represent the performance of the item and guide the examination reviewers when revising items. Traditional item analysis consists of a p-value and a point biserial correlation (RPBI). The p-value is the percent of candidates who selected each response; ideally, more candidates selected the keyed correct response than any distractor. The point biserial correlation is the correlation between selecting a given response and performance on the total test. It should be positive (and highest) for the keyed correct response and negative for the distractors; this pattern indicates that candidates who did well on the test tended to select the correct answer on the item. The ideal ranges for the item statistics are:
    p-value: generally in the range of .30 to .80
    RPBI: around .20 for the correct answer and negative for all distractors
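A minimal sketch of the traditional item analysis described above, computing the p-value and point biserial correlation for each item's keyed answer. It assumes responses arrive as one list of chosen options per candidate; the function name and layout are illustrative, not the deck's software.

    def item_analysis(responses, key):
        """responses[c][i] is the option candidate c chose on item i;
        key[i] is the correct option for item i.
        Returns a list of {"p": ..., "rpbi": ...} dicts, one per item."""
        n = len(responses)
        n_items = len(key)
        # Total score per candidate.
        totals = [sum(1 for i in range(n_items) if resp[i] == key[i])
                  for resp in responses]
        mean_t = sum(totals) / n
        sd_t = (sum((t - mean_t) ** 2 for t in totals) / n) ** 0.5
        results = []
        for i in range(n_items):
            correct = [1 if resp[i] == key[i] else 0 for resp in responses]
            p = sum(correct) / n  # p-value: proportion choosing the keyed answer
            if 0 < p < 1 and sd_t > 0:
                # Point biserial: (mean total of correct group - overall mean) / SD,
                # scaled by sqrt(p / (1 - p)).
                mean_correct = sum(t for t, c in zip(totals, correct) if c) / sum(correct)
                rpbi = (mean_correct - mean_t) / sd_t * (p / (1 - p)) ** 0.5
            else:
                rpbi = 0.0
            results.append({"p": p, "rpbi": rpbi})
        return results

Per the ranges above, an item would be flagged for review when its p-value falls outside .30 to .80 or its keyed RPBI falls well below .20. (Operational programs often also exclude the item itself from the total score before correlating; this sketch omits that correction for brevity.)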
  • Transcript

    • 1. Exam and Item Development
    • 2. Examination’s Purpose The goal of the exam development process is to accurately measure the candidate’s ability in the field of practice. An examination is created to measure the ability of the candidate based upon the knowledge and skills represented in each test question. An examination is NOT created to measure the ability of the candidate to take an exam. Joosten || 2007
    • 3. Considerations in Criterion-referenced Testing: Content Areas; Task/Skill Areas (actions necessary for effective and efficient performance); Item Taxonomy; Exam Validity & Reliability. Joosten || 2007
    • 4. Item Taxonomy: RECALL, INTERPRETATION, PROBLEM SOLVING. Joosten || 2007
    • 5. Sample Recall Item Which of the following describes the active growth phase of the cycle of normal human hair growth? A. Anagen. B. Betagen. C. Catagen. D. Telogen. Joosten || 2007
    • 6. Sample Interpretation Item A 23-year-old woman who is acutely febrile has had an untreated, painful lower left third molar for 3 weeks. The patient can open her mouth only 8 mm, has some pain on swallowing, and has moderate swelling just beneath the angle of the mandible on the left side. The most likely diagnosis is an abscess in which of the following spaces? A. lateral pharyngeal. B. retropharyngeal. C. submandibular. D. masticatory. Joosten || 2007
    • 7. Sample Problem Solving Item A periapical roentgenogram reveals an impacted lower third molar in an edentulous mandible. The patient is experiencing recurrent acute and chronic infection of the overlying soft tissue denture base. For definitive treatment, the surgeon should: A. reline and relieve the denture base. B. remove the tooth using appropriate antibiotic control. C. trim the swollen tissue and prescribe antibiotics. D. advise the patient to remove the denture when eating. Joosten || 2007
    • 8. Developing Multiple Choice Items Issues and Methods
    • 9. Multiple Choice Items GOAL: Maintain a pool of exam items which are appropriate to measure the knowledge and skills necessary for safe and effective performance in the field of practice. Item construction affects the performance of your exam. A multiple choice item is a specific form of item composed of a stem and options. Parts of an item: stem, distractors, correct answer. Joosten || 2007
    • 10. Stem The stem of a multiple choice item may: ask a question (Which of the following microscopic subtypes of ameloblastoma is most common?); give an incomplete statement (The most common microscopic subtype of ameloblastoma is:); describe a situation, along with a question or incomplete statement (A 25-year-old man is brought to the emergency room. He was found lying unconscious on the sidewalk. After ascertaining that the airway is open, the next step in management should be:). Joosten || 2007
    • 11. Item Response Options Options are all the possible answers for a stem. One correct (best) answer Three distractors The best answer is agreed upon by experts. The distractors are logical misconceptions of the best answer. Joosten || 2007
    • 12. Developing Items Items should have one best answer. Avoid items based on opinion or for which there is not an accepted answer. Items must focus on a single issue, fact, or problem in each item. Items should test important and pertinent material while avoiding trivial facts. Items should be developed utilizing good grammar, punctuation, and spelling. Attempt to write interpretation and problem solving items. Use a standard number of responses. Options should avoid “all of the above” and “none of the above.” Joosten || 2007
    • 13. Stem Construction Stems should: Avoid overly specific knowledge, excess information, and teaching in the stem. Include the central idea and most verbiage in the stem. Be stated positively and avoid negative phrasing. Avoid personal pronouns (e.g., you). Use terminology common to practice and avoid verbatim textbook phrasing. Avoid superlatives such as “always” and “never.” Joosten || 2007
    • 14. Responses Construction Responses should be: Organized in a logical order Independent and not overlapping Fairly consistent in length Homogeneous Plausible Joosten || 2007
    • 15. Any Questions? Joosten || 2007
    • 16. Item Evaluation P-value: percent of candidates who selected a response. Point Biserial Correlation: correlation between selecting a given response and performance on the total test. Joosten || 2007
    • 17. Good Item [response distribution chart] A IS THE CORRECT ANSWER. P-VALUE: A 0.70, B 0.15, C 0.05, D 0.01. Joosten || 2007
    • 18. Good P-value, Poor Discrimination [response distribution chart] C IS THE CORRECT ANSWER. P-VALUE: A 0.05, B 0.07, C 0.73, D 0.15. RPBI: A 0.11, B -0.10, C 0.02, D -0.02. Joosten || 2007
    • 19. Low P-value, Low Discrimination [response distribution chart] A IS THE CORRECT ANSWER. P-VALUE: A 0.47, B 0.33, C 0.15, D 0.05. RPBI: A 0.08, B -0.13, C 0.01, D 0.09. Joosten || 2007
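Applying the ideal ranges stated in the notes (p-value .30 to .80; RPBI around .20 for the key and negative for distractors) to the two flawed example items above could look like this sketch; flag_item and its threshold parameters are illustrative assumptions, not the deck's procedure.

    def flag_item(p_key, rpbi_key, distractor_rpbis,
                  p_lo=0.30, p_hi=0.80, r_min=0.20):
        """Return the list of reasons an item should be flagged for review."""
        reasons = []
        if not (p_lo <= p_key <= p_hi):
            reasons.append("p-value outside %.2f-%.2f" % (p_lo, p_hi))
        if rpbi_key < r_min:
            reasons.append("keyed RPBI below %.2f" % r_min)
        if any(r > 0 for r in distractor_rpbis):
            reasons.append("positively discriminating distractor")
        return reasons

    # Slide 18's item: good p-value (0.73) but keyed RPBI of only 0.02,
    # and distractor A discriminates positively (0.11).
    print(flag_item(0.73, 0.02, [0.11, -0.10, -0.02]))
    # Slide 19's item: p-value 0.47 but keyed RPBI of only 0.08.
    print(flag_item(0.47, 0.08, [-0.13, 0.01, 0.09]))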
    • 20. Evaluating Item Stems 1. Focus on a single issue, fact, or problem in each item. 2. Avoid overly specific knowledge. 3. Avoid verbatim textbook phrasing. 4. Avoid items based on opinion. 5. Avoid items for which there is no accepted answer. Joosten || 2007
    • 21. Evaluating Item Stems 6. Test important material, while avoiding trivial facts. 7. State the item positively and avoid negative phrasing. 8. Include the central idea and most verbiage in the stem. 9. Use one best answer format. 10. Use good grammar, punctuation, and spelling. Joosten || 2007
    • 22. Evaluating Item Stems 11. Avoid excess information in the stem as well as teaching in the stem. 12. Avoid personal pronouns (e.g., you). 13. Attempt to write stems that require interpretation and problem solving from the candidate (rather than recall). Joosten || 2007
    • 23. Anatomy of Item Responses Item responses should consist of: 1.) the best answer (agreed upon by experts). 2.) logical misconceptions of the best answer or distractors. Joosten || 2007
    • 24. Evaluating Item Responses 1. Use a standard number of responses. 2. Place options in a logical order. 3. Keep options independent and not overlapping. 4. Keep options homogeneous in content. 5. Keep the length of the options fairly consistent. Joosten || 2007
    • 25. Evaluating Item Responses 6. Be sure all distractors are plausible. 7. Be sure all distractors are logical misconceptions. 8. Avoid “all of the above” and “none of the above.” 9. Phrase options positively, not negatively. 10. Avoid use of slang. Joosten || 2007
    • 26. Evaluating Item Responses 11. Avoid absurd or “fantastic” options. 12. Avoid giving clues through faulty grammar. 13. Make sure there is only one best answer. 14. Avoid superlatives such as “always” and “never.” 15. Evenly distribute position of the correct answer. Joosten || 2007
    • 27. General Considerations • Does the item deal with trivial content? • Is the answer discrimination too fine? • Does the item stem include unrelated information? • Is there more than one correct answer? • Is the item highly ambiguous? • Is the question so obvious that the best answer appears to be the only plausible choice? • Are some distractors “tip-offs” because of the choice of words or phrasing in the responses or stems? • Are all of the distractors parallel? • Are the responses of comparable plausibility? Joosten || 2007
    • 28. In Summary The goal of item writing and editing is to create items that measure the skills and abilities of the candidates. To do that, the items must be clear, concise, and accurate, with sound structure and pertinent content. Joosten || 2007
    • 29. Review Item Statistics P-value: percent of candidates who selected a response. Point Biserial Correlation: correlation between selecting a given response and performance on the total test; positive for the correct answer, negative for distractors. Joosten || 2007
    • 30. Good Item [response distribution chart] A IS THE CORRECT ANSWER. P-VALUE: A 0.70, B 0.15, C 0.05, D 0.01. Joosten || 2007
