Standard Setting In Medical Exams


Describes the classification of standards, standard-setting models, and the Hofstee method of scaling, as employed at the Medical University of the Americas, in line with USMLE guidelines.

Published in: Education, Technology

  • *Construct validity = Convergent and Discriminant; Criterion validity = Concurrent and Predictive **Reliability: Inter-rater reliability, Internal consistency reliability; Test-retest reliability
  • OSCE: Objective Structured Clinical Examination; common OSCE exam formats include History Taking, Physical Examination, Consult, and Emergency Room Management stations
  • *Cronbach’s alpha = Internal consistency reliability; Cohen’s kappa = Inter-rater reliability
  • SD = Standard Deviation

    1. 1. Standard Setting and Medical Students’ Assessment Dr. Sanjoy Sanyal Associate Professor – Neurosciences Medical University of the Americas Nevis, St. Kitts-Nevis, WI [email_address]
    2. 2. List of topics <ul><li>Summative assessment </li></ul><ul><li>Standard-setting </li></ul><ul><li>Classification of standards </li></ul><ul><li>Standard-setting models </li></ul><ul><ul><li>Test-centered models </li></ul></ul><ul><ul><li>Examinee-centered models </li></ul></ul><ul><li>Modified Angoff approach </li></ul><ul><li>The Hofstee method </li></ul><ul><li>Evaluation of standards </li></ul><ul><li>Future perspectives </li></ul><ul><li>Conclusion </li></ul><ul><li>References </li></ul>
    3. 3. What is Summative Assessment? <ul><li>In the context of a Caribbean medical school training students to be future doctors, summative assessment can be interpreted as any of the following: </li></ul><ul><ul><li>End-point / end-semester assessment </li></ul></ul><ul><ul><li>Certification examination </li></ul></ul><ul><ul><li>Licensing examination </li></ul></ul>
    4. 4. Reasons for Post-training Summative Assessment <ul><li>Trainee motivation: Assessment drives learning </li></ul><ul><li>Recognition of achievement </li></ul><ul><li>Rite of passage: Initiation to the profession </li></ul><ul><li>Reputation of the discipline </li></ul><ul><li>Patient safety </li></ul><ul><li>Quality marker for patients </li></ul>
    5. 5. Characteristics of Good Summative Assessment <ul><li>FEASIBILITY: Proportionate, Cost-effective, Practicable </li></ul><ul><li>RELIABILITY**: Fair, Consistent, Accurate </li></ul><ul><li>VALIDITY*: Predictive validity, Construct validity, Content validity </li></ul>
    6. 6. Assessment Methods Adapted from Roger Neighbour 2006
    7. 7. Factors Playing a Role in Recruitment of Examiners <ul><li>Qualities </li></ul><ul><li>Credibility </li></ul><ul><li>Can ‘rank order’ </li></ul><ul><li>Trainable </li></ul><ul><li>Impartial </li></ul><ul><li>Team players </li></ul><ul><li>Incentives </li></ul><ul><li>Status </li></ul><ul><li>Influence </li></ul><ul><li>Stimulation </li></ul><ul><li>‘ Make a difference’ </li></ul><ul><li>Financial </li></ul>
    8. 8. Selection of Standard-setting Panelists <ul><li>Experts in related field of examination </li></ul><ul><li>Familiar with examination methods </li></ul><ul><li>Good problem solvers </li></ul><ul><li>Familiar with level of candidates </li></ul><ul><li>Interested in education (teachers) </li></ul>
    9. 9. Establishing Standards – Policy Decision Deciding who should pass or fail should be a matter of policy decision rather than a statistical exercise
    10. 10. Why Is Standard Setting Needed? <ul><li>To provide an educational tool for deciding the cut-off point on the scoring scale that separates the non-competent from the competent </li></ul><ul><li>To determine standards of performance that separate the competent from the non-competent candidate </li></ul>
    11. 11. Pertinent Questions Regarding Standard Setting <ul><li>What is the main purpose of assessment? </li></ul><ul><li>What is at stake? </li></ul><ul><ul><li>For students </li></ul></ul><ul><ul><li>For patients </li></ul></ul><ul><ul><li>For organization </li></ul></ul><ul><li>Who has an interest in the outcome? </li></ul><ul><li>What message do we wish to convey? </li></ul><ul><li>What may be the effect of high / low pass rate? </li></ul>
    12. 12. Pertinent Questions Regarding Standard Setting <ul><li>What are the rules of combination in a multi-component examination? </li></ul><ul><li>Who should set the standards? </li></ul><ul><ul><li>Examiners? </li></ul></ul><ul><ul><li>Clinical practitioners? </li></ul></ul><ul><ul><li>Patients? </li></ul></ul><ul><li>Should the standards be absolute or relative? </li></ul><ul><li>What happens to those who fail under the current standards? </li></ul><ul><li>Are there any appeals procedure? </li></ul>
    13. 13. Qualities of Good Standards for Assessments <ul><li>Transparent marking and standard-setting process </li></ul><ul><li>High reliability indices (Cronbach’s α >0.8; Cohen’s κ >+0.4)* </li></ul><ul><li>Corrections for test variance and Error of Measurement </li></ul><ul><li>Low examiner variability (recruitment, training, feedback) </li></ul><ul><li>Fair appeals procedure </li></ul>
    14. 14. Educational Benefits of Standard Setting <ul><li>Faculty development </li></ul><ul><li>Quality control of test materials </li></ul>
    15. 15. Standards – Classification <ul><li>Norm-referenced standards vs. Criterion-referenced standards </li></ul><ul><li>Compensatory Standards vs. Conjunctive Standards </li></ul>
    16. 16. Orthogonal Bipolar Standards
    17. 17. Norm-referenced Standards <ul><li>Standard is based on performance of an external large representative sample (‘ Norm group ’) equivalent to candidates taking the test </li></ul><ul><li>May result in reasonable standards provided the group is representative of candidates’ population, heterogeneous and large </li></ul>
    18. 18. Criterion-referenced Standards <ul><li>Links the standard to a set criterion of the competence level under consideration </li></ul><ul><li>Can be: </li></ul><ul><ul><li>Relative criterion standard </li></ul></ul><ul><ul><li>Absolute criterion standard </li></ul></ul>
    19. 19. Relative Criterion Standard <ul><li>A relative standard can be set at the mean performance of candidates, </li></ul><ul><li>or at a defined number of SD units below the mean </li></ul><ul><li>These standards may vary from year to year due to shifts in the ability of the group </li></ul><ul><li>May result in a fixed annual percentage of failing students </li></ul>
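The "mean minus SD units" rule above is a one-line computation. A minimal sketch, assuming a hypothetical cohort of percentage scores and a panel-chosen `sd_units` value:

```python
import statistics

def relative_cut_score(scores, sd_units=1.0):
    """Relative criterion standard: cohort mean minus a chosen number of SDs.
    The sd_units value is a policy decision by the panel, not a statistic."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return mean - sd_units * sd

cohort = [72, 65, 80, 58, 91, 70, 67, 75]   # hypothetical percentage scores
cut = relative_cut_score(cohort, sd_units=1.0)
```

Because the cut is anchored to the cohort, it drifts with cohort ability, which is exactly the year-to-year variability the slide warns about.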
    20. 20. Absolute Criterion Standard <ul><li>Absolute criterion standard stays same over multiple administrations of the test, relative to the content specifications of the test </li></ul><ul><li>Failure rate may vary due to changes in the group’s ability, from one test administration to the other </li></ul>
    21. 21. Compensatory Standards <ul><li>The standard is set on the total test score </li></ul><ul><li>Candidates can compensate for poor performance in some parts of the exam with good performance in others </li></ul>
    22. 22. Conjunctive Standards <ul><li>Standards are set for individual components of the examination </li></ul><ul><li>Candidates cannot compensate for poor performance in one part </li></ul><ul><ul><li>Each skill component is considered separately </li></ul></ul><ul><ul><li>Allows diagnostic feedback to candidates </li></ul></ul><ul><ul><li>The higher the correlation among test components, the greater the inclination towards a compensatory standard </li></ul></ul>
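The contrast between the two slides above can be shown in a few lines. This sketch uses a hypothetical candidate with three station scores: strong overall but below the bar on one station.

```python
def compensatory_pass(component_scores, total_cut):
    """Compensatory standard: one pass/fail decision on the total score."""
    return sum(component_scores) >= total_cut

def conjunctive_pass(component_scores, component_cuts):
    """Conjunctive standard: every component must clear its own cut score."""
    return all(s >= c for s, c in zip(component_scores, component_cuts))

# Hypothetical candidate: strong on two stations, weak on one
scores = [85, 48, 70]
passes_compensatory = compensatory_pass(scores, total_cut=180)   # 203 >= 180
passes_conjunctive = conjunctive_pass(scores, [60, 60, 60])      # 48 < 60
```

The same candidate passes under the compensatory rule and fails under the conjunctive one, which is why the choice between them is a policy decision.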
    23. 23. Standard-setting Models <ul><li>Test-centered models: Judges review test items and provide judgments as to ‘ just adequate ’ level of performance on these items </li></ul><ul><li>Examinee-centered models: Judges identify (and sort) an actual (not hypothetical) group of examinees </li></ul>
    24. 24. Test-centered Models <ul><li>Angoff model </li></ul><ul><li>Ebel’s approach </li></ul><ul><li>Nedelsky approach </li></ul><ul><li>Jaeger’s method </li></ul>
    25. 25. Angoff Model <ul><li>A judgemental approach </li></ul><ul><li>Group of expert judges make judgements about how borderline candidates would perform on items in the examination </li></ul><ul><li>Details described later… </li></ul>
    26. 26. Ebel’s Approach <ul><li>Judges categorise items in a test according to levels of difficulty and relevance to the decision to be made </li></ul><ul><li>Then they decide on proportion of items in each category that a hypothetical group of examinees could respond to correctly </li></ul>
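Ebel's procedure reduces to a weighted sum: for each difficulty-by-relevance category, multiply the item count by the proportion the judges expect a borderline group to answer correctly. The categories and proportions below are hypothetical judge decisions.

```python
def ebel_cut_score(categories):
    """Ebel cut score: sum over categories of
    (items in category) x (expected proportion correct for borderline group)."""
    return sum(n_items * proportion for n_items, proportion in categories)

# Hypothetical panel judgements for a 60-item test
judged = [
    (20, 0.90),   # easy / essential items
    (30, 0.65),   # medium / important items
    (10, 0.40),   # hard / acceptable items
]
cut = ebel_cut_score(judged)   # expected raw score out of 60
```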
    27. 27. Nedelsky Approach <ul><li>Originally designed for multiple choice items </li></ul><ul><li>For each item , judges decide on how many of the distractors (response options) a minimally competent examinee would recognise as being incorrect </li></ul>
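Under the Nedelsky rule, the minimally competent examinee eliminates the distractors they recognise as wrong and guesses among the remaining options, so each item contributes 1 / (options remaining) to the cut score. A sketch with hypothetical five-option items:

```python
def nedelsky_cut_score(items):
    """Nedelsky cut score for MCQ items.
    items: list of (n_options, n_distractors_eliminated) per item;
    each item contributes 1 / (options left after elimination)."""
    return sum(1 / (n_options - eliminated) for n_options, eliminated in items)

# Hypothetical judgements for four 5-option items
judged = [(5, 3), (5, 2), (5, 4), (5, 1)]
cut = nedelsky_cut_score(judged)   # expected raw score out of 4 items
```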
    28. 28. Jaeger’s method <ul><li>Emphasises the need to sample all populations that have a legitimate interest in the outcomes of competency testing </li></ul><ul><li>Focuses on passing examinees rather than on borderline or minimally competent examinees </li></ul>
    29. 29. Examinee-centered Models <ul><li>Borderline-group method </li></ul><ul><li>Contrasts-by-group approach </li></ul><ul><li>Hofstee method </li></ul>
    30. 30. Borderline-group method <ul><li>Judges identify an actual (not hypothetical) borderline group </li></ul><ul><li>The median score for this group is used as the passing score </li></ul>
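The borderline-group rule on this slide is a single computation once the judges have flagged the group. The scores below are hypothetical judge-identified borderline candidates.

```python
import statistics

# Judges identify an actual borderline group; the median of that
# group's test scores becomes the passing score.
borderline_scores = [55, 61, 58, 63, 57, 60, 59]   # hypothetical
passing_score = statistics.median(borderline_scores)
```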
    31. 31. Contrasts-by-Group approach <ul><li>Panellists sort examinees into 2 groups: competent and not-competent </li></ul><ul><ul><li>This judgement is based on prior characteristics of examinees rather than the current test scores </li></ul></ul><ul><ul><li>Test scores are not known to panellist during sorting process </li></ul></ul><ul><li>After sorting is completed, score distributions for competent / not-competent groups are plotted </li></ul><ul><li>Point of intersection of the two distributions is considered as the passing score </li></ul>
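The intersection point described on this slide can be approximated without plotting: scan candidate cut scores and take the one that misclassifies the fewest examinees (a common practical stand-in for where the two score distributions cross). The score lists are hypothetical panel sortings.

```python
def contrasting_groups_cut(competent, not_competent):
    """Approximate the intersection of the two score distributions by
    choosing the cut that minimises total misclassification."""
    candidates = sorted(set(competent) | set(not_competent))
    def misclassified(cut):
        failed_competent = sum(s < cut for s in competent)
        passed_not_competent = sum(s >= cut for s in not_competent)
        return failed_competent + passed_not_competent
    return min(candidates, key=misclassified)

# Hypothetical scores after panellists sorted examinees (blind to scores)
competent = [70, 75, 68, 82, 77, 66]
not_competent = [50, 58, 62, 55, 64]
cut = contrasting_groups_cut(competent, not_competent)
```

Here the distributions do not overlap, so a cut exists that classifies every examinee consistently with the panel; with real data some misclassification is usually unavoidable.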
    32. 32. Hofstee Method <ul><li>A standard setting approach that incorporates advantages of both relative and absolute standard setting procedures </li></ul><ul><li>Details described later… </li></ul>
    33. 33. Two Common Standard-Setting Procedures <ul><li>Modified Angoff procedure </li></ul><ul><ul><li>A Test-centered model </li></ul></ul><ul><ul><li>Judgmental approach </li></ul></ul><ul><ul><li>Suitable for MCQ examinations </li></ul></ul><ul><li>The Hofstee method </li></ul><ul><ul><li>An Examinee-centered model </li></ul></ul><ul><ul><li>Compromise relative/absolute method </li></ul></ul><ul><ul><li>Suitable for overall pass/fail decisions </li></ul></ul><ul><ul><li>Approved by USMLE </li></ul></ul>
    34. 34. Modified Angoff Procedure <ul><li>Judges discuss characteristics of a borderline candidate ‘only just good enough to pass’ </li></ul><ul><li>They make judgements about the borderline candidate’s likelihood of responding correctly to each test item </li></ul><ul><li>For each test item, judges estimate the % of borderline candidates likely to answer the item correctly </li></ul><ul><li>The pass/fail standard is the average of these percentages across all items </li></ul>
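The averaging step above can be sketched directly. The judge ratings below are hypothetical per-item probability estimates for a borderline candidate.

```python
def angoff_cut_score(ratings):
    """Modified Angoff: average the per-item borderline-candidate estimates
    across items and judges. ratings: rows = judges, columns = items.
    Returns the cut as a proportion of the maximum test score."""
    per_judge_means = [sum(row) / len(row) for row in ratings]
    return sum(per_judge_means) / len(per_judge_means)

ratings = [
    [0.6, 0.7, 0.5, 0.8],   # judge 1
    [0.5, 0.8, 0.6, 0.7],   # judge 2
    [0.7, 0.6, 0.5, 0.8],   # judge 3
]
cut = angoff_cut_score(ratings)   # proportion correct required to pass
```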
    35. 35. The Hofstee Method <ul><li>This takes advantage of both relative and absolute standard-setting procedures and arrives at a compromise between the two </li></ul><ul><li>A reference group of judges agrees on the following: </li></ul><ul><ul><li>Lowest acceptable fail rate (A) </li></ul></ul><ul><ul><li>Highest acceptable fail rate (B) </li></ul></ul><ul><ul><li>Lowest permissible passing grade (C) </li></ul></ul><ul><ul><li>Highest permissible passing grade (D) </li></ul></ul>
    36. 36. The Hofstee Method Adapted from Roger Neighbour 2006
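The graphical compromise in the figure above can also be computed: the cut score is taken where the observed cumulative fail-rate curve crosses the diagonal of the rectangle defined by the panel's four agreed values. This sketch scans integer cut scores; all scores and panel parameters are hypothetical.

```python
def hofstee_cut(scores, min_fail, max_fail, min_cut, max_cut):
    """Hofstee compromise: find the cut score where the observed fail rate
    meets the diagonal from (min_cut, max_fail) to (max_cut, min_fail)."""
    n = len(scores)
    def fail_rate(cut):                       # observed fraction failing at this cut
        return sum(s < cut for s in scores) / n
    def diagonal(cut):                        # panel's acceptable-fail-rate line
        t = (cut - min_cut) / (max_cut - min_cut)
        return max_fail - t * (max_fail - min_fail)
    candidates = range(min_cut, max_cut + 1)
    return min(candidates, key=lambda c: abs(fail_rate(c) - diagonal(c)))

scores = [48, 52, 55, 58, 60, 63, 65, 68, 72, 80]        # hypothetical cohort
cut = hofstee_cut(scores, min_fail=0.05, max_fail=0.30,  # A, B
                  min_cut=50, max_cut=65)                # C, D
```

By construction the result always lies inside the panel's rectangle, which is what makes the method a compromise between relative and absolute standards.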
    37. 42. Evaluation of Standards <ul><li>The standard-setting process should be evaluated </li></ul><ul><li>Evaluation includes data on the 1st and 2nd ratings of panellists for each test item rated </li></ul><ul><li>This should demonstrate increased consensus among raters (Cohen’s κ inter-rater reliability) </li></ul><ul><li>A questionnaire should be administered to panellists at the end of the standard-setting process </li></ul>
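The rater-consensus check above uses Cohen's κ, which corrects raw agreement for chance. A minimal sketch comparing two rating rounds on the same items; the category labels and ratings are hypothetical.

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    n = len(rater1)
    labels = set(rater1) | set(rater2)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    expected = sum((rater1.count(l) / n) * (rater2.count(l) / n)
                   for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical 1st- and 2nd-round item classifications by a panellist pair
round1 = ["easy", "hard", "easy", "easy", "hard", "easy"]
round2 = ["easy", "hard", "easy", "hard", "hard", "easy"]
kappa = cohens_kappa(round1, round2)
```

A κ that rises between rounds would be the "increased consensus among raters" the slide asks the evaluation to demonstrate.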
    38. 43. Future Perspectives <ul><li>Much work is still needed to establish effective standard setting procedures </li></ul><ul><li>Length of procedures should be considered </li></ul><ul><li>Ways to shorten the process are needed </li></ul><ul><li>Fully compensatory models should be considered, in which test items are averaged to produce a test standard </li></ul>
    39. 44. Future Perspectives <ul><li>Obtained standards should be checked against other information available on the test-taker to ensure construct validity </li></ul><ul><li>Effective methods of training panellists to recognise borderline characteristics are essential if Angoff approach is widely used </li></ul>
    40. 45. Conclusion <ul><li>The more standard-setting procedures are applied to a variety of tests, </li></ul><ul><li>the more the practice of high-quality testing will be enhanced, and </li></ul><ul><li>the higher will be the confidence in the testing of professional competencies </li></ul>
    41. 46. References <ul><li>Neighbour, Roger. Summative assessment and standard setting. 2006. </li></ul><ul><li>Friedman Ben-David, M. Standard setting in student assessment (an extended summary of AMEE Medical Education Guide No. 18). Medical Teacher 2000; 22(2): 120–130. </li></ul>