Standard Setting and Medical Students’ Assessment
Dr. Sanjoy Sanyal, Associate Professor – Neurosciences
Medical University of the Americas, Nevis, St. Kitts-Nevis, WI
[email_address]
List of topics: Summative assessment; Standard setting; Classification of standards; Standard-setting models; Test-centered models; Examinee-centered models; Modified Angoff approach; The Hofstee method; Evaluation of standards; Future perspectives; Conclusion; References
What is Summative Assessment? In the context of a Caribbean medical school training students to be future doctors, summative assessment can be interpreted as any of the following: End-point / end-semester assessment Certification examination Licensing examination
Reasons for Post-training Summative Assessment Trainee motivation: Assessment drives learning Recognition of achievement Rite of passage: Initiation to the profession Reputation of the discipline Patient safety Quality marker for patients
Characteristics of Good Summative Assessment
FEASIBILITY: Proportionate, Cost-effective, Practicable
RELIABILITY**: Fair, Consistent, Accurate
VALIDITY*: Predictive validity, Construct validity, Content validity
(*Construct validity = convergent and discriminant; criterion validity = concurrent and predictive. **Reliability = inter-rater reliability, internal consistency reliability, test–retest reliability.)
Assessment Methods (figure adapted from Roger Neighbour 2006; common formats include the OSCE – Objective Structured Clinical Examination – with stations such as history taking, physical examination, consultation and emergency-room management)
Factors Playing a Role in Recruitment of Examiners – Qualities: Credibility, Can ‘rank order’, Trainable, Impartial, Team players. Incentives: Status, Influence, Stimulation, ‘Make a difference’, Financial
Selection of Standard-setting Panelists Experts in related field of examination Familiar with examination methods Good problem solvers Familiar with level of candidates Interested in education (teachers)
Establishing Standards – Policy Decision Deciding who should pass or fail should be a matter of policy decision rather than a statistical exercise
Why Need Standard Setting? To provide an educational tool for deciding the cut-off point on the scoring scale that separates the non-competent from the competent, and to determine the standards of performance that separate the competent from the non-competent candidate
Pertinent Questions Regarding Standard Setting What is the main purpose of assessment? What is at stake? For students For patients For organization Who has an interest in the outcome? What message do we wish to convey? What may be the effect of high / low pass rate?
Pertinent Questions Regarding Standard Setting What are the rules of combination in a multi-component examination? Who should set the standards? Examiners? Clinical practitioners? Patients? Should the standards be absolute or relative? What happens to those who fail under the current standards? Is there an appeals procedure?
Qualities of Good Standards for Assessments Transparent marking and standard-setting process High reliability indices (Cronbach’s α > 0.8; Cohen’s κ > +0.4)* Corrections for test variance and error of measurement Low examiner variability (recruitment, training, feedback) Fair appeals procedure (*Cronbach’s α = internal consistency reliability; Cohen’s κ = inter-rater reliability)
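As an illustration of the first reliability index cited above, a minimal Python sketch of Cronbach’s α computed from a candidates-by-items score matrix follows; the score data are invented for demonstration only.

```python
# Illustrative sketch: Cronbach's alpha (internal consistency) from a
# candidates-by-items score matrix. Data are hypothetical.
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of candidate rows, each a list of per-item scores."""
    n_items = len(scores[0])
    item_columns = list(zip(*scores))                   # one column per item
    item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])  # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_vars / total_var)

# Hypothetical data: 5 candidates x 4 items, scored 0/1
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")  # slide's benchmark is > 0.8
```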
Educational Benefits of Standard Setting Faculty development Quality control of test materials
Standards – Classification Norm-referenced standards vs. Criterion-referenced standards Compensatory Standards vs. Conjunctive Standards
Orthogonal Bipolar Standards
Norm-referenced Standards Standard is based on performance of an external large representative sample (‘ Norm group ’) equivalent to candidates taking the test May result in reasonable standards provided the group is  representative  of candidates’ population,  heterogeneous  and  large
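One way such a norm-referenced cut score might be derived is sketched below; the norm-group scores and the 15th-percentile choice are assumptions for illustration, not values from the slides.

```python
# Illustrative sketch: a norm-referenced cut score taken from an external
# norm group's score distribution. The 15th-percentile choice is arbitrary.
import statistics

norm_group_scores = [48, 52, 55, 58, 60, 62, 63, 65, 68, 70, 72, 75, 78, 80, 85]  # hypothetical

# quantiles(..., n=100) returns the 1st..99th percentile cut points
percentiles = statistics.quantiles(norm_group_scores, n=100)
cut_score = percentiles[14]          # 15th percentile of the norm group
print(f"Norm-referenced cut score: {cut_score:.1f}")
```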
Criterion-referenced Standards Links the standard to a set  criterion  of the competence level under consideration Can be: Relative  criterion standard Absolute  criterion standard
Relative Criterion Standard A relative standard can be set at the mean performance of candidates, or at a defined number of standard deviations (SD) from the mean These standards may vary from year to year due to shifts in the ability of the group May result in a fixed annual percentage of failing students
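A minimal sketch of the two relative options described above (cut at the cohort mean, or at a chosen number of SDs below it) follows; the cohort scores and the “1 SD below the mean” choice are hypothetical.

```python
# Illustrative sketch: a relative criterion standard set from the current
# cohort's own scores, at the mean or at 1 SD below it (both assumptions).
from statistics import mean, stdev

cohort_scores = [45, 52, 58, 60, 63, 66, 70, 72, 75, 81]   # hypothetical cohort

cut_at_mean = mean(cohort_scores)
cut_one_sd_below = mean(cohort_scores) - 1 * stdev(cohort_scores)

print(f"Cut at mean: {cut_at_mean:.1f}")
print(f"Cut at 1 SD below mean: {cut_one_sd_below:.1f}")
# Because both depend on the cohort, they shift from year to year.
```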
Absolute Criterion Standard An absolute criterion standard stays the same over multiple administrations of the test, relative to the content specifications of the test The failure rate may vary from one test administration to the next due to changes in the group’s ability
Compensatory Standards The standard is set on the total test score; candidates can compensate for poor performance in some parts of the exam with good performance in others
Conjunctive Standards Standards are set for individual components of the examination; candidates cannot compensate for poor performance in one part Each skill component is considered separately, which allows diagnostic feedback to candidates The higher the correlation among test components, the greater the inclination towards a compensatory standard
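The sketch below contrasts the two decision rules for a single candidate in a hypothetical three-component examination; the component names and cut scores are assumptions for illustration.

```python
# Illustrative sketch: compensatory vs. conjunctive pass/fail decisions
# for one candidate. Component names and cut scores are hypothetical.
component_scores = {"written": 72, "OSCE": 58, "viva": 80}

# Compensatory: one cut score applied to the average (or total) score
compensatory_cut = 65
passes_compensatory = sum(component_scores.values()) / len(component_scores) >= compensatory_cut

# Conjunctive: a separate cut score for every component; all must be met
conjunctive_cuts = {"written": 60, "OSCE": 60, "viva": 60}
passes_conjunctive = all(score >= conjunctive_cuts[part]
                         for part, score in component_scores.items())

print(f"Compensatory decision: {'pass' if passes_compensatory else 'fail'}")  # mean 70 -> pass
print(f"Conjunctive decision:  {'pass' if passes_conjunctive else 'fail'}")   # OSCE 58 < 60 -> fail
```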
Standard-setting Models Test-centered models:   Judges review test items and provide judgments as to ‘ just adequate ’ level of performance on these items Examinee-centered models:   Judges identify (and sort) an actual (not hypothetical) group of examinees
Test-centered Models Angoff model Ebel’s approach Nedelsky approach Jaeger’s method
Angoff Model A judgemental approach Group of expert judges make judgements about how borderline candidates would perform on  items  in the examination Details described later…
Ebel’s Approach Judges categorise  items  in a test according to levels of difficulty and relevance to the decision to be made Then they decide on proportion of items in each category that a hypothetical group of examinees could respond to correctly
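A minimal sketch of how such Ebel judgements are commonly combined into a cut score follows: the judged proportion correct for each difficulty/relevance category is weighted by the number of items in that category. All category counts and proportions below are hypothetical.

```python
# Illustrative sketch: combining Ebel category judgements into a cut score.
categories = [
    # (difficulty, relevance, items in category, judged proportion correct)
    ("easy",   "essential",  20, 0.90),
    ("medium", "essential",  25, 0.70),
    ("hard",   "essential",  15, 0.50),
    ("medium", "important",  25, 0.60),
    ("hard",   "acceptable", 15, 0.40),
]

total_items = sum(n for _, _, n, _ in categories)
expected_correct = sum(n * p for _, _, n, p in categories)

cut_score_pct = 100 * expected_correct / total_items
print(f"Ebel cut score: {expected_correct:.0f}/{total_items} items ({cut_score_pct:.0f}%)")
```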
Nedelsky Approach Originally designed for  multiple choice  items  For each  item , judges decide on how many of the distractors (response options) a minimally competent examinee would recognise as being incorrect
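The sketch below shows one common way Nedelsky judgements are turned into a cut score, assuming the minimally competent examinee guesses at random among the options not eliminated; the option counts and judgements are hypothetical.

```python
# Illustrative sketch: Nedelsky cut score. Each item's expected score for a
# minimally competent examinee is 1 / (options not recognised as wrong).
items = [
    # (total options, distractors a minimally competent examinee would eliminate)
    (5, 3),   # guesses among 2 remaining options -> expected score 0.50
    (5, 1),   # guesses among 4 -> 0.25
    (4, 2),   # guesses among 2 -> 0.50
    (4, 3),   # only the key remains -> 1.00
    (5, 2),   # guesses among 3 -> 0.33
]

expected_scores = [1 / (options - eliminated) for options, eliminated in items]
cut_score = sum(expected_scores)
print(f"Nedelsky cut score: {cut_score:.2f} out of {len(items)} items")
```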
Jaeger’s method Emphasises the need to sample all populations that have a legitimate interest in outcomes of competency testing Focuses on passing examinees rather than on borderline or minimally competent
Examinee-centered Models Borderline-group method Contrasts-by-group approach Hofstee method
Borderline-group method Judges identify an actual (not hypothetical) borderline group The median score for this group is used as the passing score
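A minimal sketch, assuming the judges’ borderline group and its test scores are already available (the scores are hypothetical):

```python
# Illustrative sketch: borderline-group method. The pass mark is the median
# test score of the candidates identified by the judges as borderline.
from statistics import median

borderline_group_scores = [54, 57, 58, 60, 61, 63, 66]   # hypothetical
passing_score = median(borderline_group_scores)
print(f"Borderline-group passing score: {passing_score}")   # 60
```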
Contrasts-by-Group Approach Panellists sort examinees into two groups, competent and not-competent This judgement is based on prior characteristics of the examinees rather than on their current test scores; test scores are not known to the panellists during the sorting process After sorting is completed, the score distributions of the competent and not-competent groups are plotted; the point of intersection of the two distributions is taken as the passing score
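A simplified sketch follows; it assumes, for convenience, that normal curves are fitted to the two groups’ scores and the passing score is read off where the curves cross. All scores are hypothetical.

```python
# Illustrative sketch: contrasts-by-group passing score taken where fitted
# normal curves for the two groups intersect. Scores are hypothetical.
from statistics import NormalDist, mean, stdev

not_competent = [40, 44, 48, 50, 52, 55, 58, 60]
competent     = [58, 62, 65, 67, 70, 73, 76, 80]

d_nc = NormalDist(mean(not_competent), stdev(not_competent))
d_c  = NormalDist(mean(competent), stdev(competent))

# Scan between the two group means and keep the score where the densities
# are closest -- a numeric stand-in for the plotted point of intersection.
grid = [d_nc.mean + i * (d_c.mean - d_nc.mean) / 1000 for i in range(1001)]
passing_score = min(grid, key=lambda x: abs(d_nc.pdf(x) - d_c.pdf(x)))
print(f"Contrasts-by-group passing score: {passing_score:.1f}")
```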
Hofstee Method A standard setting approach that incorporates advantages of both  relative  and  absolute  standard setting procedures Details described later…
Two Common Standard-Setting Procedures Modified Angoff procedure A Test-centered model Judgmental approach Suitable for MCQ examinations The Hofstee method An Examinee-centered model Compromise relative/absolute method Suitable for overall pass/fail decisions Approved by USMLE
Modified Angoff Procedure Judges discuss the characteristics of a borderline candidate, one ‘only just good enough to pass’ They make judgements about a borderline candidate’s likelihood of responding correctly to each test item: for each item, judges estimate the percentage of borderline candidates likely to answer it correctly The pass/fail standard is the average of these percentages across all items
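A minimal sketch of that arithmetic, with hypothetical judges’ estimates:

```python
# Illustrative sketch: modified Angoff standard = average over items (and
# judges) of the estimated % of borderline candidates answering correctly.
angoff_estimates = {
    # item: one estimate (%) per judge -- all values hypothetical
    "item_1": [60, 55, 65],
    "item_2": [80, 75, 85],
    "item_3": [40, 50, 45],
    "item_4": [70, 70, 75],
}

item_means = {item: sum(est) / len(est) for item, est in angoff_estimates.items()}
pass_standard = sum(item_means.values()) / len(item_means)

for item, m in item_means.items():
    print(f"{item}: borderline candidates expected correct = {m:.0f}%")
print(f"Angoff pass/fail standard: {pass_standard:.1f}%")
```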
The Hofstee Method This takes advantage of both relative and absolute standard-setting procedures and arrives at a compromise between the two A reference group of judges agrees on the following: Lowest acceptable fail rate (A) Highest acceptable fail rate (B) Lowest permissible passing grade (C) The required passing score (D)
The Hofstee Method (figure adapted from Roger Neighbour 2006)
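A worked sketch of a Hofstee calculation is given below. It assumes the common formulation in which the four judged quantities are the minimum/maximum acceptable fail rates and the minimum/maximum acceptable passing scores, with the cut score read off where the diagonal of that rectangle crosses the cohort’s cumulative fail-rate curve; all numbers are hypothetical.

```python
# Illustrative sketch of a Hofstee computation under the assumptions above.
scores = [38, 45, 50, 52, 55, 58, 60, 62, 64, 66, 68, 70, 72, 75, 80, 85, 90]

min_fail, max_fail = 5.0, 30.0      # acceptable failure rates (%)
min_cut,  max_cut  = 50.0, 70.0     # acceptable passing scores

def fail_rate(cut):
    """Percentage of the cohort scoring below a given cut score."""
    return 100 * sum(s < cut for s in scores) / len(scores)

def diagonal(cut):
    """Fail rate allowed by the judges' rectangle at this cut score."""
    t = (cut - min_cut) / (max_cut - min_cut)
    return max_fail + t * (min_fail - max_fail)

# Scan cut scores and keep the one where the two curves are closest.
grid = [min_cut + i * (max_cut - min_cut) / 1000 for i in range(1001)]
hofstee_cut = min(grid, key=lambda c: abs(fail_rate(c) - diagonal(c)))
print(f"Hofstee passing score: {hofstee_cut:.1f} "
      f"(fail rate {fail_rate(hofstee_cut):.0f}%)")
```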
Evaluation of Standards The standard-setting process should itself be evaluated Evaluation includes data on the 1st and 2nd ratings of panellists for each test item rated; these should demonstrate increased consensus among raters (Cohen’s κ inter-rater reliability) A questionnaire should be administered to panellists at the end of the standard-setting process
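As an illustration of that consensus check, the sketch below computes Cohen’s κ between two panellists’ item judgements in each rating round, assuming for simplicity that the judgements are reduced to categories; the ratings are invented, and a higher round-2 κ indicates increased consensus.

```python
# Illustrative sketch: Cohen's kappa between two panellists, per rating round.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa between two raters' categorical judgements."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical item judgements ("keep"/"revise") by two panellists,
# before and after the discussion between rating rounds.
round1_a = ["keep", "revise", "keep", "keep", "revise", "keep", "revise", "keep"]
round1_b = ["revise", "revise", "keep", "revise", "keep", "keep", "revise", "revise"]
round2_a = ["keep", "revise", "keep", "keep", "revise", "keep", "revise", "keep"]
round2_b = ["keep", "revise", "keep", "revise", "revise", "keep", "revise", "keep"]

print(f"Round 1 kappa: {cohens_kappa(round1_a, round1_b):.2f}")
print(f"Round 2 kappa: {cohens_kappa(round2_a, round2_b):.2f}")  # higher -> more consensus
```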
Future Perspectives Much work is still needed to establish effective standard-setting procedures The length of the procedures should be considered, and ways to shorten the process are needed Fully compensatory models should be considered, in which test items are averaged to produce a test standard
Future Perspectives Obtained standards should be checked against other information available on the test-taker to ensure construct validity Effective methods of training panellists to recognise borderline characteristics are essential if the Angoff approach is to be widely used
Conclusion The more standard-setting procedures are applied to a variety of tests, the more the practice of high-quality testing will be enhanced, and the higher the confidence in the testing of professional competencies will be
References
Neighbour, Roger. Summative assessment and standard setting. www.jafm.org/edu/20060128/sem4_060129.pdf
Friedman Ben-David. Standard setting in student assessment – an extended summary of AMEE Medical Education Guide No. 18. Medical Teacher 2000; 22(2): 120–130. www.medev.ac.uk/resources/features/AMEE_summaries/Guide18summaryMar04.pdf
