De champlain agm_sunday_2012
  • This procedural definition refers to the importance of clearly articulating a framework and its implementation in a way that an appeal on the part of candidates is possible.
  • This is platonic epistemology
  • Transcript

    • 1. The Top 10 Myths on Standard Setting
        • André De Champlain, PhD
        • Consulting Chief Research Psychometrician & Interim Director of R&D, MCC
    • 2. The Need to Make Decisions
        • The need to make classifications permeates many aspects of daily life
            • Classifications required by law
                • E.g.: Passing an examination to obtain a driver’s license requires meeting a certain level of proficiency with regard to knowledge of traffic laws and performance (passing, parallel parking, etc.)
                • Keeps unsafe motorists from behind the wheel!
    • 3. The Need to Make Decisions
        • The need to make classifications permeates many aspects of daily life
            • Classifications required by law
                • E.g.: A jury rendering an (impartial) verdict in a criminal trial classifies a defendant as “guilty” or “not guilty” after weighing the evidence of a case, i.e., analyzing the facts
                • Sentence meted out for incapacitation (“protection of the public”), deterrence, denunciation, rehabilitation, etc.
    • 4. The Need to Make Decisions
        • The need to make classifications permeates many aspects of professional life
            • Classifications required within a profession
                • Medical licensing/registration examination programs
                    • LMCC®, USMLE®, PLAB®, AMC®, etc.
                • Medical specialty board examination programs
                    • ABMS®
                • Medical membership organizations
                    • RCP(UK), RCPSC, CFPC, RACP, RACGP, etc.
    • 5. The Need to Make Decisions: Jury vs. Standard Setting Panel
        • Composition
            • Jury: Impartial panel that represents the population
            • Standard setting panel: Impartial panel that represents a profession
        • Size
            • Jury: Randomly selected group of citizens (to satisfy social decision rules)
            • Standard setting panel: Randomly selected panel that is sufficiently large and representative to define standard
        • Task
            • Jury: Renders a verdict
            • Standard setting panel: “Renders” a decision
        • Purpose
            • Jury: Incapacitation, rehabilitation
            • Standard setting panel: Protection of the public, remediation
    • 6. The Need to Make Decisions
        • A number (“cut-score”) can be used to differentiate between several “states or degrees of performance”
            • Pass / Fail
            • Grant / Withhold a credential
            • Award / Deny a license
            • Grant / Deny membership
            • Basic / Proficient / Advanced (Honors)
    • 7. The Need to Make Decisions
        • To make “sensible” decisions, information is needed
            • Decision makers need relevant and accurate information
            • Standard setting is the process by which these informed decisions are arrived at
        • Standard setting can be defined as the proper following of a prescribed, rational system of rules or procedures resulting in the assignment of a number to differentiate between two or more states or degrees of performance (Cizek, 1993)
            • Addresses “procedural due process” (legal framework)
    • 8. The Need to Make Decisions
        • “Procedural due process”
            • Was the standard setting exercise well documented?
                • Description of the standard setting exercise
                • Selection of judges
                • Overview of training
                • Definition of the borderline candidate
                • Judges’ assessment of each phase of the exercise and overall cut-score
            • What did the judges think of the exercise?
    • 9. The Need to Make Decisions
        • Consequential impact of standard setting?
            • What are the outcomes of the process?
            • Substantive aspect of standard setting
                • Did the process lead to a “fair” decision?
            • Consequential aspect of test score use (Messick, 1989)
                • What are the intended and unintended consequences of implementing a standard?
            • Several sources of (empirical) evidence can be presented to support the fairness and appropriateness of the decision
    • 10. The Need to Make Decisions
        • How popular is standard setting?
            • MEDLINE/PUBMED database
                • Nearly 600 articles published in this topic area
            • ERIC (Educational Resources Information Center)
                • Over 750 articles published in this domain
        • Despite the immense popularity of standard setting, basic misconceptions still persist
        • What are common “myths” surrounding standard setting?
    • 11. Myth #1: Standard = Cut-score
        • Performance standard (Kane, 2001)
            • Qualitative description of an acceptable level of performance and knowledge required in practice
            • “Conceptual” definition of competence
            • Performance standard is a construct
        • Example: MCCQE Part I
            • The candidate who passes the Medical Council of Canada’s Qualifying Examination Part I (MCCQE Part I) has demonstrated knowledge, clinical skills, and attitudes necessary for entry into supervised clinical practice, as outlined by the Medical Council of Canada’s Objectives
    • 12. Myth #1: Standard = Cut-score
        • Passing score or cut-score (Kane, 2001)
            • Selected point on the score scale that corresponds to this performance standard
            • “Operational” definition of competence
            • Cut-score is a number
        • Example: MCCQE Part I
            • A candidate who scores at or above 390 has met the performance standard defined for the Medical Council of Canada’s Qualifying Examination Part I (MCCQE Part I)
    • 13. Myth #2: There is a “Gold Standard”
        • Standard setting entails eliciting judgments on what cut-score best represents “competence”
            • All cut-scores are intrinsically subjective in nature
            • Cut-scores can, and will, vary as a function of several factors including, but not limited to, the method selected to set the standard and the panel of participating judges
    • 14. Myth #2: There is a “Gold Standard”
        • Cut-scores do not exist externally
            • The aim of standard setting is not to “discover” some true or preexisting cut-score that separates candidates into mutually exclusive categories (e.g.: competent vs. incompetent)
        • Standard setting is a process that synthesizes human judgment in a rational and defensible way to facilitate the partitioning of a score scale into 2 or more categories
    • 15. Myth #2: There is a “Gold Standard”
        • Cut-scores do not exist externally
            • Standards do not externally exist, i.e., outside of the realm of human opinion
            • “a right answer [in standard setting] does not exist, except, perhaps, in the minds of those providing judgment” (Jaeger, 1989)
            • Empirical evidence can help standard setting panels translate (policy-based) judgment onto a score scale in a defensible manner
    • 16. Myth #3: Standard Setting is a Psychometric Exercise
        • Standard setting lies at the “intersection” of science and art
            • While we can facilitate the standard setting process using psychometric models, a cut-score is ultimately based on human judgment
            • Our statistical models can help us to systematize the process, i.e., to translate a policy decision into a cut-score using defensible, well-defined procedures; however, they cannot be used to estimate some “true” cut-score that separates masters from non-masters
    • 17. Myth #3: Standard Setting is a Psychometric Exercise
        • Standards for Educational and Psychological Testing (1999; p.54)
            • “Cut-scores embody value judgments as well as technical and empirical considerations”
            • Given that human judgment and opinion play such a significant role in this process, a cut-score can be regarded as a composite that incorporates considerations that originate from a number of arenas including medical, statistical, educational, social, political and economic
    • 18. Myth #4: The Cut-score is Set by a Standard Setting Panel
        • A standard setting panel does not “set a cut-score” but rather recommends a cut-score value or standard
        • The actual cut-score is set by the governing body that legitimizes the process and the use of the cut-score to make pass/fail decisions
            • E.g.: Legislative body, academy, certification specialty board, a college, etc.
    • 19. Myth #4: The Cut-score is Set by a Standard Setting Panel
        • The role of the standard setting panel is to provide guidance & information to those bodies that actually are responsible for implementing a given cut-score value or standard
        • The goal of periodic standard setting exercises is to revisit the appropriateness of a cut-score (not necessarily change it) based on replicated exercises and informed expert judgment
    • 20. Myth #5: Some Standard Setting Methods Are Better Than Others
        • Standards for Educational and Psychological Testing (1999; p.53)
            • “There can be no single method for determining cut-scores for all tests or for all purposes, nor can there be any single set of procedures for establishing their defensibility”.
        • Angoff (1988; p.219)
            • “[Regarding] the problem of setting cut-scores, we have observed that the several judgmental methods not only fail to yield results that agree with one another, they even fail to yield the same results on repeated application”.
    • 21. Myth #5: Some Standard Setting Methods Are Better Than Others
        • No standard setting method yields an “optimal” cut-score (standards don’t exist outside of the minds of judges)
        • The extent to which a standard setting process is properly followed has the most impact on the cut-score
            • Was the purpose of the exam and the standard setting exercise clearly defined?
            • Were the judges qualified to perform the task?
            • Was adequate training offered to panelists? Etc.
    • 22. Myth #5: Some Standard Setting Methods Are Better Than Others
        • Factors to consider when selecting a standard setting method
            • A. What is the purpose of the examination?
                • With professional exams, norm-referenced approaches are appropriate in instances where a limited number of candidates can meet the cut-score
                    • Placement, promotion, awards, etc.
                • In most instances, criterion-referenced approaches are more suitable
                    • Medical licensure/certification decisions, passing a clerkship/internship, etc.
    • 23. Myth #5: Some Standard Setting Methods Are Better Than Others
        • Factors to consider when selecting a standard setting method
            • B. How complex is the examination?
                • For knowledge-based exams (e.g.: dichotomously-scored MCQs), test-centered methods (Angoff, Ebel, Bookmark, etc.) are appropriate given the task candidates are required to complete
                • For performance assessments (OSCEs, workplace-based assessments, etc.), examinee-centered approaches (borderline groups, contrasting-groups, body of work methods) are better suited given the complex, multidimensional nature of the performance
    • 24. Myth #5: Some Standard Setting Methods Are Better Than Others
        • Factors to consider when selecting a standard setting method
            • C. What is the test format?
                • Certain standard setting methods were developed solely for use with MCQs (e.g.: Nedelsky)
                • While other methods can be used with different formats (e.g.: Angoff methods), certain assumptions are made that may or may not meet expectations (Angoff assumes a compensatory model)
                • Other methods (Hofstee, contrasting-groups) were developed as test format invariant
    • 25. Myth #5: Some Standard Setting Methods Are Better Than Others
        • Factors to consider when selecting a standard setting method
            • D. What resources are available?
                • In very high-stakes settings (e.g.: medical licensing exam), a complex standard setting exercise which includes several panels of judges, extensive training, multiple rounds of judgments, etc., might be preferable
                • In lower-stakes settings (e.g.: elective clerkship examination), less intensive models might be appropriate
                • What makes the most sense given the intended use of the information?
    • 26. Myth #5: Some Standard Setting Methods Are Better Than Others
        • Why not combine several standard setting procedures?
            • Standard setting and the selection of a cut-score are a policy decision
            • There’s little empirical evidence to suggest that combining multiple methods will lead to a “better” standard
            • There is no “correct” cut-score, so how can policy makers synthesize results from multiple approaches?
            • Also requires significantly more resources
    • 27. Myth #5: Some Standard Setting Methods Are Better Than Others
        • It is always better to systematically implement 1 standard setting method than to provide results from several (poorly) implemented approaches
            • Properly document all phases of standard setting
                • Objective, selection of participants, training, etc.
            • Provide empirical evidence to support use of the cut-score
                • Impact of sources of variability (judges, panels, etc.)
            • Consequences of implementing a cut-score
                • Surveys, etc.
    • 28. Myth #6: Expert Clinicians are de facto Expert Standard Setting Judges
        • Selection and training of judges are most critical to the success of any standard setting exercise
            • However, being a content expert is not synonymous with being an expert standard setting judge
        • Participating standard setting judges need to be carefully trained to ensure that they understand the task and to minimize biases
    • 29. Myth #6: Expert Clinicians are de facto Expert Standard Setting Judges
        • Standards for Educational and Psychological Testing (1999; p.54)
            • “Care must be taken to assure that judges understand what they are to do. The process must be such that well-qualified judges can apply their knowledge and experience to reach meaningful and relevant judgments that accurately reflect their understandings and intentions”.
    • 30. Myth #6: Expert Clinicians are de facto Expert Standard Setting Judges
        • Training usually includes the following steps:
            1. Provision of sample materials (test specifications, blueprint, sample items/stations, etc.)
            2. Clear presentation of the purpose of standard setting and what we are asking of participants
            3. Discussion and definition of what constitutes a borderline candidate
            4. Judgments on a set of exemplars
            5. Discussion and clarification of any misconceptions amongst participants
            6. Survey participants on all aspects of training
    • 31. Myth #7: We “Know” Who the Truly Competent Candidates Are
        • Classification errors are always present in standard setting
            • High-quality exams and well implemented standard setting exercises can significantly minimize the proportion of misclassifications
            • False positive misclassification
                • Candidate who “truly” lacks the knowledge, skill and/or ability necessary to pass the examination, but actually passes
            • False negative misclassification
                • Candidate who “truly” possesses the knowledge, skill and/or ability necessary to pass the examination, but actually fails
    • 32. Myth #7: We “Know” Who the Truly Competent Candidates Are
        • Why do classification errors occur?
            • Cut-scores represent inferences about the “real” or “true” level of knowledge and skill possessed by candidates
            • The quality of those inferences is related to a number of factors:
                • The number of items/cases sampled for the standard setting exercise
                • The number of judges selected and their degree of representativeness, etc.
            • Consequently, pass/fail classifications of candidates will always be somewhat imperfect
    • 33. Myth #7: We “Know” Who the Truly Competent Candidates Are
        • We can’t actually identify false positive and negative misclassifications
            • If we knew a candidate was a false negative, we’d do something about it!
        • We can estimate misclassification errors using a host of statistical indices (Brennan, 2004)
        • In medicine, protection of the public is a prime concern of examinations
            • Minimizing false positive misclassifications is generally of greater interest
    • 34. Myth #8: All Decisions Are Created Equally
        • For fairness reasons, failing candidates are generally allowed to retake an examination (sometimes repeatedly)
        • Millman (1989) showed that the greater the number of (repeat) attempts to pass an exam, the greater the likelihood that a candidate who does not possess the level of knowledge or skill needed to pass will indeed pass (false positive)
    • 35. Myth #8: All Decisions Are Created Equally
        • This phenomenon can be attributed to a number of reasons including:
            • Possible re-exposure of material (security issue)
            • (Compounded) measurement errors associated with each test score
                • The more times a candidate repeats, the more likely their score will be sufficiently high (overestimated) to result in a false positive decision
            • This could significantly impact safe and effective patient care given the link between medical licensing exam scores and future egregious acts in practice (Tamblyn et al. research)
    • 36. Myth #8: All Decisions Are Created Equally
        • How serious is the effect of repeat attempts on false positive rates?
        • Millman example (1989)
            • Let’s assume that a cut-score is 70% on an exam
            • A candidate with a true ability of 65% (should fail) has a greater than 50/50 chance of passing the exam due to measurement error after 5 attempts (with an MCQ exam, i.e., high reliability)
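The repeat-attempt effect in the Millman example can be illustrated with a minimal sketch. This is not Millman's own computation: it assumes a hypothetical 100-item exam, models each item as an independent Bernoulli trial at the candidate's true ability (a simple binomial error model), and treats attempts as independent. The function names are illustrative only.

```python
from math import comb


def single_attempt_pass_probability(n_items, min_correct, true_ability):
    """P(score >= min_correct) under a binomial model (an assumption).

    Each of the n_items is answered correctly with probability true_ability.
    """
    return sum(
        comb(n_items, k) * true_ability**k * (1 - true_ability) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


def pass_within_attempts(p_single, attempts):
    """P(at least one pass in `attempts` independent tries)."""
    return 1 - (1 - p_single) ** attempts


# Hypothetical exam: 100 items, cut-score 70 correct (70%), true ability 65%.
p_single = single_attempt_pass_probability(100, 70, 0.65)
cumulative = [pass_within_attempts(p_single, a) for a in range(1, 6)]

for attempt, p in enumerate(cumulative, start=1):
    print(f"after {attempt} attempt(s): P(false positive) = {p:.2f}")
```

Under these assumptions the chance of passing on any single attempt is modest, but the cumulative chance of at least one pass grows with every retake and exceeds one half by the fifth attempt.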
    • 37. Myth #8: All Decisions Are Created Equally
        • How might we control for this effect?
            • Increase the size of the item/station bank to reduce the likelihood that previously seen material will appear on repeat test attempts
            • Incorporate item/station exposure as a constraint when assembling test forms
            • Adjust the cut-score to minimize misclassifications
                • A panel sets the standard at 65%
                • We can adjust the cut-score so that a candidate with a true ability level of 65% (true master) has a near zero probability of being misclassified
    • 38. Myth #9: A Cut-score/Standard Does Not Need to Be Evaluated
        • A cut-score reflects the (informed) judgments of a small sample of experts, based on a sample of items/stations, at a specific point in time, using one or only a few methods
            • Cut-scores can and will vary as a function of these factors, which need to be evaluated
            • Evidence to support both the “internal” and “external” validity of your cut-score should be collected and presented to support its intended use
    • 39. Myth #9: A Cut-score/Standard Does Not Need to Be Evaluated
        • Evaluating your standard
            • Internal validation
                • How reproducible is the cut-score across facets?
                    • Judges (inter-rater consistency)?
                    • Sample of stations?
                    • Panels of judges? Etc.
                • Generalizability analysis and rater models (IRT) are useful to help us assess how variable the cut-score is across these facets
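As a much simpler stand-in for the generalizability and rater-model analyses named above, one rough first look at reproducibility across judges is the standard error of the panel's mean cut-score. The sketch below assumes each judge's recommended cut-score is available as a single number; the values and names are hypothetical and this is not a full generalizability study.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical recommended cut-scores (percent correct), one per judge.
judge_cut_scores = [62.0, 65.0, 68.0, 64.0, 66.0]

panel_cut_score = mean(judge_cut_scores)
# Sample standard deviation across judges (n - 1 denominator).
between_judge_sd = stdev(judge_cut_scores)
# Standard error of the panel mean: how much the recommended cut-score
# might shift if a different sample of judges had been drawn.
standard_error = between_judge_sd / sqrt(len(judge_cut_scores))

print(f"panel cut-score: {panel_cut_score:.1f}")
print(f"between-judge SD: {between_judge_sd:.2f}")
print(f"standard error of the cut-score: {standard_error:.2f}")
```

A large standard error relative to the score scale would suggest that the recommended cut-score depends heavily on which judges happened to sit on the panel.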
    • 40. Myth #9: A Cut-score/Standard Does Not Need to Be Evaluated
        • Evaluating your standard
            • External validation
                • How do the decisions relate to other measures?
                • If scores on two exams are highly related, but decision consistency is low, perhaps the cut-score on one assessment is not appropriate?
            • Impact
                • How comparable are P/F rates to historical trends?
                • Does the cut-score lead to “acceptable” results?
    • 41. Myth #10: The Angoff Method Was Developed by Angoff
        • Angoff did not formally develop the (Angoff) standard setting method
            • Origin can be traced back to a footnote in a chapter on scales, norms and equivalent scores that Angoff wrote in 1971
            • Angoff ascribed the procedure to Tucker
            • Method was a “systematic procedure for deciding on the minimum raw scores for passing and honors”
    • 42. Myth #10: The Angoff Method Was Developed by Angoff
        • “a slight variation of this procedure is to ask each judge to state the probability that the “minimally acceptable person” would answer each item correctly. In effect, the judges would think of a number of minimally acceptable persons, instead of only one such person, and would estimate the proportion of minimally acceptable persons who would answer each item correctly. The sum of the probabilities, or proportions, would then represent the minimally acceptable score” (p. 515).
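The arithmetic described in the footnote can be sketched as follows. Summing each judge's item probabilities comes from the quoted passage; averaging those sums across judges to get a single panel value is a common convention assumed here, not something the quote specifies. The ratings and function name are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (panel cut-score, per-judge cut-scores) on the raw-score scale.

    ratings[j][i] is judge j's estimated probability that a minimally
    acceptable candidate answers item i correctly.
    """
    # Per the footnote: the sum of the probabilities is the minimally
    # acceptable raw score according to that judge.
    per_judge = [sum(judge_ratings) for judge_ratings in ratings]
    # Assumption: the panel's recommendation is the mean across judges.
    return mean(per_judge), per_judge


# Hypothetical ratings: 3 judges, 4 items.
ratings = [
    [0.60, 0.75, 0.50, 0.90],
    [0.55, 0.80, 0.45, 0.85],
    [0.65, 0.70, 0.55, 0.95],
]

cut_score, per_judge = angoff_cut_score(ratings)
print(f"per-judge cut-scores: {[round(s, 2) for s in per_judge]}")
print(f"panel cut-score: {cut_score:.2f} out of {len(ratings[0])} items")
```

Because item probabilities are simply added, a low rating on one item can be offset by a high rating on another, which is the compensatory assumption noted earlier for Angoff methods.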
