Assessment Issues
Hirotaka Onishi MD, MHPE
Int’l Research Center for Medical
Education, University of Tokyo
Aims of “Assessment”
• Comparing with a standard
• Providing feedback
• Providing information for decision making
• Motivating learners
• Accountability for learners, educators, and society
Overview
1. Curriculum and assessment
2. Various assessment tools
3. Issues to consider
1. Curriculum and Assessment
Assessment in Curriculum Development Process
• 6-step approach (Kern):
1. Problem identification and general needs assessment
2. Targeted needs assessment
3. Goals and objectives
4. Educational strategy
5. Implementation
6. Assessment
[Diagram: the six steps form a cycle; assessment (step 6) links directly back to goals and objectives (step 3).]
Relationship between Objectives and Assessment
• Objectives writing
– Specific & measurable – easy to assess
– Categorized by taxonomy – guides the choice of educational and assessment methods
Example:
X: Students will be able to perform a systematic neurological exam.
O: By the end of the neurology clerkship, 4th-year students will demonstrate relevant performance on a predetermined 31-item neurological exam.
Balanced Performance
[Diagram: overall performance balances knowledge, skill, problem solving, and attitude, building up from the recall level.]
Difficult Areas to Assess
• Knowledge is easy to assess.
• Skill is somewhat difficult to assess.
• Problem solving and attitude are difficult to assess: the “iceberg below the water”
[Iceberg diagram: the abilities easy to assess (knowledge, skill) sit above the waterline; the abilities below it are difficult to assess.]
Taxonomy & Assessment Method
Taxonomy – recommended assessment method:
• Knowledge – Multiple choice questions (MCQs), short answer questions (SAQs)
• Problem solving – Essays, oral exams, script concordance test, key feature problems
• Attitude – Rating scales, multi-source feedback (360 degree)
• Skill – Skill assessment, OSCE
• Behavior – Real-setting observation, chart audit
Why Is Taxonomy Useful?
• By classifying objectives with a taxonomy, educators can understand that:
– Deep knowledge and attitude are more difficult to assess than skills, and skills are more difficult to assess than knowledge.
– Assessment of surface knowledge is insufficient for problem solving.
Formative vs Summative
• Formative assessment
– To identify where and how to improve during the curriculum
• Summative assessment
– To make decisions (e.g. pass/fail) at the end of the curriculum
[Timeline: formative assessment runs from the start through the middle of the curriculum; summative assessment comes at the end.]
Is Reliability Important?
[Histogram of exam results: scores from 50 to 95 on the x-axis, frequencies from 0 to 60 on the y-axis. High reliability is needed to compare student scores.]
• Suppose two marks:
– Student A: 85%
– Student B: 80%
• Can we conclude that A is more competent than B?
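Whether a five-point gap is meaningful depends on the test's standard error of measurement (SEM). A minimal sketch, assuming an illustrative score SD of 10 and reliability of 0.80 (neither value comes from the slides):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def scores_differ(a: float, b: float, sd: float, reliability: float,
                  z: float = 1.96) -> bool:
    """True if the gap exceeds a 95% error band for a score difference.

    The standard error of the difference between two independent
    observed scores is SEM * sqrt(2).
    """
    se_diff = sem(sd, reliability) * math.sqrt(2.0)
    return abs(a - b) > z * se_diff

# Student A: 85%, Student B: 80% (values from the slide).
print(round(sem(10, 0.80), 2))          # 4.47 points
print(scores_differ(85, 80, 10, 0.80))  # False: 5 < 1.96 * 6.32
```

Under these assumptions the five-point gap lies well inside the error band, so the scores alone do not show that A is more competent than B.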
High-Stakes Tests
• The assessment result is used for critical decision making
– Graduation, licensure, certification…
• The highest reliability is demanded
• Validity is sometimes neglected…
Ex) The Japanese Medical Licensing Exam consists of MCQs but does not include any skill assessment.
Reliability & Validity
• Reliability: stability of measurement
• Validity: the degree to which a test measures what it should measure (content, construct, criterion)
[Dartboard diagrams: shots tightly clustered off-centre are reliable but not valid; shots scattered around the bullseye are not reliable but directed to the target on average.]
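The dartboard metaphor can be made concrete with a toy simulation, contrasting a precise-but-biased measure with a noisy-but-unbiased one; all numbers here are illustrative:

```python
import random

random.seed(0)
TRUE_ABILITY = 70.0

def measure(bias: float, noise_sd: float, n: int = 10_000) -> list[float]:
    """Simulate n repeated measurements of the same true ability."""
    return [TRUE_ABILITY + bias + random.gauss(0, noise_sd) for _ in range(n)]

def summarize(label: str, scores: list[float]) -> None:
    mean = sum(scores) / len(scores)
    sd = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    # Mean far from 70 -> poor validity; large SD -> poor reliability.
    print(f"{label}: mean={mean:.1f}, sd={sd:.1f}")

summarize("reliable but not valid", measure(bias=-10, noise_sd=1))
summarize("not reliable, valid on average", measure(bias=0, noise_sd=10))
```

The first instrument gives nearly identical readings every time yet systematically misses the target; the second scatters widely but is centred on the true ability.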
To Assess Clinical Work
[Dartboard diagrams over targets labelled attitude, knowledge, and skill: MCQs for graduation or national licensure exams are reliable but not valid (they hit only knowledge); assessment by peers, other health staff, and patients is not reliable but directed to the target on average.]
2. Various Assessment Tools
Validity: Level of Clinical Competence and Assessment
[Miller's pyramid (Miller GE. Acad Med 1990; 65: S63-7), with axes for professional authenticity and for reliability/cost:]
• Knows – Factual tests: MCQs…
• Knows How – Clinical tests: MEQs, key feature problems…
• Shows How – Competence-based assessment: OSCE, SP-based tests…
• Does – Performance-based assessment: Mini-CEX, DOPS, MSF…
Comparison of Knowledge Assessment Methods
• MCQs
– Higher reliability and objectivity
– Difficult to develop good questions
– May induce surface learning and test-wiseness
• SAQs (short answer questions)
– Deeper knowledge is needed
– Examiners must mark the papers
– Marking is subjective
Comparison of Deep Knowledge Assessment Methods
• Essays
– Questions and marking can be made systematic
– Marking may require a repeated review process
• Oral exams
– Examiners can adjust how they ask questions for each examinee
– Difficult to monitor/review the process
• KFPs (key feature problems)
– Difficult to produce
– High reliability
Extended-Matching Items (R-type)
The following set includes two item stems. The first requires that the examinee synthesise information in order to determine a diagnosis; the second requires only recall of an isolated fact.
A. Vit A B. Vit B1 C. Vit B2 D. Vit B6
E. Vit C F. Vit D G. Vit E H. Vit K
I. Biotin J. Copper K. Folate L. Iodine
M. Iron N. Magnesium O. Niacin P. Zinc
For each patient with clinical features caused by metabolic abnormalities, select the vitamin or mineral that is most likely to be involved.
1. A 70-year-old widower has ecchymoses, perifollicular petechiae, and swelling of the gingiva. His diet consists mostly of cola and hot dogs.
2. Involved in clotting-factor synthesis.
(Q1: E, Q2: H)
Attitude Assessment Method
• Multi-source feedback
– Scales are impression-based
– If contact with the assessor is low, assessment becomes difficult
– Multiple information sources are required (teachers, peers, nurses, patients, etc.)
Skill Assessment Methods
• Direct observation of procedural skills (DOPS)
– Easily followed by feedback
– Lacks objectivity and reliability
• OSCE
– Reliability and objectivity are high
– Various skills are assessed at one time
– Induces student nervousness
– Demands a large number of examiners
OSCE (Objective Structured Clinical Examination)
• Internationally, Prof. Harden started the OSCE at the University of Dundee (1975)
• In Japan, Kawasaki Medical University started it (1992)
• Before the OSCE, clinical examinations were regarded as assessments with low objectivity
Medical Interview Station
• Standardized patient
• Medical student
• Examiners
Physical Examination Station
• Student
• Examiners
• Heart sound simulator
Concept of OSCE
• Clinical skill assessment
• Structure
– Multiple stations including various clinical aspects based on the blueprint
• Objectivity
– Trained multiple raters (examiners)
– Trained standardized patients
– Predetermined check sheets
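The “predetermined check sheet” can be pictured as a small data structure; a sketch with hypothetical items (the station name, item wording, and point values are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    behavior: str  # observable action the examiner looks for
    points: int

# Hypothetical cardiology-station checklist; wording is illustrative only.
CARDIO_STATION = [
    ChecklistItem("Introduces self and confirms patient identity", 1),
    ChecklistItem("Auscultates all four valve areas", 2),
    ChecklistItem("Identifies the simulated murmur correctly", 2),
]

def station_score(ticks: list[bool]) -> float:
    """Percentage score from the examiner's tick marks on the check sheet."""
    earned = sum(i.points for i, t in zip(CARDIO_STATION, ticks) if t)
    return 100.0 * earned / sum(i.points for i in CARDIO_STATION)

print(station_score([True, True, False]))  # 60.0
```

Because every examiner marks the same predetermined behaviors, scores become comparable across raters and stations, which is the source of the OSCE's objectivity.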
Example Flow of Students

Six-station circuit (Ce, Ch, Ab, Ne, Su, Em), two parallel streams:

Time    | 1st Stream: Ce Ch Ab Ne Su Em | 2nd Stream: Ce Ch Ab Ne Su Em
9:00-05 |              1  2  3  4  5  6 |              7  8  9 10 11 12
9:06-11 |              2  3  4  5  6  1 |              8  9 10 11 12  7
9:12-17 |              3  4  5  6  1  2 |              9 10 11 12  7  8
9:18-23 |              4  5  6  1  2  3 |             10 11 12  7  8  9
9:24-29 |              5  6  1  2  3  4 |             11 12  7  8  9 10
9:30-35 |              6  1  2  3  4  5 |             12  7  8  9 10 11
9:40-45 |             13 14 15 16 17 18 |             19 20 21 22 23 24
9:46-51 |             14 15 16 17 18 13 |             20 21 22 23 24 19

Medical interview station (four streams):

Time    | 1st | 2nd | 3rd | 4th stream
9:00-10 |  51 |  52 |  53 |  54
9:11-21 |  55 |  56 |  57 |  58
9:22-32 |  59 |  60 |  61 |  62
9:33-43 |  63 |  64 |  65 |  66
9:44-54 |  67 |  68 |  69 |  70
Portfolio
• Similar to a report, but more than that
• Contents of a portfolio
– Review of a patient case
– Collected information and evidence
– Plan for future learning
– Reflection on the case
• Portfolio learning with mentoring is more important than the assessment itself
Validity: Level of Clinical Competence and Assessment
[Miller's pyramid, modified from Miller GE. Acad Med 1990; 65: S63-7, with the same axes of professional authenticity and reliability/cost:]
• Knows – Factual tests: MCQs…
• Knows How – Clinical tests: key feature problems…
• Shows How – Competence-based assessment: OSCE, SP-based tests…
• Does – Performance-based assessment: videotaping, audits…
Portfolio Learning
• Portfolio: literally, a file holding a bunch of papers
• Clinical training:
– Each physician experiences different problems.
– Case summaries play an important role in reflection.
– Balance between bio/psychosocial, ideal/real, and self/others' assessment.
– Electronic technology helps in searching past files and reflecting more deeply.
Portfolio Assessment
• A portfolio might be assessed by:
1. Content experts, using a rubric (criteria) covering its different aspects
2. Viva voce (oral examination) by content experts, for both formative and summative purposes
Mini-CEX (Clinical Evaluation Exercise)
• The American Board of Internal Medicine developed the CEX for residents in 1972
– The CEX took 2 hours for each student
– Low reliability (case specificity / inter-rater variability)
• The Mini-CEX uses multiple cases to maximize reliability
– The physician's work with each case is assessed by an expert
– The Mini-CEX has demonstrated high reliability
• Many programs in the UK decided to use the Mini-CEX as a required assessment
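Why sampling multiple cases raises reliability can be illustrated with the Spearman-Brown prophecy formula; the numeric values below are illustrative, not taken from the Mini-CEX literature:

$$
r_k = \frac{k\,r_1}{1 + (k - 1)\,r_1},
\qquad \text{e.g. } r_1 = 0.3,\ k = 8:\quad
r_8 = \frac{8 \times 0.3}{1 + 7 \times 0.3} = \frac{2.4}{3.1} \approx 0.77
$$

A single encounter with modest reliability becomes a reasonably dependable composite once enough independent cases are averaged, which is exactly the Mini-CEX strategy.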
Form for Mini-CEX
[Sample Mini-CEX rating form]
RIME: Clinical Global Rating
• A method to assess practice in clinical clerkships with a global rating scale.
• A synthetic assessment system for integrated clinical work.
• RIME is an acronym for Reporter, Interpreter, Manager, and Educator.
What Does RIME Stand for?
• Reporter: “Mr. X has got a fever.”
• Interpreter: “Mr. X has got a fever. I think he has pneumonia because…”
• Manager: “I think Mr. X has pneumococcal pneumonia. May I order PCG?”
• Educator: “I think Mr. Y, student P's patient, has pneumococcal pneumonia. Should he order PCG?”
* RIME is a handy, integrated, and reliable method to assess clinical work.
Assess Clinical Reasoning Ability by Case Presentation
• Selection of terms and reliability of the clinical examination
• Acquisition of the presentation format
• Comprehension of the case and its differential diagnoses
• Listing the necessary pertinent negative and positive signs/symptoms
• Relevant feedback is determined for each level (see the table below)
How to assess and improve case presentation by clinical reasoning level

• Vague
– Condition of presentation: insufficient or unordered information
– Level of the presenter: unable to capture or organize information for the case
– Feedback from the trainer: point out what is missing in the case presentation; ask the presenter to practice case presentation

• Structured
– Condition: essential information is well covered, but DD are not well listed
– Level: able to report the case but unable to interpret the patient's problems
– Feedback: give positive feedback for the complete information; ask the presenter to summarize the presentation and the DD

• Organized
– Condition: DD are covered, but pertinent positive and negative S/S are insufficient
– Level: during the H&P, no S/S relevant to the DD were obtained
– Feedback: give positive feedback for the DD and specific feedback on pertinent positive and negative S/S

• Pertinent
– Condition: pertinent positive and negative S/S relevant to the DD are covered
– Level: through the H&P, the whole picture of the case and its DD are clearly described
– Feedback: give positive feedback for a good presentation; ask the presenter to specify the lessons learned from the case

H&P: History & Physical; S/S: Signs and Symptoms; DD: Differential Diagnosis
3. Which Tool Is Better?
Good Assessment Tools
1. Consistency between objectives, strategy, and assessment
2. Motivation of students
3. Reliability and validity
Consistency between Objectives, Strategy and Assessment
• Objectives: Model core curriculum
• Strategy: Clinical skill teaching
• Assessment: CAT-OSCE
[Diagram: clinical skill teaching prepares students for the CAT-OSCE, which leads into clinical clerkship; all three layers align.]

Insufficient Preparation of Clinical Skill Teaching
• Objectives: Model core curriculum
• Strategy: (clinical skill teaching, insufficiently provided)
• Assessment: CAT-OSCE
[Diagram: with only minimal clinical skill teaching squeezed in before the CAT-OSCE and clinical clerkship, the strategy layer no longer matches objectives and assessment.]
Motivation of Students
• Anecdotally, students enjoy preparing for the CAT-OSCE
• If many students fail, nervousness about the OSCE deteriorates the learning climate
• Preparatory schools have published new textbooks and videos and started seminars
Was OSCE Meaningful?
• Many physicians/teachers perceived that:
– Not only surface knowledge but also skills (and attitudes) should be assessed.
– Simulation teaching should be brought in between theory and the real setting.
• Some students noticed that:
– Learning clinical skills is interesting.
– Knowing how and showing how are different.
Issues to Consider
• Number of stations
– Contents to assess, time length, reliability issues, facilities (rooms)
• Development of SPs, examiners, exam sheets, and examiners' manuals
Theme for Discussion
• Think of any summative assessment
• List the issues of the assessment system
• Suggest how to improve the system
