Assessment Issues
Hirotaka Onishi MD, MHPE
Int’l Research Center for Medical
Education, University of Tokyo
Aims of “Assessment”
• Comparing with a standard
• Providing feedback
• Providing information for decision making
• Motivating learners
• Accountability for learners, educators, and society
Overview
1. Curriculum and assessment
2. Various assessment tools
3. Issues to consider
1. Curriculum and Assessment
Assessment in Curriculum Development Process
• 6-step approach (Kern):
1. Problem identification and general needs assessment
2. Targeted needs assessment
3. Goals and objectives
4. Educational strategy
5. Implementation
6. Assessment
[Diagram: the six steps form a cycle; assessment (step 6) links directly back to goals and objectives (step 3).]
Relationship between Objectives and Assessment
• Objectives writing
– Specific & measurable – easy to assess
– Categorized by taxonomy – guides the choice of educational and assessment methods
Example:
X: Students will be able to perform a systematic neurological exam.
O: By the end of the neurology clerkship, 4th-year students will demonstrate relevant performance on a predetermined 31-item neurological exam.
Balanced Performance
[Diagram: overall performance balances knowledge, skill, problem solving, and attitude, building up from the recall level.]
Difficult Areas to Assess
• Knowledge is easy to assess.
• Skill is somewhat difficult to assess.
• Problem solving and attitude are difficult to assess: the “iceberg below the water”
[Iceberg diagram: the abilities easy to assess (knowledge, skill) sit above the waterline; the abilities below it are difficult to assess.]
Taxonomy & Assessment Method
Taxonomy – recommended assessment method:
• Knowledge – Multiple choice questions (MCQs), short answer questions (SAQs)
• Problem solving – Essays, oral exams, script concordance test, key feature problems
• Attitude – Rating scales, multi-source feedback (360 degree)
• Skill – Skill assessment, OSCE
• Behavior – Real-setting observation, chart audit
Why Is Taxonomy Useful?
• By classifying objectives with a taxonomy, educators can understand that:
– Deep knowledge and attitude are more difficult to assess than skills, and skills are more difficult to assess than knowledge.
– Assessment of surface knowledge is insufficient for problem solving.
Formative vs Summative
• Formative assessment
– To identify where and how to improve during the curriculum
• Summative assessment
– To make decisions (e.g. pass/fail) at the end of the curriculum
[Timeline: formative assessment runs from the start through the middle of the curriculum; summative assessment comes at the end.]
Is Reliability Important?
[Histogram of exam results: scores from 50 to 95 on the x-axis, frequencies from 0 to 60 on the y-axis. High reliability is needed to compare student scores.]
• Suppose two marks:
– Student A: 85%
– Student B: 80%
• Can we conclude that A is more competent than B?
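Whether a five-point gap is meaningful depends on the test's standard error of measurement (SEM). A minimal sketch, assuming an illustrative score SD of 10 and reliability of 0.80 (neither value comes from the slides):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def scores_differ(a: float, b: float, sd: float, reliability: float,
                  z: float = 1.96) -> bool:
    """True if the gap exceeds a 95% error band for a score difference.

    The standard error of the difference between two independent
    observed scores is SEM * sqrt(2).
    """
    se_diff = sem(sd, reliability) * math.sqrt(2.0)
    return abs(a - b) > z * se_diff

# Student A: 85%, Student B: 80% (values from the slide).
print(round(sem(10, 0.80), 2))          # 4.47 points
print(scores_differ(85, 80, 10, 0.80))  # False: 5 < 1.96 * 6.32
```

Under these assumptions the five-point gap lies well inside the error band, so the scores alone do not show that A is more competent than B.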
High-Stakes Tests
• The assessment result is used for critical decision making
– Graduation, licensure, certification…
• The highest reliability is demanded
• Validity is sometimes neglected…
Ex) The Japanese Medical Licensing Exam consists of MCQs but does not include any skill assessment.
Reliability & Validity
• Reliability: stability of measurement
• Validity: the degree to which a test measures what it should measure (content, construct, criterion)
[Dartboard diagrams: shots tightly clustered off-centre are reliable but not valid; shots scattered around the bullseye are not reliable but directed to the target on average.]
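The dartboard metaphor can be made concrete with a toy simulation, contrasting a precise-but-biased measure with a noisy-but-unbiased one; all numbers here are illustrative:

```python
import random

random.seed(0)
TRUE_ABILITY = 70.0

def measure(bias: float, noise_sd: float, n: int = 10_000) -> list[float]:
    """Simulate n repeated measurements of the same true ability."""
    return [TRUE_ABILITY + bias + random.gauss(0, noise_sd) for _ in range(n)]

def summarize(label: str, scores: list[float]) -> None:
    mean = sum(scores) / len(scores)
    sd = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    # Mean far from 70 -> poor validity; large SD -> poor reliability.
    print(f"{label}: mean={mean:.1f}, sd={sd:.1f}")

summarize("reliable but not valid", measure(bias=-10, noise_sd=1))
summarize("not reliable, valid on average", measure(bias=0, noise_sd=10))
```

The first instrument gives nearly identical readings every time yet systematically misses the target; the second scatters widely but is centred on the true ability.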
To Assess Clinical Work
[Dartboard diagrams over targets labelled attitude, knowledge, and skill: MCQs for graduation or national licensure exams are reliable but not valid (they hit only knowledge); assessment by peers, other health staff, and patients is not reliable but directed to the target on average.]
2. Various Assessment Tools
Validity: Level of Clinical Competence and Assessment
[Miller's pyramid (Miller GE. Acad Med 1990; 65: S63-7), with axes for professional authenticity and for reliability/cost:]
• Knows – Factual tests: MCQs…
• Knows How – Clinical tests: MEQs, key feature problems…
• Shows How – Competence-based assessment: OSCE, SP-based tests…
• Does – Performance-based assessment: Mini-CEX, DOPS, MSF…
Comparison of Knowledge Assessment Methods
• MCQs
– Higher reliability and objectivity
– Difficult to develop good questions
– May induce surface learning and test-wiseness
• SAQs (short answer questions)
– Deeper knowledge is needed
– Examiners must mark the papers
– Marking is subjective
Comparison of Deep Knowledge Assessment Methods
• Essays
– Questions and marking can be made systematic
– Marking may require a repeated review process
• Oral exams
– Examiners can adjust how they ask questions for each examinee
– Difficult to monitor/review the process
• KFPs (key feature problems)
– Difficult to produce
– High reliability
Extended-Matching Items (R-type)
The following set includes two item stems. The first requires that the examinee synthesise information in order to determine a diagnosis; the second requires only recall of an isolated fact.
A. Vit A B. Vit B1 C. Vit B2 D. Vit B6
E. Vit C F. Vit D G. Vit E H. Vit K
I. Biotin J. Copper K. Folate L. Iodine
M. Iron N. Magnesium O. Niacin P. Zinc
For each patient with clinical features caused by metabolic abnormalities, select the vitamin or mineral that is most likely to be involved.
1. A 70-year-old widower has ecchymoses, perifollicular petechiae, and swelling of the gingiva. His diet consists mostly of cola and hot dogs.
2. Involved in clotting-factor synthesis.
(Q1: E, Q2: H)
Attitude Assessment Method
• Multi-source feedback
– Scales are impression-based
– If contact with the assessor is low, assessment becomes difficult
– Multiple information sources are required (teachers, peers, nurses, patients, etc.)
Skill Assessment Methods
• Direct observation of procedural skills (DOPS)
– Easily followed by feedback
– Lacks objectivity and reliability
• OSCE
– Reliability and objectivity are high
– Various skills are assessed at one time
– Induces student nervousness
– Demands a large number of examiners
OSCE (Objective Structured Clinical Examination)
• Internationally, Prof. Harden started the OSCE at the University of Dundee (1975)
• In Japan, Kawasaki Medical University started it (1992)
• Before the OSCE, clinical examinations were regarded as assessments with low objectivity
Medical Interview Station
• Standardized patient
• Medical student
• Examiners
Physical Examination Station
• Student
• Examiners
• Heart sound simulator
Concept of OSCE
• Clinical skill assessment
• Structure
– Multiple stations including various clinical aspects based on the blueprint
• Objectivity
– Trained multiple raters (examiners)
– Trained standardized patients
– Predetermined check sheets
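The “predetermined check sheet” can be pictured as a small data structure; a sketch with hypothetical items (the station name, item wording, and point values are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    behavior: str  # observable action the examiner looks for
    points: int

# Hypothetical cardiology-station checklist; wording is illustrative only.
CARDIO_STATION = [
    ChecklistItem("Introduces self and confirms patient identity", 1),
    ChecklistItem("Auscultates all four valve areas", 2),
    ChecklistItem("Identifies the simulated murmur correctly", 2),
]

def station_score(ticks: list[bool]) -> float:
    """Percentage score from the examiner's tick marks on the check sheet."""
    earned = sum(i.points for i, t in zip(CARDIO_STATION, ticks) if t)
    return 100.0 * earned / sum(i.points for i in CARDIO_STATION)

print(station_score([True, True, False]))  # 60.0
```

Because every examiner marks the same predetermined behaviors, scores become comparable across raters and stations, which is the source of the OSCE's objectivity.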
Example Flow of Students

Six-station circuit (Ce, Ch, Ab, Ne, Su, Em), two parallel streams:

Time    | 1st Stream: Ce Ch Ab Ne Su Em | 2nd Stream: Ce Ch Ab Ne Su Em
9:00-05 |              1  2  3  4  5  6 |              7  8  9 10 11 12
9:06-11 |              2  3  4  5  6  1 |              8  9 10 11 12  7
9:12-17 |              3  4  5  6  1  2 |              9 10 11 12  7  8
9:18-23 |              4  5  6  1  2  3 |             10 11 12  7  8  9
9:24-29 |              5  6  1  2  3  4 |             11 12  7  8  9 10
9:30-35 |              6  1  2  3  4  5 |             12  7  8  9 10 11
9:40-45 |             13 14 15 16 17 18 |             19 20 21 22 23 24
9:46-51 |             14 15 16 17 18 13 |             20 21 22 23 24 19

Medical interview station (four streams):

Time    | 1st | 2nd | 3rd | 4th stream
9:00-10 |  51 |  52 |  53 |  54
9:11-21 |  55 |  56 |  57 |  58
9:22-32 |  59 |  60 |  61 |  62
9:33-43 |  63 |  64 |  65 |  66
9:44-54 |  67 |  68 |  69 |  70
Portfolio
• Similar to a report, but more than that
• Contents of a portfolio
– Review of a patient case
– Collected information and evidence
– Plan for future learning
– Reflection on the case
• Portfolio learning with mentoring is more important than the assessment itself
Validity: Level of Clinical Competence and Assessment
[Miller's pyramid, modified from Miller GE. Acad Med 1990; 65: S63-7, with the same axes of professional authenticity and reliability/cost:]
• Knows – Factual tests: MCQs…
• Knows How – Clinical tests: key feature problems…
• Shows How – Competence-based assessment: OSCE, SP-based tests…
• Does – Performance-based assessment: videotaping, audits…
Portfolio Learning
• Portfolio: literally, a file holding a bunch of papers
• Clinical training:
– Each physician experiences different problems.
– Case summaries play an important role in reflection.
– Balance between bio/psychosocial, ideal/real, and self/others' assessment.
– Electronic technology helps in searching past files and reflecting more deeply.
Portfolio Assessment
• A portfolio might be assessed by:
1. Content experts, using a rubric (criteria) covering its different aspects
2. Viva voce (oral examination) by content experts, for both formative and summative purposes
Mini-CEX (Clinical Evaluation Exercise)
• The American Board of Internal Medicine developed the CEX for residents in 1972
– The CEX took 2 hours for each student
– Low reliability (case specificity / inter-rater variability)
• The Mini-CEX uses multiple cases to maximize reliability
– The physician's work with each case is assessed by an expert
– The Mini-CEX has demonstrated high reliability
• Many programs in the UK decided to use the Mini-CEX as a required assessment
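Why sampling multiple cases raises reliability can be illustrated with the Spearman-Brown prophecy formula; the numeric values below are illustrative, not taken from the Mini-CEX literature:

$$
r_k = \frac{k\,r_1}{1 + (k - 1)\,r_1},
\qquad \text{e.g. } r_1 = 0.3,\ k = 8:\quad
r_8 = \frac{8 \times 0.3}{1 + 7 \times 0.3} = \frac{2.4}{3.1} \approx 0.77
$$

A single encounter with modest reliability becomes a reasonably dependable composite once enough independent cases are averaged, which is exactly the Mini-CEX strategy.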
Form for Mini-CEX
[Sample Mini-CEX rating form]
RIME: Clinical Global Rating
• A method to assess practice in clinical clerkships with a global rating scale.
• A synthetic assessment system for integrated clinical work.
• RIME is an acronym for Reporter, Interpreter, Manager, and Educator.
What Does RIME Stand for?
• Reporter: “Mr. X has got a fever.”
• Interpreter: “Mr. X has got a fever. I think he has pneumonia because…”
• Manager: “I think Mr. X has pneumococcal pneumonia. May I order PCG?”
• Educator: “I think Mr. Y, student P's patient, has pneumococcal pneumonia. Should he order PCG?”
* RIME is a handy, integrated, and reliable method to assess clinical work.
Assess Clinical Reasoning Ability by Case Presentation
• Selection of terms and reliability of the clinical examination
• Acquisition of the presentation format
• Comprehension of the case and its differential diagnoses
• Listing the necessary pertinent negative and positive signs/symptoms
• Relevant feedback is determined for each level (see the table below)
How to assess and improve case presentation by clinical reasoning level

• Vague
– Condition of presentation: insufficient or unordered information
– Level of the presenter: unable to capture or organize information for the case
– Feedback from the trainer: point out what is missing in the case presentation; ask the presenter to practice case presentation

• Structured
– Condition: essential information is well covered, but DD are not well listed
– Level: able to report the case but unable to interpret the patient's problems
– Feedback: give positive feedback for the complete information; ask the presenter to summarize the presentation and the DD

• Organized
– Condition: DD are covered, but pertinent positive and negative S/S are insufficient
– Level: during the H&P, no S/S relevant to the DD were obtained
– Feedback: give positive feedback for the DD and specific feedback on pertinent positive and negative S/S

• Pertinent
– Condition: pertinent positive and negative S/S relevant to the DD are covered
– Level: through the H&P, the whole picture of the case and its DD are clearly described
– Feedback: give positive feedback for a good presentation; ask the presenter to specify the lessons learned from the case

H&P: History & Physical; S/S: Signs and Symptoms; DD: Differential Diagnosis
3. Which Tool Is Better?
Good Assessment Tools
1. Consistency between objectives, strategy, and assessment
2. Motivation of students
3. Reliability and validity
Consistency between Objectives, Strategy and Assessment
• Objectives: Model core curriculum
• Strategy: Clinical skill teaching
• Assessment: CAT-OSCE
[Diagram: clinical skill teaching prepares students for the CAT-OSCE, which leads into clinical clerkship; all three layers align.]

Insufficient Preparation of Clinical Skill Teaching
• Objectives: Model core curriculum
• Strategy: (clinical skill teaching, insufficiently provided)
• Assessment: CAT-OSCE
[Diagram: with only minimal clinical skill teaching squeezed in before the CAT-OSCE and clinical clerkship, the strategy layer no longer matches objectives and assessment.]
Motivation of Students
• Anecdotally, students enjoy preparing for the CAT-OSCE
• If many students fail, nervousness about the OSCE deteriorates the learning climate
• Preparatory schools have published new textbooks and videos and started seminars
Was OSCE Meaningful?
• Many physicians/teachers perceived that:
– Not only surface knowledge but also skills (and attitudes) should be assessed.
– Simulation teaching should be brought in between theory and the real setting.
• Some students noticed that:
– Learning clinical skills is interesting.
– Knowing how and showing how are different.
Issues to Consider
• Number of stations
– Contents to assess, time length, reliability issues, facilities (rooms)
• Development of SPs, examiners, exam sheets, and examiners' manuals
Theme for Discussion
• Think of any summative assessment
• List the issues of the assessment system
• Suggest how to improve the system
