Analysing medical performance evaluation data for relicensure purposes

Ajit Narayanan
Auckland University of Technology
(Wednesday, 10.15, Data Analysis Workshop)

Published in: Health & Medicine

  1. Analysing medical performance evaluation data for relicensure purposes
     Ajit Narayanan, School of Computing and Mathematical Sciences, Auckland University of Technology
  2. Background
     • In 1998 the General Medical Council (GMC), which registers and regulates doctors practising in the United Kingdom, determined that “all doctors should be prepared to demonstrate at regular intervals that they remain up to date and fit to practise”
     • Shortly afterwards, GMC proposed that participation in such a process should become a condition of continued registration (“relicensure”/revalidation) of 200,000 doctors in the UK
     • GMC attracted by the use of questionnaires completed by patients and colleagues as a …
  3. Overview of project
     • Limited published evidence available regarding the reliability, validity and effectiveness of relicensure processes in the medical domain
     • The overall aim was to conduct a large scale survey of doctors undertaking multi-source feedback (MSF) using the GMC patient and colleague questionnaires
     • Between 1999 and 2003, GMC investigated various questionnaires (‘tools’) for use in MSF
     • Preliminary work undertaken by Leeds University Medical Education Unit (Sue Kilminster, Godfrey Pell, Trudie Roberts: ‘Patient and Colleague Questionnaires: Validation Report to the GMC’, May 2005)
  4. Objectives of feasibility project (2005-2011)
     • In 2005, GMC commissioned the Peninsula Medical School and an independent survey company, CFEP, to trial the tools with doctors in general practice and then more widely across different specialties.
     • Are the MSF tools (patient questionnaire, colleague questionnaire) fit for purpose?
     • Do the tools provide a first level …
  5. Specific objectives
     • What are the statistical properties of the questionnaires in terms of reliability and validity?
     • What are the operational issues involved in collecting patient and colleague data?
     • Once we have the data, how can we use it to help identify doctors for further scrutiny?
     • Overall, the GMC/PMS/CFEP project deals with:
       • how to collect the data
       • how to analyse the data
     • My role was as one of the statistical consultants to the GMC/PMS/CFEP project
  6. Colleague questionnaire questions
     Slide annotations: 14 core questions on specific aspects of professionalism; 4 global assessments; 1 summative question (binary).
     Items 1-15, rated on a 5-point Likert scale (Poor / Less than satisfactory / Satisfactory / Good / Very good), with a "Don’t know" option:
        1  Clinical knowledge
        2  Diagnosis
        3  Clinical decision making
        4  Treatment (including practical procedures)
        5  Prescribing
        6  Medical record keeping
        7  Recognising and working within limitations
        8  Keeping knowledge and skills up to date
        9  Reviewing and reflecting on own performance
       10  Teaching (students, trainees, others)
       11  Supervising colleagues
       12  Commitment to care and wellbeing of patients
       13  Communication with patients and relatives
       14  Working effectively with colleagues
       15  Effective time management
     Items 16-18, rated Strongly disagree / Disagree / Neutral / Agree / Strongly agree, with a "Don’t know" option:
       16  I am confident that this doctor respects patient confidentiality
       17  I am confident that this doctor is honest and trustworthy
       18  I am confident that this doctor’s performance is not impaired by ill health
     Item 19, answered Yes / No / Don’t know:
       19  I am confident that this doctor is fit to practise medicine
  7. Patient questionnaire questions
     Slide annotations: 7 core questions on professionalism; 2 global assessments; 2 summative assessments (binary).
     Question 4: How good was your doctor today at each of the following? (Please tick one box in each line)
     Rated Poor / Less than satisfactory / Satisfactory / Good / Very good, with a "Does not apply" option:
       a  Being polite
       b  Making you feel at ease
       c  Listening to you
       d  Assessing your medical condition
       e  Explaining your condition and treatment
       f  Involving you in decisions about your treatment
       g  Providing or arranging treatment for you
     Question 5: Please decide how strongly you agree or disagree with the following statements by ticking one box in each line.
     Rated Strongly disagree / Disagree / Neutral / Agree / Strongly agree, with a "Does not apply" option:
       a  I am confident that this doctor will keep information about me confidential
       b  I am confident that this doctor is honest and trustworthy
     Question 6 (Yes / No): I am confident about this doctor’s ability to provide care
     Question 7 (Yes / No): I would be completely happy to see this doctor again
  8. Survey methods 1 (3rd cycle)
     • Doctors from eleven sites in England and Wales took part in the survey between spring 2008 and September 2010.
     • These included four acute hospital trusts, one mental health trust, four primary care organisations and one independent sector (non-NHS) organisation
     • Also, an anaesthetics department at a university hospital NHS Trust contributed to the main survey work.
  9. Survey methods 2
     • For most doctors, clinic receptionists or supporting administrative staff were asked to distribute a PQ pack to 45 consecutive patients (or carers) who were consulting with the doctor.
     • Doctors were requested to complete and return the contact details (including emails whenever possible) for 20 colleagues who were able to comment on their practice.
     • Normally, approximately half of those nominated should be medical colleagues and the remainder non-medical colleagues (e.g. nurses, allied health professionals, administrative or managerial staff).
  10. Third cycle (2010)
      • 1065 doctors participated in both PQ and CQ
      • 908 doctors returned 22 or more PQ responses (29284 PQs, mean 32.3 PQs per doctor, median 36)
      • 1050 doctors returned 8 or more CQ responses (17012 CQs, mean 16.2 CQs per doctor, median 17).
      • 751 doctors provided sufficient returns on both CQ and PQ
  11. Reliability
      • Cronbach α = 0.94 for CQ
      • Cronbach α = 0.896 for PQ
      • Other measures indicate that the questionnaires are highly reliable in that respondents agreed on how to interpret the items and how to use the scales to assign ratings to subjects
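
The reliability figures above are Cronbach α values. The following is a minimal sketch of the standard α formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores), and not the project's own analysis code. The function name `cronbach_alpha` and the `example_ratings` data are hypothetical and chosen only for illustration; sample variances (n − 1 denominator) are an assumption.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows (one row per respondent, one score per item)."""
    k = len(ratings[0])  # number of questionnaire items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 5 respondents x 3 items on a 1-5 scale (not project data).
example_ratings = [
    [5, 5, 4],
    [4, 4, 4],
    [5, 4, 5],
    [3, 3, 2],
    [4, 5, 4],
]

alpha = cronbach_alpha(example_ratings)
print(f"Cronbach alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items vary together across respondents; the sketch ignores "Don't know" and missing responses, which a real analysis would need to handle.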
  12. Results for PQ
      Left table: Unaggregated (raw patient scores)
      Right table: Aggregated (patient scores when aggregated by the doctor they are responding to)
  13. Results for CQ
  14. Problem
      • The mean scores for doctors (aggregated level) are very high
      • How does one identify potential under-performers given the high ratings provided by raters?
      • Since there are so few doctors who receive an adverse rating on the summative items of CQ and PQ, the task is to find patterns in the aggregated patient and colleague scores that identify doctors for possible further scrutiny and separate such doctors from those who do not require further scrutiny.
      • Also, it may be important to identify doctors whose performance does not warrant placing them in the …
  15. Standards-based approach
      • Even one standard deviation from the mean can result in a score above the maximum possible (e.g. mean of 4.85 with standard deviation of ±0.2 on a scale 1-5), so what is the meaning of standard deviation in this context?
      • Also, falling three standard deviations below the mean may result in a doctor still obtaining a score that means ‘good’ (e.g. average 4.85 − 3 × 0.2 = 4.25).
      • Data normalisation may lead to the accusation that, if the questionnaires are highly reliable statistically, data is being massaged for the political purpose of identifying doctors for further scrutiny when, in fact, the original scores indicate no cause for concern.
      • Z-scores are representations of raw scores in terms of standard deviations from the mean
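
The two numerical examples in the bullets above can be checked with a few lines of arithmetic. This is only an illustrative sketch using the figures quoted on the slide (mean 4.85, standard deviation 0.2, scale 1-5); the variable names are hypothetical.

```python
mean_score = 4.85   # aggregated mean quoted on the slide
sd = 0.2            # standard deviation quoted on the slide
scale_max = 5       # top of the 1-5 rating scale

one_sd_above = mean_score + sd        # lands above the top of the scale
three_sd_below = mean_score - 3 * sd  # still between 'good' (4) and 'very good' (5)

print(f"mean + 1 SD = {one_sd_above:.2f}; exceeds scale maximum: {one_sd_above > scale_max}")
print(f"mean - 3 SD = {three_sd_below:.2f}")
```

With ratings compressed this close to the ceiling, cut-offs expressed in standard deviations do not map cleanly onto the verbal labels of the scale.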
  16. Z-scores
      ID  item1  item2  item3  item4  item5  zitem1  zitem2  zitem3  zitem4  zitem5  below -1.96 SD  below -1 SD
       1   3.78   3.50   3.67   3.78   3.60   -2.16   -2.26   -1.65   -1.27   -0.83        2              4
       2   4.38   4.25   3.88   3.57   4.43   -0.31   -0.34   -1.12   -1.72    0.40        0              2
       3   4.40   4.40   4.50   5.00   4.40   -0.23    0.04    0.46    1.36    0.35        0              0
       4   4.58   4.56   4.53   4.50   4.44    0.32    0.46    0.54    0.28    0.41        0              0
       5   4.79   4.63   4.59   4.33   4.71    0.97    0.61    0.69   -0.08    0.81        0              0
       6   4.39   4.56   4.33   4.43   4.53   -0.27    0.46    0.04    0.13    0.55        0              0
       7   4.75   4.75   4.75   4.60   4.64    0.85    0.93    1.10    0.50    0.70        0              0
       8   4.42   4.08   3.93   4.00   3.92   -0.18   -0.79   -0.99   -0.79   -0.36        0              0
       9   4.33   4.27   4.17   4.54   4.46   -0.44   -0.29   -0.38    0.37    0.44        0              0
      10   4.94   4.85   4.83   4.93   2.50    1.45    1.18    1.31    1.22   -2.47        1              1
      Synthetic database of 10 doctors with aggregated means across 5 items (item1-item5), together with standardised z scores for these items (zitem1-zitem5, where z represents the number of standard deviations from the mean for that item). The final two columns indicate the number of items below −1.96 standard deviations and below −1 standard deviation from the mean, respectively. The original raw scores of raters (Likert scale range 1-5) are not shown.
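
The z-score columns and the two count columns of the synthetic table can be recomputed from the five item means. The sketch below assumes each z score uses the item's mean and sample standard deviation (n − 1 denominator) across the ten doctors; that choice is an assumption, although it appears to agree with the tabulated values to within rounding. The names `doctor_means` and `z_scores_by_item` are hypothetical.

```python
from statistics import mean, stdev

# Aggregated item means (item1..item5) for the 10 synthetic doctors in the table.
doctor_means = {
    1: [3.78, 3.50, 3.67, 3.78, 3.60],
    2: [4.38, 4.25, 3.88, 3.57, 4.43],
    3: [4.40, 4.40, 4.50, 5.00, 4.40],
    4: [4.58, 4.56, 4.53, 4.50, 4.44],
    5: [4.79, 4.63, 4.59, 4.33, 4.71],
    6: [4.39, 4.56, 4.33, 4.43, 4.53],
    7: [4.75, 4.75, 4.75, 4.60, 4.64],
    8: [4.42, 4.08, 3.93, 4.00, 3.92],
    9: [4.33, 4.27, 4.17, 4.54, 4.46],
    10: [4.94, 4.85, 4.83, 4.93, 2.50],
}


def z_scores_by_item(rows):
    """Standardise each column (item) using its own mean and sample SD."""
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


ids = list(doctor_means)
z = z_scores_by_item([doctor_means[i] for i in ids])

# Count, per doctor, how many items fall below each cut-off.
below_196 = {i: sum(v < -1.96 for v in zrow) for i, zrow in zip(ids, z)}
below_1 = {i: sum(v < -1.0 for v in zrow) for i, zrow in zip(ids, z)}

for i, zrow in zip(ids, z):
    formatted = " ".join(f"{v:6.2f}" for v in zrow)
    print(f"{i:2d} {formatted}  {below_196[i]}  {below_1[i]}")
```

Counting items below a cut-off, rather than thresholding a single overall mean, is what lets doctor 10 be flagged on one item despite very high scores on the other four.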
  17. Cluster analysis
      • Cluster analysis explores and mines data with the purpose of categorising different samples into groups (clusters) such that the degree of association between two samples is maximal if they belong to the same cluster and minimal otherwise.
  18. Meaning of clusters
      • Ideally, all cases within a cluster have maximum similarity while cases across different clusters have a high degree of dissimilarity
      • Cases within a cluster have more in common with each other than they do with cases in other clusters.
  19. Simple clustering example
                 Gene 1  Gene 2  Gene 3  Gene 4  Gene 5
      Patient1     0       1       0       0       0
      Patient2     0       0       1       1       1
      Patient3     1       1       0       0       1
      Patient4     0       0       1       1       0
      Imagine that we have 4 patients and their measurements on five genes. Are there any natural groupings among these patients depending on their gene profiles?
  20. Step 1: calculate pairwise coefficients (workings)
        P1/P2: 1+0+0+0+0 = 1/5 = 0.2
        P1/P3: 0+1+1+1+0 = 3/5 = 0.6
        P1/P4: 1+0+0+0+1 = 2/5 = 0.4
        P2/P3: 0+0+0+0+1 = 1/5 = 0.2
        P2/P4: 1+1+1+1+0 = 4/5 = 0.8 (ranked first in this step)
        P3/P4: 0+0+0+0+0 = 0/5 = 0.0
      Step 2: calculate pairwise coefficients, using P2+P4 as a ‘superpatient’
        P1/P2+P4: 1+0+0+0+0.5 = 1.5/5 = 0.3
        P3/P2+P4: 0+0+0+0+0.5 = 0.5/5 = 0.1
        P1/P3 = 0.6 (as before) (ranked first in this step)
      Step 3: calculate pairwise coefficients, using P2+P4 as one superpatient and P1+P3 as the second superpatient
        P1+P3/P2+P4: 0.5+0+0+0+0.5 = 1/5 = 0.2 (final step)
      [Figure: cluster dendrogram; axis from 0.0 to 1.0; leaves in order P2, P4, P1, P3]
      That is, two natural groupings occur in the data, with P2 and P4 forming one tight group and P1 and P3 forming another (looser) group.
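
The three steps worked through above can be sketched in code. The sketch assumes the pairwise coefficient is the simple matching coefficient (the fraction of genes on which two patients agree) and that a ‘superpatient’ is compared by averaging the pairwise coefficients of its members (average linkage). Both are assumptions, although they yield the same step totals as the slide (0.8, then 0.3, 0.1 and 0.6, then 0.2). The function names are hypothetical.

```python
from itertools import combinations

# Gene profiles of the four patients from the example.
profiles = {
    "P1": [0, 1, 0, 0, 0],
    "P2": [0, 0, 1, 1, 1],
    "P3": [1, 1, 0, 0, 1],
    "P4": [0, 0, 1, 1, 0],
}


def matching_coefficient(a, b):
    """Fraction of positions (genes) on which two profiles agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_similarity(c1, c2):
    """Average linkage: mean pairwise coefficient between members of two clusters."""
    pairs = [(p, q) for p in c1 for q in c2]
    total = sum(matching_coefficient(profiles[p], profiles[q]) for p, q in pairs)
    return total / len(pairs)


def agglomerate(names):
    """Repeatedly merge the two most similar clusters until one remains."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: cluster_similarity(*pair))
        merges.append((a, b, cluster_similarity(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = agglomerate(sorted(profiles))
for a, b, similarity in merges:
    print(f"merge {'+'.join(a)} with {'+'.join(b)} at similarity {similarity:.1f}")
```

The order and similarity level of the merges are the information a dendrogram draws: P2 and P4 join first at 0.8, P1 and P3 join at 0.6, and the two groups join last at 0.2.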
  21. Hierarchical cluster analysis
      • HCA (agglomerative) clustering first assigns each case to its own cluster, followed by an iterative process whereby the two most similar clusters form a new cluster until one overall cluster results.
      • Clusters that are added to each other can consist of single cases or multiple cases.
      • The output is in the form of a taxonomy or hierarchical tree (‘dendrogram’).
      • Cases of increasing dissimilarity are aggregated at various levels of the tree using a rescaled metric (typically ranging from 1 to 25).
  22. Cluster dendrogram for synthetic data
      Tree indicates that cases 5-7 and 4-6-9 have more in common with each other than with any other. Case 1 is a clear ‘outlier’ in that it is clustered last. Case 10 is also an outlier, but not so much as Case 1.
      3 natural groupings plus …
  23. Synthetic data (the table of 10 doctors, item means and z scores from slide 16, shown again)
  24. Application to CQ and PQ
      • The aim here is to cluster satisfactory doctors in a group, or in groups, that are separate from the group, or groups, of underperforming doctors, based on similarity and dissimilarity measures calculated from their scores on the performance items of the questionnaires (18 for CQ, 9 for PQ, 27 when combined).
  25. 9 performance items from PQ
      Left: full cluster dendrogram for 908 doctors using PQ data.
      Right: expansion of bottom part of tree identifying potentially under-performing doctors, according to patients
  26. 18 performance items from CQ
      Left: full cluster dendrogram for 1050 doctors using CQ data.
      Right: expansion of bottom part of tree identifying potentially under-performing doctors, according to colleagues
  27. 27 performance items from both PQ and CQ
      Left: full cluster diagram for 751 doctors using both CQ and PQ data.
      Right: expansion of bottom part of tree identifying potentially under-performing doctors, according to both patients and colleagues
  28. Conclusions
      • Both the GMC patient and colleague questionnaires represent instruments which would provide a reasonable basis for the collation of evidence regarding a doctor’s professional performance, according to our reliability analysis so far.
      • Raters currently are very reluctant to give adverse ratings using the summative items.
      • Other methods must be found that can tease out of the data any concerns that raters have.
  29. Conclusions
      • Even if a doctor is ranked bottom (irrespective of ranking method used), we must be careful to interpret MSF results in the context of the doctor’s setting and specialty.
      • There is no absolute threshold of performance. Instead, the identification of doctors for potential further scrutiny should be supported by other evidence of performance, given the financial, personal and professional implications.
      • Several medical councils have been following …
  30. Acknowledgements
      • Professor John Campbell (PMS*, Academic Lead)
      • Dr Suzanne Richards (Academic Project Manager, PMS)
      • Mr Andy Dickens (Research Fellow, PMS)
      • Associate Professor Michael Greco (Service Development Lead, CFEP**)
      • Ms Jacqueline Hill (Research Fellow, PMS)
      • Dr Jeremy Hobart (Reader, PMS)
      • Professor Geoff Norman (Consultant)
      • Mr Martin Roberts (Statistician, Research Fellow, PMS)
      • Dr Christine Wright (Research Fellow, PMS)
      *PMS: Peninsula Medical School at the Universities of Exeter and Plymouth, UK. Now called Peninsula College of Medicine and Dentistry.
      **CFEP: Based at the Innovation Centre, University of Exeter, and in Brisbane, Australia.
