- 1. MRCPsych Teaching 2009. Critical Appraisal of Diagnostic Tests: Studies of Accuracy, Validity, Screening & Case-Finding. Alex J Mitchell, Consultant in Liaison Psychiatry, University of Leicester
- 2. Contents: 1. Importance of understanding diagnostic tests; 2. Concepts of diagnostic tests: traits to diseases; 3. Statistics of diagnostic tests; 4. Clinical value of diagnostic tests; 5. Worked examples; 6. Advanced techniques
- 3. 1. Importance of understanding diagnostic tests
- 4. What Is a Diagnostic Test in Psychiatry? • CT/MRI • CSF • Blood tests, e.g. TFTs • SCAN/SCID/PSE/MINI • Neuropsychological testing • MMSE • HADS/BDI/CES-D? • Clinical judgement • Self-report
- 5. Why Is a HADS score not a diagnosis?
- 6. Why Is a HADS score not a diagnosis? 1. No core features 2. No symptom ranking 3. No functional assessment 4. Duration unclear 5. What if items are missing? 6. Imprecise
- 7. Defining Diagnostic Testing • INTENTION • Screening – The systematic application of a test or inquiry, to identify individuals at sufficient risk of a specific disorder to warrant further actions among those who have not sought medical help for that disorder • Case-Finding – The selected application of a test or inquiry, to identify individuals with a suspected disorder and exclude those without a disorder, usually in those who have sought medical help for that disorder • APPLICATION • Targeted (High Risk) – The highly selected application of a test or inquiry, to identify individuals at high risk of a specific disorder by virtue of known risk factors • Routine Screening – The systematic application of a test or inquiry, to individuals without a known disorder (or who have not sought medical help for that disorder) Adapted from Department of Health. Annual report of the national screening committee. London: DoH, 1997.
- 8. Defining Diagnostic Testing • COMPARATOR • Accuracy – The degree of approximation (veracity) to a robust comparator • Validity – The degree of approximation (veracity) to a criterion reference • Precision – The degree of predictability (low SD) in the measure
- 9. Aims of Detection • Screening – Short; easy; some false +ve acceptable (lower Sp, PPV), few false -ve (high Se, NPV) • Diagnosis (case-finding) – Accurate; few false +ve or -ve • Rating – Simple; patient-rated; correlates with QoL and other outcomes
- 10. UK National Screening Committee Guidelines • The condition should: be an important health issue; have a well-understood history, with a detectable risk factor or disease marker; have cost-effective primary preventions implemented • The screening tool should: be a valid tool with a known cut-off; be acceptable to the public; have agreed diagnostic procedures • The treatment should: be effective, with evidence of benefits of early intervention; have adequate resources; have appropriate policies as to who should be treated • The screening programme should: show evidence that the benefits of screening outweigh the risks; be acceptable to the public and professionals; be cost-effective (and have ongoing evaluation); have quality-assurance strategies in place • Adapted from: UK National Screening Committee. Criteria for appraising the viability, effectiveness and appropriateness of a screening programme. http://www.nsc.nhs.uk/pdfs/criteria.pdf
- 11. Development of Diagnostic Tests (Stage – Type – Purpose: Description) • Pre-clinical – Development – Development of the proposed tool or test: the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. The acceptability of the tool to both patients and staff must be considered if implementation is to succeed. • Phase I_screen – Diagnostic validity – Early diagnostic validity testing in a selected sample and refinement of the tool: the aim is to evaluate the early design of the screening method against a known (ideally accurate) standard, known as the criterion reference. In early testing the tool may be refined, keeping the most useful aspects and deleting redundant ones, to make the tool as efficient (brief) as possible while retaining its value. • Phase II_screen – Diagnostic validity – Diagnostic validity in a representative sample: the aim is to assess the refined tool against a criterion (gold) standard in a real-world sample, where the comparator subjects may include several competing conditions that could otherwise complicate differential diagnosis. • Phase III_screen – Implementation – Screening RCT (clinicians using vs not using a screening tool): the tool is evaluated clinically in one group with access to the new method, compared with a second group (ideally allocated at random) who make assessments without the tool. • Phase IV_screen – Implementation – Screening implementation studies using real-world outcomes: the screening tool/method is introduced clinically but monitored to discover its effect on important patient outcomes such as new identifications, new cases treated and new cases entering remission.
- 12. Theory of Diagnostic Tests [Figure: distributions of test results for non-depressed and depressed individuals (y-axis: number of individuals; x-axis: test result); a cut-off value divides them into true -ve, true +ve, false -ve and false +ve]
- 13. Low Prevalence (Se and Sp unchanged) [Figure: test-result distributions for non-depressed vs major depression around a cut-off value (y-axis: number of individuals; x-axis: test result); false -ve: SMALL, false +ve: LARGE]
- 14. High Prevalence (Se and Sp unchanged) [Figure: test-result distributions for non-depressed vs major + minor depression around a cut-off value (y-axis: number of individuals; x-axis: test result); false -ve: LARGE, false +ve: SMALL]
- 15. 2. Concepts of Diagnostic Tests: Trait / Syndrome / Disease
- 16. Can this help establish a syndrome?
- 17. Example: A Clear Disease [#1] [Figure: distributions of test results for no disorder vs disorder, with a point of partial rarity between them (y-axis: number of individuals; x-axis: test result); regions marked true -ve, true +ve, false +ve, false -ve]
- 18. Example: A Probable Syndrome [#2] [Figure: distributions for no disorder vs disorder (y-axis: number of individuals; x-axis: MMSE cognitive score); regions marked true -ve, true +ve, false +ve, false -ve]
- 19. Example: A Normally Distributed Trait [#3] [Figure: distributions for no disorder vs disorder (y-axis: number of individuals; x-axis: MMSE cognitive score); regions marked true -ve, true +ve, false +ve, false -ve]
- 20. Example: Dementia – Disease? Syndrome? Trait?
- 21. Huppert et al (2005) BMC Geriatrics [Figure: MMSE scores for dementia (n=72) and non-dementia (n=2735)]
- 22. Example: Depression – Disease? Syndrome? Trait?
- 23. Mitchell, Coyne et al (2008) [Figure: scores on the CES-D during pregnancy, 3 and 12 months post-partum in 947 women; series: early pregnancy, 3 months post-partum, 12 months post-partum; categories: healthy, depressive symptoms, mild depression, moderate to severe depression]
- 24. PHQ9 Linear Distribution [Figure: distribution of PHQ9 scores for major depression, minor depression and non-depressed groups (x-axis: PHQ9 score)] Baker-Glen, Mitchell et al (2008)
- 25. [Figure: frequency distribution of scores (x-axis: score, zero to eighteen; y-axis: 0 to 3000)] Thompson et al (2001), n=18,414
- 26. 3. Statistics of Diagnostic Tests: 2x2s
- 27. Accuracy 2x2 Table (reference standard: depression present / absent) • Test +ve: A = true +ve, B = false +ve; PPV = A/(A + B) • Test -ve: C = false -ve, D = true -ve; NPV = D/(C + D) • Sensitivity = A/(A + C); Specificity = D/(B + D); Prevalence = (A + C)/(A + B + C + D)
- 28. Accuracy 2x2 Table • Columns: depression present / depression absent • Test +ve: TP, FP (row gives PPV) • Test -ve: FN, TN (row gives NPV) • Margins: sensitivity (present column), specificity (absent column), prevalence (overall)
- 29. Basic Measures of Accuracy • Sensitivity (Se) = a/(a + c) = TP/(TP + FN) – the proportion of patients with the disorder in whom the test result is positive • Specificity (Sp) = d/(b + d) = TN/(TN + FP) – the proportion of patients without the disorder in whom the test result is negative • Positive Predictive Value (PPV) = a/(a + b) = TP/(TP + FP) – a measure of rule-in accuracy: the proportion of true positives among those with a positive screening result • Negative Predictive Value (NPV) = d/(c + d) = TN/(TN + FN) – a measure of rule-out accuracy: the proportion of true negatives among those with a negative screening result
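As a quick check of the four formulas above, here is a minimal Python sketch. The `TwoByTwo` class is a hypothetical helper, not part of any library. The counts are taken from the DSMIV row of the post-stroke symptom table later in this deck: 42 of 50 cases detected and 190 of 200 non-cases correctly negative.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical helper: counts from a 2x2 accuracy table."""
    tp: int  # a: test +ve, disorder present
    fp: int  # b: test +ve, disorder absent
    fn: int  # c: test -ve, disorder present
    tn: int  # d: test -ve, disorder absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # a/(a + c)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # d/(b + d)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)  # a/(a + b)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)  # d/(c + d)

    @property
    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


dsm = TwoByTwo(tp=42, fp=10, fn=8, tn=190)
print(round(dsm.sensitivity, 2), round(dsm.specificity, 2),
      round(dsm.ppv, 2), round(dsm.npv, 2), round(dsm.prevalence, 2))
# 0.84 0.95 0.81 0.96 0.2
```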
- 30. Accuracy in words • Sensitivity – The chance of testing positive among those with the condition – The chance of rejecting the null hypothesis among those that do not satisfy the null hypothesis • Specificity – The chance of testing negative among those without the condition – The chance of accepting the null hypothesis among those that satisfy the null hypothesis • Positive Predictive Value – The chance of having the condition among those that test positive – The chance of not satisfying the null hypothesis among those that reject the null hypothesis • Negative Predictive Value – The chance of not having the condition among those that test negative – The chance of satisfying the null hypothesis among those that accept the null hypothesis • Type I Error or α (alpha) or false positive rate – The chance of testing positive among those without the condition – The chance of rejecting the null hypothesis among those that satisfy the null hypothesis • Type II Error or β (beta) or false negative rate – The chance of testing negative among those with the condition – The chance of accepting the null hypothesis among those that do not satisfy the null hypothesis • False Discovery Rate or q-Value – The chance of not having the condition among those that test positive – The chance of satisfying the null hypothesis among those that reject the null hypothesis • False Omission Rate – The chance of having the condition among those that test negative – The chance of not satisfying the null hypothesis among those that accept the null hypothesis
- 31. Rule-in Accuracy [Figure: the 2x2 table (depression present / absent × test +ve / -ve) annotated for rule-in: test +ve row = true +ve and false +ve (type I error), summarized by the PPV (discrimination); test -ve row = false -ve (type II error) and true -ve, summarized by the NPV; margins: sensitivity, specificity, prevalence (occurrence)]
- 32. Rule-Out Accuracy [Figure: the same 2x2 table annotated for rule-out: test -ve row = false -ve (type II error) and true -ve, summarized by the NPV (discrimination); test +ve row = true +ve and false +ve, summarized by the PPV; margins: sensitivity, specificity, prevalence (occurrence)]
- 33. Likelihood Ratios • Likelihood ratio for a positive test (LR+) – the chance of testing positive among those with the condition, divided by the chance of testing positive among those without the condition • LR+ = Sensitivity / (1 - Specificity) = [TP/(TP + FN)] / [FP/(FP + TN)] • Equivalently, LR+ = post-test odds after a positive result / pre-test odds = [PPV/(1 - PPV)] / [Prevalence/(1 - Prevalence)] • Likelihood ratio for a negative test (LR-) – the chance of testing negative among those with the condition, divided by the chance of testing negative among those without the condition • LR- = (1 - Sensitivity) / Specificity = [FN/(FN + TP)] / [TN/(TN + FP)] • Equivalently, LR- = post-test odds after a negative result / pre-test odds = [(1 - NPV)/NPV] / [Prevalence/(1 - Prevalence)]
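Multiplying the pre-test odds by the likelihood ratio converts a pre-test probability into a post-test probability. Below is a minimal sketch with illustrative function names. It assumes Se = 0.84, Sp = 0.95 and a pre-test probability of 0.20, which are the DSMIV figures from the post-stroke example. After a positive result, the post-test probability should match the PPV from the same 2x2 table (42/52 ≈ 0.81).

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)   # P(test +ve | D+) / P(test +ve | D-)
    lr_neg = (1 - sens) / spec   # P(test -ve | D+) / P(test -ve | D-)
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.84, 0.95)
print(round(lr_pos, 1), round(lr_neg, 3))  # 16.8 0.168
p_pos = post_test_probability(0.20, lr_pos)
print(round(p_pos, 2))  # 0.81
```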
- 34. Summary Measures • Youden's J = Sensitivity + Specificity - 1 • Predictive Summary Index (PSI) = PPV + NPV - 1 • Overall accuracy (fraction correct) = (TP + TN) / (TP + FP + TN + FN)
- 35. Reciprocal Measures • Number Needed to Diagnose (NND) = 1 / Youden's J • Number Needed to Predict (NNP) = 1 / PSI • Number Needed to Screen (NNS) = 1 / (FC - FiC), where FC is the fraction correct and FiC the fraction incorrect (1 - FC)
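The summary and reciprocal measures can all be computed from the four cells of a 2x2 table. This is a minimal sketch in which `summary_measures` is a hypothetical helper. Applied to the DSMIV counts (TP 42, FP 10, FN 8, TN 190), it should reproduce the NND, NNP and NNS shown for the DSMIV algorithm in the post-stroke table. The sketch does not guard against J or PSI equal to zero, where the reciprocal is undefined. A negative J or PSI gives a negative "number needed".

```python
def summary_measures(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Hypothetical helper: Youden's J, PSI, fraction correct and reciprocals."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    youden = sens + spec - 1
    psi = ppv + npv - 1
    fc = (tp + tn) / n    # fraction correct
    fic = (fp + fn) / n   # fraction incorrect (= 1 - fc)
    return {
        "youden": youden,
        "psi": psi,
        "fraction_correct": fc,
        "nnd": 1 / youden,
        "nnp": 1 / psi,
        "nns": 1 / (fc - fic),
    }


m = summary_measures(tp=42, fp=10, fn=8, tn=190)
for key in ("nnd", "nnp", "nns"):
    print(key, round(m[key], 2))
# nnd 1.27
# nnp 1.3
# nns 1.17
```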
- 36. Receiver Operating Characteristic: Murphy JM, Berwick DM, Weinstein MC, Borus JF, Budman SH, Klerman GL (1987). Performance of screening and diagnostic tests: application of receiver operating characteristic (ROC) analysis. Arch Gen Psychiatry 44:550-555
- 37. Accuracy 2x2 Table • Columns: depression present / depression absent • Test +ve: true +ve, false +ve (row gives PPV) • Test -ve: false -ve, true -ve (row gives NPV) • Margins: sensitivity, specificity, prevalence
- 38. Test vs Major Depression • Columns: depression present / absent / total • Test +ve: 500 / 1500 / 2000; PPV 25% • Test -ve: 500 / 4500 / 5000; NPV 90% • Total: 1000 / 6000 / 7000 • Sensitivity 50%; Specificity 75%; Prevalence 14%
- 39. Test vs Major + Minor Depression • Columns: depression present / absent / total • Test +ve: 500 / 1500 / 2000; PPV 25% • Test -ve: 500 / 500 / 1000; NPV 50% • Total: 1000 / 2000 / 3000 • Sensitivity 50%; Specificity 25%; Prevalence 33%
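PPV and NPV depend on prevalence, so it is worth recomputing every derived percentage on the two slides above directly from the cell counts. This is a minimal sketch. `table_stats` is a hypothetical helper, and the counts are copied from the two tables in the order TP, FP, FN, TN.

```python
def table_stats(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Hypothetical helper: derived percentages of a 2x2 accuracy table."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Prevalence": (tp + fn) / (tp + fp + fn + tn),
    }


tables = {
    "vs major depression": (500, 1500, 500, 4500),
    "vs major + minor depression": (500, 1500, 500, 500),
}
results = {label: table_stats(*counts) for label, counts in tables.items()}
for label, stats in results.items():
    # Format each proportion as a whole-number percentage.
    print(label, {k: f"{v:.0%}" for k, v in stats.items()})
```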
- 40. 4. Clinical Value of Diagnostic Tests
- 41. Added Value • Definition 1: the additional ability of a test to rule in or rule out compared with the baseline rate – PPV minus prevalence – NPV minus prevalence • Definition 2: the additional ability of a test to rule in or rule out compared with the unassisted rate – PPV with test minus PPV without test (assuming equal prevalence) – LR+ with test minus LR+ without test – AUC with test minus AUC without test
- 42. [Figure: proportion (0.00 to 1.00) of individuals with each symptom, shown for all cases, depressed and non-depressed groups. Symptoms: loss of energy, diminished drive, sleep disturbance, concentration/indecision, depressed mood, anxiety, diminished concentration, insomnia, diminished interest/pleasure, psychic anxiety, helplessness, worthlessness, hopelessness, somatic anxiety, thoughts of death, anger, excessive guilt, psychomotor change, indecisiveness, decreased appetite, psychomotor agitation, psychomotor retardation, decreased weight, lack of reactive mood, increased appetite, hypersomnia, increased weight] Mitchell, Zimmerman et al. MIDAS Database. Psychol Med 2007 (submitted)
- 43. [Figure: rule-in added value (PPV - Prev) and rule-out added value (NPV - Prev) for each symptom (y-axis: -0.10 to 0.50). Symptoms: anger, anxiety, decreased appetite, decreased weight, depressed mood, diminished concentration, diminished drive, diminished interest/pleasure, excessive guilt, helplessness, hopelessness, hypersomnia, increased appetite, increased weight, indecisiveness, insomnia, lack of reactive mood, loss of energy, psychic anxiety, psychomotor agitation, psychomotor change, psychomotor retardation, sleep disturbance, somatic anxiety, thoughts of death, worthlessness]
- 44. Accuracy of Tests: Visual [Figure: tests placed on a probability scale from 0% to 100% (very unlikely, unlikely, likely, very likely). Entries: Overall 10% - (22) - 50% = 54% • CIDI (computer), any depression: PHQ-2 3% - (16) - 32% = 29%; Henkel et al (2004) Eur Arch Psychiatry Clin Neurosci • CIDI (computer), any depression: WHO5 (1+3) 3% - (16) - 32% = 29%; Henkel et al (2004) Eur Arch Psychiatry Clin Neurosci • CIDI (computer), major depression: 1 question 3% - (37) - 63% = 60%; Arroll B et al (2003) BMJ • CIDI (computer), major depression: 2 questions 32% - (37) - 96% = 64%]
- 45. [Figure: post-test probability vs pre-test probability (both 0 to 1); lines: clinician positive and clinician negative (Fallowfield et al, 2001), HADS-D positive and HADS-D negative (meta-analysis), baseline probability]
- 46. [Figure: post-test probability vs pre-test probability (both 0 to 1); lines: depression present (routine), depression absent (routine), depression scales +ve (median), depression scales -ve (median), prior probability; at a prevalence of 0.15, PPV = 0.41 and NPV = 0.97]
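Curves like those in the post-test probability plots can be generated from a fixed sensitivity and specificity by sweeping the pre-test probability. This is a minimal sketch. Se = Sp = 0.80 are hypothetical values chosen for illustration; they are not the figures behind either plot. After a negative result, the post-test probability of depression is 1 - NPV.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a given pre-test probability (0 < prevalence < 1)."""
    tp = sens * prevalence               # expected true positives (as proportions)
    fn = (1 - sens) * prevalence         # expected false negatives
    fp = (1 - spec) * (1 - prevalence)   # expected false positives
    tn = spec * (1 - prevalence)         # expected true negatives
    return tp / (tp + fp), tn / (tn + fn)


SENS, SPEC = 0.80, 0.80  # hypothetical, for illustration only
for prev in (0.05, 0.15, 0.30, 0.50):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"pre-test {prev:.2f}: post-test if +ve {ppv:.2f}, "
          f"post-test if -ve {1 - npv:.2f}")
```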
- 47. 5. Worked Examples of diagnostic tests
- 48. Post-stroke Major Depression vs Non-major Depression • Clinicians' diagnosis using DSMIV vs SCAN/PSE • Using the SCAN: 50 people with major depression; 150 healthy people; 50 with minor depression
- 49. Clinicians using DSMIV • Clinicians diagnosed 52 cases with Mj depression • The specificity of DSMIV was 95% • Q. What was the sensitivity? • Q. What was the prevalence? • Q. What was the PPV? • Q. What was the % correctly identified per every 100 screened?
- 50. Test vs Major Depression • Columns: depression present on SCAN / absent • Test +ve (clinician): ?? / ?? / 52; PPV ??% • Test -ve: ?? / ??; NPV ??% • Total: 50 / 200 • Sensitivity ??%; Specificity 95%; Prevalence ??%
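One way to fill in the table is sketched below. It assumes the clinicians' 52 positive diagnoses are scored against the 50 SCAN cases and 200 non-cases from the previous slides, and it reads the last question as the fraction correct. Specificity fixes the true negatives first; the other cells follow by subtraction. The variable names are illustrative. The resulting counts match the DSMIV row of the symptom table that follows.

```python
cases, non_cases = 50, 200   # SCAN: major depression vs not
clinician_positive = 52      # clinicians' DSMIV diagnoses of major depression
specificity = 0.95

tn = round(specificity * non_cases)  # non-cases correctly called negative
fp = non_cases - tn                  # non-cases wrongly called positive
tp = clinician_positive - fp         # remaining positives are true cases
fn = cases - tp                      # cases the clinicians missed
n = cases + non_cases

sensitivity = tp / cases
prevalence = cases / n
ppv = tp / clinician_positive
correct_per_100 = 100 * (tp + tn) / n  # fraction correct, per 100 screened
print(sensitivity, prevalence, round(ppv, 3), correct_per_100)
# 0.84 0.2 0.808 92.8
```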
- 51. Accuracy of Individual Symptoms for Post-stroke Major Depression (reference standard: 50 with and 200 without post-stroke major depression)

  | Symptom | Depressed (reference) | Depressed with symptom | Sensitivity | Non-depressed (reference) | Non-depressed without symptom | Specificity | PPV | NPV | Positive Utility Index | Negative Utility Index | Identification Index (%) | NNS | NND | NNP |
  |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
  | Persistent low mood | 50 | 45 | 0.90 | 200 | 184 | 0.92 | 0.74 | 0.97 | 0.66 | 0.90 | 83.20 | 1.20 | 1.22 | 1.41 |
  | Loss of interest | 50 | 48 | 0.96 | 200 | 156 | 0.78 | 0.52 | 0.99 | 0.50 | 0.77 | 63.20 | 1.58 | 1.35 | 1.96 |
  | Loss of drive | 50 | 40 | 0.80 | 200 | 120 | 0.60 | 0.33 | 0.92 | 0.27 | 0.55 | 28 | 3.57 | 2.50 | 3.90 |
  | Low energy | 50 | 49 | 0.98 | 200 | 20 | 0.10 | 0.21 | 0.95 | 0.21 | 0.10 | -44.80 | -2.23 | 12.50 | 6.01 |
  | Insomnia | 50 | 35 | 0.70 | 200 | 136 | 0.68 | 0.35 | 0.90 | 0.25 | 0.61 | 36.80 | 2.72 | 2.63 | 3.93 |
  | Poor appetite | 50 | 25 | 0.50 | 200 | 178 | 0.89 | 0.53 | 0.88 | 0.27 | 0.78 | 62.40 | 1.60 | 2.56 | 2.45 |
  | Suicidal thoughts | 50 | 2 | 0.04 | 200 | 196 | 0.98 | 0.33 | 0.80 | 0.01 | 0.79 | 58.40 | 1.71 | 50 | 7.32 |
  | Poor concentration | 50 | 28 | 0.56 | 200 | 114 | 0.57 | 0.25 | 0.84 | 0.14 | 0.48 | 13.60 | 7.35 | 7.69 | 11.93 |
  | Poor orientation | 50 | 10 | 0.20 | 200 | 164 | 0.82 | 0.22 | 0.80 | 0.04 | 0.66 | 39.20 | 2.55 | 50 | 46.92 |
  | Anger | 50 | 17 | 0.34 | 200 | 172 | 0.86 | 0.38 | 0.84 | 0.13 | 0.72 | 51.20 | 1.95 | 5 | 4.61 |
  | DSMIV algorithm | 50 | 42 | 0.84 | 200 | 190 | 0.95 | 0.81 | 0.96 | 0.68 | 0.91 | 85.60 | 1.17 | 1.27 | 1.30 |
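Every column of the symptom table can be regenerated from the two counts in each row. This is a minimal sketch, and `row_metrics` is a hypothetical helper. The positive and negative utility indices are computed as Se × PPV and Sp × NPV. The identification index is computed as fraction correct minus fraction incorrect. These definitions match the tabulated values, for example 0.66, 0.90 and 83.20 for persistent low mood.

```python
CASES, NON_CASES = 50, 200  # SCAN reference standard: depressed / not depressed

# (symptom, depressed patients with the symptom, non-depressed without it)
ROWS = [
    ("Persistent low mood", 45, 184),
    ("Loss of interest", 48, 156),
    ("Loss of drive", 40, 120),
    ("Low energy", 49, 20),
    ("Insomnia", 35, 136),
    ("Poor appetite", 25, 178),
    ("Suicidal thoughts", 2, 196),
    ("Poor concentration", 28, 114),
    ("Poor orientation", 10, 164),
    ("Anger", 17, 172),
    ("DSMIV algorithm", 42, 190),
]


def row_metrics(tp: int, tn: int) -> dict[str, float]:
    """Hypothetical helper: all derived columns for one symptom."""
    fn = CASES - tp        # depressed, symptom absent
    fp = NON_CASES - tn    # not depressed, symptom present
    se, sp = tp / CASES, tn / NON_CASES
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    fc = (tp + tn) / (CASES + NON_CASES)  # fraction correct
    ident = fc - (1 - fc)                 # identification index: FC - FiC
    return {
        "Se": se, "Sp": sp, "PPV": ppv, "NPV": npv,
        "UI+": se * ppv,   # positive utility index
        "UI-": sp * npv,   # negative utility index
        "II%": 100 * ident,
        "NNS": 1 / ident,
        "NND": 1 / (se + sp - 1),
        "NNP": 1 / (ppv + npv - 1),
    }


for name, tp, tn in ROWS:
    m = row_metrics(tp, tn)
    print(f"{name:<20}" + " ".join(f"{k}={v:.2f}" for k, v in m.items()))
```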
- 52. 6. Advanced Techniques: sROC; real-world numbers (NND, NNS); bivariate meta-analysis; economics
- 53. PPV of DT for distress = 55%; PPV of other methods = 65%
- 54. ROC Plot [Figure: sensitivity (y-axis) vs 1 - specificity (x-axis), both 0.00 to 1.00, for low mood, DSMIV, and low mood & loss of interest]
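A test with a single cut-off contributes one point to an ROC plot. Joining that point to (0, 0) and (1, 1) gives a trapezoidal area under the curve equal to (Se + Sp)/2. This is a minimal sketch with hypothetical function names. The Se/Sp values come from the post-stroke symptom table. Treating the "Persistent low mood" row as the plot's "Low Mood" point is an assumption.

```python
def roc_point(se: float, sp: float) -> tuple[float, float]:
    """ROC coordinates: x = 1 - specificity, y = sensitivity."""
    return 1 - sp, se


def single_cutoff_auc(se: float, sp: float) -> float:
    """Trapezoidal area under the path (0,0) -> (1-sp, se) -> (1,1)."""
    x, y = roc_point(se, sp)
    # Triangle under the first segment + trapezoid under the second;
    # algebraically this equals (se + sp) / 2.
    return x * y / 2 + (1 - x) * (y + 1) / 2


for label, se, sp in [("Low mood", 0.90, 0.92), ("DSMIV", 0.84, 0.95)]:
    x, y = roc_point(se, sp)
    print(f"{label}: point=({x:.2f}, {y:.2f}), AUC={single_cutoff_auc(se, sp):.3f}")
```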
- 56. Bivariate Diagnostic meta-analysis
- 57. Summary measures and their reciprocals

  | Measure | Basic formula | Strength | Weakness | Reciprocal absolute benefit | Reciprocal formula |
  |---|---|---|---|---|---|
  | Youden Index | Sensitivity + Specificity - 1 | Relatively independent of prevalence | Requires application of a criterion (gold) standard; not clinically interpretable; does not assess the ratio of false positives to false negatives | Number Needed to Diagnose | NND = 1/Youden |
  | Predictive Summary Index | PPV + NPV - 1 | Measures gain; clinically applicable | Dependent on prevalence; places equal weight on rule-in and rule-out accuracy | Number Needed to Predict | NNP = 1/PSI |
  | Overall Accuracy (Fraction Correct) | (TP + TN)/(TP + FP + TN + FN) | Measures the real number of correct identifications vs misidentifications; can easily be converted into a percentage | Requires application of a criterion (gold) standard | Number Needed to Screen | NNS = 1/Identification Index |
- 58. Further Reading • Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet 2002; 359: 881–84 • Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329 (17 July) • Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ 2006; 332: 1089–1092 • Reitsma JB et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 2005; 58: 982–990

