  • 1. International Journal for Quality in Health Care 1999; Volume 11, Number 1: pp. 21–28

Development and application of a generic methodology to assess the quality of clinical guidelines

FRANÇOISE A. CLUZEAU¹, PETER LITTLEJOHNS¹, JEREMY M. GRIMSHAW², GENE FEDER³ AND SARAH E. MORAN¹

¹Health Care Evaluation Unit, St George’s Hospital Medical School, London, ²Health Services Research Unit, University of Aberdeen and ³Department of General Practice and Primary Care, St Bartholomew’s and the Royal London School of Medicine and Dentistry, London, UK

Address correspondence to Françoise Cluzeau, Health Care Evaluation Unit, St George’s Hospital Medical School, Cranmer Terrace, London, SW17 0RE, UK. Tel: +44 181 725 2771. Fax: +44 181 725 3584. E-mail:
© 1999 International Society for Quality in Health Care and Oxford University Press

Abstract

Background. Despite clinical guidelines penetrating every aspect of clinical practice and health policy, doubts persist over their ability to improve patient care. We have designed and tested a generic critical appraisal instrument that assesses whether developers have minimized the biases inherent in creating guidelines, and addressed the requirements for effective implementation.

Design. Thirty-seven items describing suggested predictors of guideline quality were grouped into three dimensions covering the rigour of development, clarity of presentation (including the context and content) and implementation issues. The ease of use, reliability and validity of the instrument were tested on a national sample of guidelines for the management of asthma, breast cancer, depression and coronary heart disease, with 120 appraisers. A numerical score was derived to allow comparison of guidelines within and between diseases.

Results. The instrument has acceptable reliability (Cronbach’s α coefficient, 0.68–0.84; intra-class correlation coefficient, 0.82–0.90). The results provided some evidence of validity (Pearson’s correlation coefficient between appraisers’ dimension scores and their global assessment was 0.49 for dimension one, 0.63 for dimension two and 0.40 for dimension three). The instrument could differentiate between national and local guidelines and was easy to apply. There was variation in the performance of guidelines, with most not achieving a majority of criteria in each dimension.

Conclusions. Use of this instrument should encourage developers to create guidelines that reflect relevant research evidence more accurately. Potential users or groups adapting guidelines for local use could apply the instrument to help decide which one to follow. The National Health Service Executive is using the instrument to assist in deciding which guidelines to recommend to the UK National Health Service. This methodology forms the basis of a common approach to assessing guideline quality in Europe.

Keywords: appraisal, clinical guidelines, instrument, quality, reliability, validity

Clinical guidelines are now ubiquitous in every aspect of clinical practice and health policy. They are expected to fulfil a myriad of roles from increasing the uptake of research findings [1] to facilitating the rationing of health care [2]. Whilst there is evidence that guidelines can improve clinical practice, their successful introduction is dependent on many factors, including the clinical context, methods of development, dissemination and implementation [3]. Successfully addressing all of these issues in routine practice can prove difficult [4], but is necessary if guidelines are to improve the quality of health care [5].

An increasing concern is the number of disease-specific guidelines that offer inconsistent advice [6,7]. Many reasons have been put forward to explain this variability, ranging from lack (or differing interpretation) of underlying research findings, different values given to anticipated outcome (for example clinical versus economic), dubious achievement of consensus and possible bias introduced through conflicts of interest. Faced with this diversity, potential users will want to make an informed choice. However, the information on which to base this judgement is often lacking [8,9]. Ideally, data from a formal evaluation of the ability of the guidelines
  • 2. F. A. Cluzeau et al.

to bring about the anticipated health outcomes when adhered to (defined as validity [10]) would be available; in reality there is a virtual absence of this type of outcome data for most guidelines. Moreover, when results from carefully controlled randomized trials of guidelines implementation strategies are available they may not necessarily be generalizable to a routine clinical setting [11]. In the absence of appropriate outcome indicators on which to judge effectiveness, most assessments of clinical quality substitute process and structural criteria [12]. Indeed this is often the most practical way to assess quality of care on a routine basis [13]. Using this approach to the assessment of guidelines requires the determination of whether guideline developers have been rigorous in minimizing the potential biases in creating the guideline [14]: in essence, critically appraising guidelines.

There is increasing published work on how to critically appraise primary research and reviews [15–17]. This work has been stimulated by the Cochrane Collaboration [18]. However, the application of this approach to guidelines is in its infancy. In 1992 the Institute of Medicine (IOM) started the process by developing a provisional, if unwieldy, appraisal instrument based on ‘desirable attributes’ of good guidelines [19]. Subsequently, shorter checklists were produced in Canada [20] and Australia [21] but their usefulness has never been formally assessed.

In June 1993, the UK National Health Service Management Executive organized a workshop to explore the issues around assessing the quality of guidelines. A research programme was initiated to produce a generic instrument to appraise guidelines. The instrument should be capable of being applied by anyone (general or specialist clinicians, health care managers, and researchers) interested in assessing guidelines and should allow comparison between guidelines. This paper describes the creation of the instrument, an assessment of its validity and reliability, and a description of the quantity and quality of UK guidelines for the management of coronary artery disease, depression, breast cancer and asthma.

Methods

Appraisal instrument

The purpose of the appraisal instrument is to assess the extent to which clinical guidelines are ‘systematically developed’ [22], and take into account known determinants of effective strategies for dissemination and implementation. Initially the reliability and face validity of the IOM instrument were tested on five UK guidelines with seven appraisers in a pilot study [7]. Based on these results, potential questions for a simplified appraisal tool were circulated to individuals interested in guideline development for comments. The revised list contained 37 items (see Appendix). These address different aspects and are categorized into three conceptual dimensions which could be mapped to the IOM attributes. The first dimension, rigour of development, reflects the attributes necessary to enhance guideline validity and reproducibility. It contains 20 items and assesses responsibility and endorsement for the guidelines, the composition of the development group, identification and interpretation of evidence, the link between evidence and main recommendation, peer review and updating. The second dimension, context and content, contains 12 items addressing the attributes of guideline reliability, applicability, flexibility and clarity. It assesses the aims of the guidelines, the target population, circumstances for applying the recommendations, presentation and format of the guidelines and estimated benefits–harms and costs. The third dimension, application, contains five items addressing the implementation, dissemination and monitoring strategies. All three dimensions assess the adequacy of documentation.

Each item inquires whether information is present and then requires a judgement about the quality of the information. The specific questions demand ‘yes’, ‘no’, ‘not sure’ answers. An option for ‘not applicable’ answers is available for some items. To ensure that the questions were interpreted consistently and to minimize the need for judgement, a user manual was designed; this contained a detailed explanation of the meaning of each question [23], and suggested circumstances where a ‘yes’ answer may be appropriate. In the study a global assessment of the guidelines was asked for, as a measure of overall quality: ‘strongly recommended’ (for use in practice without modifications); ‘recommended’ (for use in practice on condition of some alterations or with provisos); or ‘not recommended’ (not suitable for use in practice).

Selection of guidelines for appraisal

Sixty guidelines were selected from a national survey of UK guidelines between January 1991 and January 1996 on coronary heart disease, asthma, breast cancer and depression (15 guidelines per disease group) [24]. The size of the sample was based on Nunnally’s recommendation that at least 300 observations are needed for inter-rater test of reliability [25]. We hypothesized that national guidelines would be more systematically developed than local ones. All 12 guidelines produced by nationally recognized organizations or commissioned by the NHS Executive were selected. Forty-eight local guidelines were drawn through a random sample. Guideline authors were asked to provide copies of their guidelines and information on how their guidelines had been developed.

Appraisers

Each guideline was assessed independently by six appraisers (120 in total). Each assessed three guidelines. Each block of three guidelines (20 blocks altogether) was assessed by the same six appraisers (Figure 1). These included a national expert in the disease area, a general practitioner, a public health physician, a hospital consultant physician, a nurse specializing in the disease area, and a researcher on guideline methodology. They were recruited through UK cardiac units, asthma centres, the Royal College of General Practitioners, respondents to the survey, the Royal College of Nursing and research institutions and were randomly allocated guidelines.
  • 3. Guidelines appraisal methodology

Figure 1 Design for the assessment of coronary heart disease guidelines (design repeated for other three disease areas: asthma, breast cancer and depression).

Analysis

In order to allow comparison of guideline performance, dimension scores for each guideline were calculated. A ‘yes’ response was given a value of 1 and other responses (‘no’, ‘not sure’ and ‘not applicable’) a value of zero. Individual appraisers’ dimension scores were calculated by summing their scores for each item within a dimension. A guideline dimension score was obtained by calculating the mean of the appraisers’ scores. This was then expressed as a percentage of the maximum possible score for that dimension in order to compare scores across the three dimensions.

Item dimension

We calculated Pearson’s correlation coefficients between each item and dimension scores, omitting the index item, to check that each item was in the appropriate dimension [26].

Reliability

Reliability of the instrument was assessed in two ways: first, internal consistency was measured by calculating the correlation between all items within a dimension to test to what extent they measured the same underlying concept, using Cronbach’s α coefficient [27]. Second, inter-rater agreement was measured by calculating the intra-class correlation coefficient (ICC) for the dimension scores according to the criteria of Shrout and Fleiss [28]. Calculations were based on the assumption that each guideline was assessed by a different set of appraisers.

Validity

In the absence of a gold standard or a validated measure of guideline quality, evidence of criterion validity was sought by calculating Pearson’s correlation coefficients between appraisers’ dimension scores and their global assessment of a guideline. We predicted that dimension scores for national guidelines would be higher than those for local guidelines. In an attempt to investigate validity further, analysis of variance (ANOVA) was used to test this hypothesis. ANOVA was also used to examine the effect of year of publication, disease area and level of background information on guideline dimension scores. Year of publication was classified into three categories: pre 1994, 1994–1996 and unknown. These were chosen because a number of influential papers and recommendations had been published about the development of guidelines in 1993 [29,30]. A zero skewness log transformation was used in the ANOVA for dimensions one and three because the scores were not normally distributed. Mann–Whitney tests were used on individual appraisers’ scores to examine differences between professional groups.

Appraisers who omitted at least one question in a dimension were excluded from calculations of the ICCs and Pearson’s correlation coefficients for that dimension.

Results

Background information was received for 53 guidelines. Five guidelines (three national and two local) had a background document with details of their development process. Completed structured questionnaires were available for 46 guidelines and two authors provided information in a letter. No additional information was available for seven local guidelines. Thirty-eight guidelines had been published between 1994 and 1996, 14 between 1992 and 1993 and eight documents were undated. One appraiser had been closely associated with the development of one of the guidelines and therefore assessed only two guidelines.

Figure 2 shows the distribution of guideline scores for each dimension. Over two-thirds of guidelines scored less than 50 on dimension one, which means that less than 50% of criteria for rigorous development were met. The median for dimension one was 30.4 with a wide range of 0.8–85. The median score was higher for dimension two (47.9). Performance was poorest on dimension three (median 24.2). The distribution for this dimension was very skewed with scores ranging from 0 to 95.

Item dimension

Items were in the appropriate dimension as all but two correlated more highly with their dimension scores than with the other two dimensions’ scores (table of results available from the authors).

Reliability

All three dimensions had good internal consistency (Cronbach’s α, 0.68–0.84) and excellent inter-rater agreement (ICCs, 0.82–0.90) and narrow confidence intervals [31] (Table 1).
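The scoring rule set out in the Analysis section (1 for ‘yes’, 0 for any other answer, summed per appraiser, averaged across appraisers, then expressed as a percentage of the dimension maximum) can be sketched as follows. The function and variable names are hypothetical, the item counts per dimension are taken from the Methods, and rejecting incomplete answer sheets is an assumption of this sketch rather than something the paper specifies for score calculation.

```python
from statistics import mean

# Item counts per dimension as stated in the Methods (20, 12 and 5 items).
ITEMS_PER_DIMENSION = {1: 20, 2: 12, 3: 5}


def appraiser_dimension_score(answers):
    """Sum of item scores for one appraiser: 'yes' counts 1, anything else 0."""
    return sum(1 for answer in answers if answer == "yes")


def guideline_dimension_score(answers_by_appraiser, dimension):
    """Mean appraiser score as a percentage of the dimension maximum."""
    n_items = ITEMS_PER_DIMENSION[dimension]
    for answers in answers_by_appraiser:
        # Assumption of this sketch: incomplete answer sheets are rejected.
        if len(answers) != n_items:
            raise ValueError(f"expected {n_items} answers, got {len(answers)}")
    raw_scores = [appraiser_dimension_score(a) for a in answers_by_appraiser]
    return 100.0 * mean(raw_scores) / n_items


# Made-up answers from three appraisers on the five items of dimension three.
example = [
    ["yes", "no", "not sure", "yes", "not applicable"],  # raw score 2
    ["yes", "yes", "yes", "no", "no"],                   # raw score 3
    ["no", "no", "yes", "not sure", "no"],               # raw score 1
]
score = guideline_dimension_score(example, dimension=3)
print(score)  # prints: 40.0
```

Expressing the mean as a percentage is what makes a 20-item dimension comparable with a 5-item one, which is the stated purpose of the standardization.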
  • 4.

Figure 2 Frequency distribution of guidelines’ standardized scores by dimension.

Table 1 Cronbach’s α coefficient and intraclass correlation coefficient

Dimension                    Cronbach’s α    ICC (95% CI*)
1. Rigour of development     0.84            0.90 (0.85–0.93)
2. Context and content       0.78            0.82 (0.74–0.89)
3. Application               0.68            0.84 (0.77–0.90)

* Confidence intervals (CI) were calculated using Wald’s Method [28].

Validity

The Pearson’s correlation coefficients between appraisers’ dimension scores and their global assessment were 0.49 (n=311) for dimension one, 0.63 (n=319) for dimension two and 0.40 (n=315) for dimension three. All coefficients were highly significant (P<0.0001), providing evidence of criterion validity. However, it should be noted that the appraisers made their global assessment after completing the instrument, so a significant correlation would be expected.

Mean standardized guideline scores are presented in Table 2. National guidelines had a significantly higher score than local guidelines for the three dimensions (dimension one P<0.001, dimension two P=0.0008, dimension three P=0.04), confirming our a priori hypothesis, and hence providing some further evidence of validity. Guidelines with a background document or a form performed significantly better than others on dimension one (P<0.001), although numbers were small and confidence intervals were wide.

Median scores for researchers were significantly lower than those for the nurses in all dimensions. The median scores for consultant physicians, general practitioners and public health physicians were significantly higher than those of the researchers for dimension two (table of results available from the authors).

Discussion

This study has shown that it is feasible to develop an instrument that can be used for appraising the methodological quality of clinical guidelines. The instrument has good reliability and there is a suggestion of validity. Assessing the quality of any health care intervention is complex because of the multidimensional nature of the concept [32], and guidelines are no exception [33]. Although the linear separation of the creation process into development, dissemination, and implementation provides a useful framework [3], interactions between these stages and differing perceptions by the various participants (developer, user, patient and payer) of what are satisfactory outcomes create a more complicated picture.

It is possible to use randomized controlled trials to assess whether guidelines can change practice in the required direction [34,35], but observational studies can also provide useful information on the performance of a guideline in practice [36]. However, even this level of evaluative data is rarely available for newly developed guidelines. Furthermore, researchers require reassurance that they have a ‘good enough’ guideline before embarking on a long and expensive evaluation. The development of a useable, valid and reliable generic instrument to assess the rigour with which guidelines are created provides an essential first step in the evaluative process.

Aside from the use of guidelines in research, potential users of the guidelines (either for direct patient care or commissioning of health care) and groups adapting guidelines for local use need to make systematic and reliable judgements of their quality. This instrument provides a basis for these judgements.
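The internal consistency figures in Table 1 are Cronbach’s α coefficients. As a minimal sketch of how such a coefficient is computed from item-level scores, the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total scores) is shown below. The data are made up for illustration, the function name is hypothetical, and nothing here reproduces the authors’ own calculations.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of score rows (one row per appraisal).

    Each row holds the 0/1 item scores for the k items of one dimension.
    Uses sample variances throughout.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two rows")
    if any(len(row) != k for row in rows):
        raise ValueError("all rows must have the same number of items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores: four appraisals of a hypothetical two-item dimension.
rows = [[1, 1], [0, 0], [1, 0], [0, 0]]
alpha = cronbach_alpha(rows)
print(round(alpha, 3))  # prints: 0.727
```

Values closer to 1 indicate that the items in a dimension move together, which is the sense in which the text describes internal consistency.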
  • 5.

Table 2 Mean standardized guidelines scores and their confidence intervals (CI) for each dimension according to type of guideline, level of information, disease area and year of production

                                   Dimension 1            Dimension 2            Dimension 3
                                   Mean    95% CI         Mean    95% CI         Mean    95% CI
All guidelines (n=60)              34.0    29.6–38.3      46.2    41.8–50.7      29.0    22.9–35.1
Type of guideline
  Local (n=48)¹                    29.2    25.3–33.1      42.3    37.5–47.0      26.5    19.9–33.2
  National (n=12)                  52.9³   42.7–63.1      62.0³   55.0–68.9      38.8²   23.0–54.7
Level of information
  Nothing (n=7)¹                   12.4    3.0–21.9       40.4    19.9–60.9      14.3    −4.6 to 33.1
  Letter (n=2)                     28.8    2.3–55.2       58.3    40.7–76.0      20.6    −88.5 to 129.6
  Form (n=46)                      35.8³   31.3–40.3      45.9    41.0–50.8      31.1    24.1–38.1
  Document (n=5)                   49.1³   27.7–70.5      52.5    29.5–75.5      34.0    −2.0 to 70.0
Disease area
  Cancer (n=15)¹                   33.6    25.3–42.0      48.1    40.0–56.2      29.6    16.5–42.7
  Depression (n=15)                33.1    26.8–39.3      46.2    35.2–57.2      29.5    20.4–38.6
  Asthma (n=15)                    34.1    21.4–46.8      46.0    35.6–56.3      27.3    13.7–40.9
  Coronary heart disease (n=15)    35.1    25.3–44.9      44.5    35.2–53.8      29.6    12.8–46.4
Year of publication
  Unknown (n=8)¹                   23.0    8.6–37.4       33.9    22.9–44.9      12.8    1.9–23.7
  1994–1996 (n=38)                 36.5    30.9–42.2      47.5    42.1–53.0      34.3²   26.0–42.5
  Pre-1994 (n=14)                  33.3    25.4–41.1      49.6    38.8–60.5      24.0    12.9–35.1

¹ Reference category.
² 0.001 < P < 0.05.
³ P ≤ 0.001.

To our knowledge this is the first time a study of its kind has been undertaken and there are a number of methodological issues that need to be addressed. First, the development of appraisal criteria is conceptually similar to the development of instruments or checklists for assessing the quality of randomized controlled trials [15,37]. This means that the instrument will require regular testing and revision. Further work on the instrument will need to be undertaken to examine issues of validity, item refinement and weighting of items. Second, an apparently rigorous development process can still hide aberrant clinical recommendations. Therefore detailed analysis and comparison of the clinical content of guidelines is also necessary to ensure that the recommendations are clinically sound before they can be adopted. For example, we will need to test the instrument against guidelines that have been shown to have predictive validity, such as the Ottawa ankle rules [38]. Third, it remains an assumption that the structural and process factors that are assessed by the instrument are true determinants of valid and effective guidelines. It would be reassuring if the validity of this approach itself (as opposed to the validity of the instrument) could be subjected to a formal evaluation. However, this can only happen if the instrument is widely used and the performance of the guidelines as described by the instrument is compared to an external standard. This is beginning to happen as the instrument has been translated into Italian and French and is being used by the NHS Executive in the UK to help to decide which guidelines are to be recommended to the NHS (a dichotomous outcome for each guideline) [39]. It has formed the basis of a BIOMED-2 research project involving 10 European countries and Canada to identify the reasons for differences in guideline recommendations across countries [40]; this will provide a further test of validity by assessing the usefulness of the scoring system in explaining inconsistencies of guidelines recommendations.

In practice, a key indicator of its value will be whether it is perceived to be useful in helping developers to address the issues necessary to produce good quality guidelines and potential users (individuals and organizations) to decide on which guidelines to use. The progression from a qualitative to quantitative assessment of guidelines should facilitate this process. Rather than splitting guidelines into good or bad, the instrument provides a numerical description of a guideline in three key dimensions. This allows potential users to relate a guideline to the whole population of guidelines and then to decide on the basis of their requirements. For example, a hospital wishing to introduce a guideline for the management of asthma may want to look at guideline X because of its rigour of development but also to review guideline Y because it scores high on clarity. This approach could provide a quality dimension to the databases of guidelines that are now emerging in Canada [41], America [42] and Germany [43].
  • 6.

Acknowledgements

The authors thank the guidelines authors for making their documents available, the Royal College of General Practitioners for recruiting the general practitioners, Professor Martin Bland for his helpful comments on earlier drafts, and the appraisers.

References

1. Haines A, Jones R. Implementing findings of research. Br Med J 1994; 308: 1488–1492.
2. Durand-Zaleski I, Colin C, Blum-Boisgard C. An attempt to save money by using mandatory practice guidelines in France. Br Med J 1997; 315: 943–946.
3. Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet 1993; 342: 1317–1322.
4. Paccaud F. Variation in guidelines. J Health Serv Res Policy 1997; 2: 53–55.
5. Brook RH. Implementing medical guidelines. Lancet 1995; 346: 132.
6. Swales JD. Guidelines on guidelines. J Hypertension 1993; 11: 899–903.
7. Thomson R, McElroy H, Sudlow M. Guidelines on anticoagulant treatment in atrial fibrillation in Great Britain: variation in content and implications for treatment. Br Med J 316: 509–513.
8. Cluzeau F, Littlejohns P, Grimshaw J, Hopkins A. Appraising clinical guidelines and the development of criteria: a pilot study. J Interprofes Care 1995; 9: 227–235.
9. Ward JE, Grieco V. Why we need guidelines for guidelines: a study of the quality of clinical practice guidelines in Australia. Med J Aust 1996; 165: 574–576.
10. Institute of Medicine. Field MJ, Lohr K. (eds) Guidelines for Clinical Practice. From Development to Use. Washington DC: National Academy Press, 1992.
11. Black N. Why we need observational studies to evaluate the effectiveness of health care. Br Med J 1996; 312: 1215–1218.
12. Donabedian A. The Criteria and Standards of Quality. Explorations in Quality Assessment and Monitoring. Ann Arbor, MI: Health Administration Press, 1982.
13. Ierodiakonou K, Vandenbroucke JP. Medicine as a stochastic art. Lancet 1993; 341: 542–543.
14. Baker R, Feder G. Clinical guidelines: where next? Int J Qual Health Care 1997; 9: 399–404.
15. Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Commun Health 1998; 52: 377–384.
16. Moher D, Pham B, Jones A et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998; 352: 609–613.
17. Moher D, Jadad AR, Nichol G et al. Assessing the quality of randomised controlled trials: an annotated bibliography of scales and checklists. Controlled Clin Trials 1995; 16: 62–73.
18. Chalmers I, Haynes B. Reporting, updating, and correcting systematic reviews of the effects of health care. Br Med J 1994; 309: 862–865.
19. Lohr KN, Field MJ. A Provisional Instrument for Assessing Clinical Practice Guidelines. In Institute of Medicine, Field MJ, Lohr K, eds, Guidelines: from Practice to Use. Washington DC: National Academy Press, 1992. (Appendix B).
20. Hayward RS, Wilson MC, Tunis SR et al. More informative abstracts of articles describing clinical practice guidelines. Ann Intern Med 1993; 118: 731–737.
21. Liddle J, Williamson M, Irwig L. Method for Evaluating Research and Guideline Evidence. Sydney: NSW Health Department, 1996.
22. Institute of Medicine. Field MJ, Lohr K. (eds) Clinical Practice Guidelines: Directions for a New Program: 38. Washington DC: National Academy Press, 1990.
23. Ware Jr JE, Snow K, Kosinski M, Gandek B. SF-36 Health Survey: Manual and Interpretation Guide. Boston, MA: New England Medical Center, 1993.
24. Cluzeau F, Littlejohns P, Grimshaw J, Feder G. National survey of UK guidelines for the management of coronary heart disease, lung and breast cancer, asthma and depression. J Clin Effect 1997; 2: 120–123.
25. Nunnally JC. Psychometric Theory. New York, NY: McGraw-Hill, 1981.
26. Streiner DL, Norman GR. Health Measurement Scales. A Practical Guide to their Development and Use, 2nd edn. New York, NY: Oxford University Press, 1995.
27. Bland JM, Altman DG. Cronbach’s Alpha: Statistics Notes. Br Med J 1997; 314: 572.
28. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psycholog Bull 1979; 86: 420–428.
29. Grimshaw JM and Russell IT. Achieving health gain through clinical guidelines. I: Developing scientifically valid guidelines. Qual Health Care 1993; 2: 243–248.
30. Woolf SH. Practice guidelines, a new reality in medicine II. Methods of developing guidelines. Arch Intern Med 1992; 152: 946–952.
31. Burdick RK, Maqsood F, Graybill FA. Confidence intervals on the intraclass correlation in the unbalanced one-way classification. Commun Stats – Theory Methods 1986; 15: 3353–3378.
32. Maxwell RJ. Quality assessment in health. Br Med J 1984; 288: 1470–1472.
33. Margolis CZ. Methodology Matters – VII. Clinical Practice Guidelines: Methodological Considerations. Int J Qual Health Care 1997; 9: 303–306.
34. Effective Health Care: Implementing Clinical Practice Guidelines: can Guidelines be used to Improve Clinical Practice? Leeds: University of Leeds, 1994.
35. Worrall G, Chaulk P, Freake D. The effects of clinical practice
  • 7.

guidelines on patients outcomes in primary care: a systematic review. Can Med Assoc J 1997; 156: 1705–1712.
36. Steinhoff MC, Abd El Khalek MK, Khallaf N et al. Effectiveness of clinical guidelines for the presumptive treatment of streptococcal pharyngitis in Egyptian children. Lancet 1997; 350: 918–921.
37. Jadad AR, Moore RA, Carroll D et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996; 17: 1–12.
38. Stiell IG, Greenberg GH, McKnight RD et al. Decision rules for the use of radiography in acute ankle injuries. Refinement and prospective validation. J Am Med Assoc 1993; 269: 1127–1132.
39. CMO Update 16. 1997: Number 8.
40. Littlejohns P, Cluzeau F. Promoting the rigorous development of clinical guidelines in Europe through the creation of a common appraisal instrument. Scientific Basis for Health Services, Amsterdam, 1997 (abstract).
41. Graham I, Beardall S, Carter A, Laupacis A. The state of the art of practice guidelines development, dissemination, and evaluation in Canada. Scientific Basis for Health Services, Amsterdam, 1997.
42. Stephenson J. Revitalized AHCPR pursues Research on Quality. J Am Med Assoc 1997; 278: 1557.
43. Lauterbach KW, Lubecki P, Oesingmann U et al. A concept for a clearing procedure for guidelines in Germany (in German). Zeitschrift Fur Arztliche Fortbildung Und Qualitatssicherung 1997; 91: 283–288.

Appendix. Appraisal instrument

Dimension one: rigour of development process

1. Is the agency responsible for the development of the guidelines clearly identified?
2. Was external funding or other support received for developing the guidelines?
3. If external funding or support was received, is there evidence that the potential biases of the funding body(ies) were taken into account?
4. Is there a description of the individuals (e.g. professionals, interest groups – including patients) who were involved in the guidelines development group?
5. If so, did the group contain representatives of all key disciplines?
6. Is there a description of the sources of information used to select the evidence on which the recommendations are based?
7. If so, are the sources of information adequate?
8. Is there a description of the method(s) used to interpret and assess the strength of evidence?
9. If so, is (are) the method(s) for rating the evidence adequate?
10. Is there a description of the methods used to formulate the recommendations?
11. If so, are the methods satisfactory?
12. Is there an indication of how the views of interested parties not on the panel were taken into account?
13. Is there an explicit link between the major recommendations and the level of supporting evidence?
14. Were the guidelines independently reviewed prior to publication/release?
15. If so, is explicit information given about the methods and how comments were addressed?
16. Were the guidelines piloted?
17. If so, is explicit information given about the methods used and the results adopted?
18. Is there a mention of a date for reviewing or updating the guidelines?
19. Is the body responsible for the reviewing and updating clearly identified?
20. Overall, have the potential biases of guideline development been adequately dealt with?

Dimension two: context and content

21. Are the reasons for developing the guidelines clearly stated?
22. Are the objectives of the guidelines clearly defined?
23. Is there a satisfactory description of the patients to which the guidelines are meant to apply?
24. Is there a description of the circumstances (clinical or non-clinical) in which exceptions might be made in using the guidelines?
25. Is there an explicit statement of how the patient’s preferences should be taken into account in applying the guidelines?
26. Do the guidelines describe the condition to be detected, treated, or prevented in unambiguous terms?
27. Are the different possible options for the management of the condition clearly stated in the guidelines?
28. Are the recommendations clearly presented?
29. Is there an adequate description of the health benefits that are likely to be gained from the recommended management?
30. Is there an adequate description of the potential harms or risks that may occur as a result of the recommended management?
31. Is there an estimate of the costs or expenditures likely to incur from the recommended management?
32. Are the recommendations supported by the estimated benefits, harms and costs of the intervention?
  • 8.

Dimension three: application of guidelines

33. Does the guideline document suggest possible methods for dissemination and implementation?
34. (National guidelines only) Does the guideline document identify key elements which need to be considered by local guideline groups?
35. Does the guideline document specify criteria for monitoring compliance?
36. Does the guideline document identify clear standards or targets?
37. Does the guideline document define measurable outcomes that can be monitored?

Accepted for publication 28 September 1998