Case control study – part 1



Published in: Health & Medicine


  1. 1. Case control study - Part 1 Dr. Rizwan S A, M.D., 1
  2. 2. Outline of presentation • Some history • Planning and conducting a study • Matching • Sources of bias • Applications 2
  3. 3. A scenario • Assume you are the senior health advisor to the GOI • Recently, several isolated reports of neurological illness following DPT vaccination have come up in the country • Media is adding fuel to the fire • Parents and doctors are reluctant to vaccinate and the vaccination rates are going down the drain! • What will you do? 3
  4. 4. Some history • 1788 - Early concepts found in works of Parisian physician PCA Louis • 1843 - First explicit description by William Augustus Guy (occupational exposure and pulmonary disease) • 1862 - Baker, case control comparisons of marriage and fertility in breast cancer patients • 1926 - Lane-Claypon’s breast cancer study • 1950 - Levin et al.; Wynder & Graham; Schrek et al.; and Doll & Hill (smoking and lung cancer) 4
  5. 5. Planning and conducting • Research question • Definition of case • Definition of control • Selecting the cases & controls • Research instrument 5
  6. 6. Case Control Studies Cohort Studies Proceeds from effect to cause Proceeds from cause to effect Starts with the disease Starts with people exposed to the risk factor or suspected cause Tests whether the suspected cause occurs more frequently in those with disease than those without disease Tests whether disease occurs more frequently in those exposed than in those not exposed Usually the 1st approach to the testing of hypothesis, but also useful for exploratory studies Reserved for the testing of precisely formulated hypothesis Involves fewer study subjects Involves larger number of subjects Yields results relatively quickly Long follow-up, delayed results Suitable for study of rare diseases Inappropriate when disease or exposure under investigation is rare Generally, yields only estimate of relative risk (Odds ratio) Yields incidence rates, relative risk, attributable risk Cannot yield information about disease other than that under study Can give information about more than one disease outcome Relatively inexpensive Expensive 6
  7. 7. Research question • Begin with broad and ambitious question • Later, narrow and more precise • Considerations of time, cost • Eg. 1. Does tobacco cause cancer? 2. Does smoking tobacco cause bronchogenic CA? 3. Do persons having broncho. CA have h/o greater exposure to tobacco smoking as compared to persons w/o the disease? • Poor questions can spoil the entire study 7
  8. 8. Definition of case - 1 • Eligibility • Definition of disease 8
  9. 9. Definition of case - 2 • Eligibility (2 components) – Objective criteria for diagnosis – Stating the eligibility criteria • Eligibility criteria – should reflect ‘potentially at risk for exposure’ both for the case & control – Eg. recent OCP and MI; (sterilized, postmenopausal, CIs to OCPs) 9
  10. 10. Definition of case - 3 • Cases sh. have reasonable possibility of having had their disease induced by the exposure – Eg. OCP and Thromboembolism – sh. exclude postpartum and postoperative cases (Why?) • Incident cases – Will be more uniform – Recall more accurate – More certain that exposure preceded the disease – Berkson and Neyman 10
  11. 11. Definition of case - 4 • Definition of disease – Objective criteria to reliably diagnose the disease – Eg. Rh. Arthritis (several diagnostic criteria causing confusion) – To reduce misclassification • Sources of cases – Hospital lists, special reporting systems like cancer registries, disease surveillance, death certificates 11
  12. 12. Definition of control - 1 • Eligibility criteria – Sh. be similar to the cases with regard to potential for exposure – Problems arise in hospital based controls • We want to select controls that are likely to reflect the exposure rate in the population • We sh. exclude those hospital controls whose condition is associated with the exposure (Eg. Aspirin and MI; controls with chronic pain/peptic ulcer) – One solution – include controls with a variety of diagnoses not associated with exposure 12
  13. 13. Definition of control - 2 • Sources of controls – Hospital based – Dead controls – Controls with similar diseases – Neighborhood controls • Population based • Best friend control/ Sibling control 13
  14. 14. Definition of control - 3 • Hospital based – Referral pattern is similar to cases (form the same study base) – Similar quality of information – Convenience – May not be representative of the population • Dead controls – In a study where the case is death from a particular cause – Information obtained from ‘proxy’ informants – But dead controls differ from living controls • Controls with similar diseases – Cancer (of different type) controls for cancer cases – Minimize recall bias, interviewer bias, examine specificity of exposure 14
  15. 15. Definition of control - 4 • Neighborhood controls – Best friend control/Sibling control • Inexpensive, easy and quick • Ability to match on a number of variables that are associated with neighborhood/friendship • May introduce selection bias (‘smoking’ cases nominate ‘smoking’ friends) related to the exposure and overmatching – Population based • Truly representative sample • From tax lists, voting lists, telephone directories 15
  16. 16. Definition of control - 5 Source: advantage; disadvantage • Hospital based: Advantage – Easily identified. Available for interview. More willing to cooperate. Tend to give complete and accurate information (reduced recall bias). Disadvantage – Not typical of general population. Possess more risk factors for disease. Some diseases may share risk factors with disease under study. Berksonian bias • Population based: Advantage – Most representative of the general population. Generally healthy. Disadvantage – Time, money, energy. Opportunity of exposure may not be same as that of cases (location, occup.) • Neighbourhood controls/telephone exchange random dialing: Advantage – Controls and cases similar in residence. Easier than sampling the population. Disadvantage – Non-cooperation. Not representative of general population • Best friend control/sibling control: Advantage – Accessible, cooperative. Similar to cases in most aspects. Disadvantage – Overmatching 16
  17. 17. Selection process - 1 Total population Reference population cases controls 17
  18. 18. Selection process - 2 • Cases – In practice; we use all eligible cases within a defined time period • From disease registry or hospital • We are implicitly sampling from a subset of total population of cases • Controls – Sampling is most pertinent here because in rare diseases, the no. of controls greatly exceed no. of cases 18
  19. 19. Selection of cases - 1 • Representativeness – Ideally, cases sh. be a random sample of all cases of interest in the source population (e.g. from vital data, registry data) – But commonly they are a selection of available cases from a medical care facility. (e.g. from hospitals, clinics) • Method of Selection – Selection may be from incident or prevalent cases – Incident cases are those derived from ongoing ascertainment of cases over time – Prevalent cases are derived from a cross-sectional survey 19
  20. 20. Selection of cases - 2 • Incident cases are preferable • These should be all newly diagnosed cases over a given period of time in a defined population. (However we are excluding patients who died before diagnosis) • Prevalent cases do not include patients with a short course of disease (patients who recovered early and those who died will not be included) • Can be partly overcome by including deceased cases as well as those alive 20
  21. 21. Selection of cases - 3 • Validity is more important than generalizability i.e. the need to establish an etiologic relationship is more important than to generalise results to the population • Eg. – In a study on breast cancer – we can include all cases or we can include only premenopausal women with lobular cancer • If we take the latter group as cases, we can elicit the etiology better – Studies done in nurses for OCP use 21
  22. 22. Selection of controls - 1 • The four principles of Wacholder 1. The study base 2. De-confounding 3. Comparable accuracy 4. Efficiency 22
  23. 23. Selection of controls - 2 • Should the controls be similar to the cases in all respects other than having the disease? i.e. comparable • Should the controls be representative of all non-diseased people in the population from which the cases are selected? i.e. representative 23
  24. 24. Selection of controls - 3 • Representativeness – Sh. be representative of the general population in terms of probability of exposure to the risk factor • Comparability – Sh. also have had the same opportunity to be exposed as the cases have • Not that both cases and controls are equally exposed; but only that they have had the same opportunity for exposure. 24
  25. 25. Selection of controls - 4 • Usually, cases are not a random sample of all cases in the population. So, the controls must be selected in the same way (and with the same biases) as the cases. • It follows from the above that a pool of potential controls must be defined. This is a universe of people from whom controls may be selected (study base) 25
  26. 26. Selection of controls - 5 • The study base is composed of a population at risk of exposure over a period • Cases emerge within a study base. Controls should also emerge from the same study base, except that they are not cases. • Eg. If cases are selected exclusively from hospitalized patients, controls must also be selected from hospitalized patients. 26
  27. 27. Selection of controls - 6 • Comparability is more important than representativeness in the selection of controls • The control should resemble the case in all respects except for the presence of disease 27
  28. 28. Selection of controls - 7 • Number of controls – Large study; equal numbers – Small study; multiple controls • Use of multiple controls – Controls of same type – Multiple controls of different types • Hospital and neighborhood controls • e.g. case - children with brain tumor, control - children with other cancer, normal children 28
  29. 29. Selection of controls - 8 [Figure: exposure to radiation in children with brain tumors vs. children with other cancers vs. children without cancer; alternative hypotheses: radiation causes cancers, or radiation causes brain cancers only] Multiple controls of different types are valuable for exploring alternate hypotheses & for taking into account possible recall bias. 29
  30. 30. Sampling for cases/controls - 1 • Frame – list of all potentially eligible cases and controls in the target population (a subset of the general pop. both at risk of exposure and disease development) • The frame sh. not be biased in any manner, else the sample will also be biased even if random • Types of sampling – SRS – Systematic – Stratified – Matched • The objective is to avoid bias in selection, each case or control has equal chance of being selected 30
  31. 31. Sampling for cases/controls - 2 • If we are using all incident cases occurring in a defined area and time period, then controls selected at random from the gen. pop. are the best choice (sound basis for calculating RR, AR, etiologic fraction) • If cases are selected from hospital(s), population controls are not necessarily the only good choice; a control series from hospital can also be valid • However, hosp. controls often leave room for doubt about validity of comparison (cost and practicality) 31
  32. 32. Sampling for cases/controls - 3 • Random digit dialing – Prerequisite; extensive telephone coverage – Either screen for potential controls/telephone interviews • Method – All area codes and prefix numbers are obtained – Add all possible two digit numbers – The first 8 digits – PSU – Select a PSU at random – if response obtained then retain PSU – Then the last two digits are randomly selected and continued until required sample is reached – The no. of PSUs and total houses depend on design 32
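The random digit dialing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: area code plus prefix is taken to be a 6-digit string, so each 8-digit PSU covers 100 ten-digit numbers; "response obtained" is modelled by a hypothetical callback `is_residential`; and the function and parameter names are invented for the example.

```python
import random


def rdd_sample(area_prefixes, is_residential, n_psus, per_psu, seed=0):
    """Two-stage random digit dialing sketch (hypothetical helper, not a library API)."""
    rng = random.Random(seed)
    # Stage 1: every area code + prefix extended by all possible two-digit numbers
    # gives the 8-digit primary sampling units (PSUs).
    psus = [prefix + f"{i:02d}" for prefix in area_prefixes for i in range(100)]
    rng.shuffle(psus)
    sample, retained = [], 0
    for psu in psus:
        if retained == n_psus:
            break
        # Dial one random number in the PSU; retain the PSU only if it responds.
        first = psu + f"{rng.randrange(100):02d}"
        if not is_residential(first):
            continue
        retained += 1
        numbers = [first]
        # Stage 2: keep drawing random last-two-digit endings within the PSU
        # until the required number per PSU is reached.
        suffixes = [f"{i:02d}" for i in range(100)]
        rng.shuffle(suffixes)
        for suffix in suffixes:
            if len(numbers) == per_psu:
                break
            candidate = psu + suffix
            if candidate != first and is_residential(candidate):
                numbers.append(candidate)
        sample.extend(numbers)
    return sample


def toy_is_residential(number):
    # Toy stand-in for "a household answered": numbers ending in an even digit.
    return int(number[-1]) % 2 == 0


sample = rdd_sample(["011234", "011235"], toy_is_residential, n_psus=3, per_psu=4)
print(len(sample))
```

In a real study the callback would be an actual dialing attempt, and the number of PSUs and households per PSU would follow from the sampling design, as the slide notes.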
  33. 33. Sampling for cases/controls - 4 • Examples – Artificial sweeteners and bladder cancer • Cases; 21-84 years, newly diagnosed bladder cancer in 10 designated counties in metropolitan areas • Controls; age-sex stratified random sample of the general populations in the ten counties frequency matched at 2:1 ratio – Oral contraception and congenital malformations • Cases of malformation from all newborns and stillborns delivered at five major hospitals between 1974-76 • Controls; all unaffected newborns in the five hospitals, sampling days were rotated to represent all 7 days 33
  34. 34. Matching - 1 • Matching is defined as the process of selecting controls so that they are similar to cases in certain characteristics such as age, sex, race, socioeconomic status and occupation • What is post-matching? – Pairing controls to cases from unmatched data during analysis • We often want a constant case control ratio, but sometimes matching is incomplete so that we end up with a variable ratio 34
  35. 35. Matching - 2 • Objective – to eliminate biased comparison between cases and controls • Two step process 1. The matched design 2. The matched analysis • One immediate effect of matching is the balance between no. of cases and controls • Sometimes we can deliberately match on a factor which comes in the causal path to confirm or refute its role. (Eg. Smoking and MI, matched on cholesterol) 35
  36. 36. Matching - 3 • What variables to match? – Factors which are independent risk factors for the disease – Assoc. with the exposure but non-causally – May not be directly a risk factor, but may be assoc. with other causal factors excluding the study exposure • Similar to something? 36
  37. 37. Matching - 4 • Situations to match or not? • Causal Non-causal 37
  38. 38. Matching - 5 • Examples • 1. E = alcohol F = smoking D = lung CA – Implication if not matched? • 2. E = OCP F = smoking D = MI – Implication if not matched? • 3. E = blood grp O F = age, sex D = thrombosis – Implication if matched? • 4. E = OCP F = prescribing physician D = MI – Implication if not matched or matched? 38
  39. 39. Matching - 6 • In summary, the decision to match or not depends on the residual association of the factor with disease and exposure after controlling other variables • Overmatching – Reduces validity or statistical efficiency – Two general meanings • Unmatched analysis in matched studies • Matching for unnecessary variables – If one matches on a factor that is associated with exposure but not the disease • Paired analysis may correctly estimate odds ratio but the variance will be more compared to an unmatched study of the sample (overmatching increases the frequency of exposure concordant pairs which are discarded in paired analysis) 39
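The remark that exposure-concordant pairs are discarded in paired analysis can be made concrete with a small sketch. It assumes 1:1 matching and uses the standard matched-pair odds ratio (pairs where only the case is exposed divided by pairs where only the control is exposed); the function name and the pair counts are hypothetical.

```python
def matched_pair_odds_ratio(pairs):
    """pairs: list of (case_exposed, control_exposed) booleans, one tuple per matched pair."""
    pairs = list(pairs)
    case_only = sum(1 for case, ctrl in pairs if case and not ctrl)
    control_only = sum(1 for case, ctrl in pairs if ctrl and not case)
    # Concordant pairs (both exposed or both unexposed) carry no information here.
    concordant = len(pairs) - case_only - control_only
    if control_only == 0:
        raise ValueError("odds ratio is undefined without control-only discordant pairs")
    return case_only / control_only, concordant


# Hypothetical 100 matched pairs, for illustration only.
pairs = ([(True, True)] * 10 + [(True, False)] * 30 +
         [(False, True)] * 10 + [(False, False)] * 50)
paired_or, discarded = matched_pair_odds_ratio(pairs)
print(paired_or, discarded)
```

In this toy data only 40 of the 100 pairs contribute to the estimate, which is why matching that inflates the number of concordant pairs costs precision.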
  40. 40. Matching - 7 • If one matches a factor that is causally or non-causally assoc. with disease but not exposure then OR will be biased towards unity • If one matches a factor which is assoc. with disease but not exposure then OR will be correctly estimated whether or not pairing is retained – Paired analysis will be less efficient than unpaired one • Matching on highly correlated variables is also unnecessary • Finally, matching sh. be done for factors which have strongest relationship to the disease and are least correlated 40
  41. 41. Matching - 8 • Alternatives to matching – At the sampling phase • Stratified sampling • Frequency matching – At analysis phase • Post-stratification • Regression analysis • Stratified sampling • Pre-determined number of cases and controls in each subgroup created by the cross-classification • Eg. Age (4 groups), sex (2), race (4 groups) – Total 32 subgroups 41
  42. 42. Matching - 9 • Frequency matching – Controls being taken from the corresponding subgroups in proportion to the no. of cases – Eg. If 30% of cases are males of Hindu religion in 60-65 years then we take 30% of similar controls – More practical than stratified sampling but it requires one to continually update on the distribution of accumulating cases to maintain a fixed case-control ratio • Post-stratification – Stratify the subgroups and analyze – Very flexible in that variables need not be pre-specified – Limitation - the number of variables that can be stratified is restricted by lack of numbers • Regression analysis – Most useful when the number of variables/subgroups increase 42
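Frequency matching as described above amounts to giving each subgroup a control quota proportional to its share of cases. A minimal sketch, assuming a fixed number of controls per case and hypothetical stratum labels (the helper name is also an assumption):

```python
from collections import Counter


def frequency_match_quota(case_strata, controls_per_case=1):
    """Number of controls to recruit per stratum, proportional to the cases in it."""
    counts = Counter(case_strata)
    return {stratum: n * controls_per_case for stratum, n in counts.items()}


# Hypothetical case series of 10 cases: 3 are males aged 60-65, 7 fall in other strata.
case_strata = ([("male", "60-65")] * 3 + [("female", "60-65")] * 5 +
               [("male", "50-59")] * 2)
quota = frequency_match_quota(case_strata, controls_per_case=2)
print(quota)
```

As cases accumulate, the quota would need to be recomputed so that the case-control ratio stays fixed within each stratum.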
  43. 43. Matching - 10 • Effectiveness of matching – Removal of bias – Reduction of variance • Matched design only gives a modest increase in efficiency • Greatest improvement is when there is strong assoc. between disease and the confounder • Also efficient when only a small proportion of the target population is exposed to the study factor • The added cost and complexity of matching should be weighed against any expected gains in precision 43
  44. 44. Matching - 11 • Advantages – Cases and controls will be comparable to the matched variables – Provides the best means to investigate a very specific hypothesis • Disadvantages – One can no longer study the matched variable in relation to the risk of disease – Increase in cost, time and labor – A certain fraction of cases are discarded as a result of failure to find a matching control 44
  45. 45. Matching - 12 • Summary – Unless one has very good reason to match, one is better off avoiding it – Frequency matching within rather broad categories of the matching variables will suffice for most studies 45
  46. 46. Sources of bias - 1 • Bias – systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of the risk measure 1. Ascertainment and selection bias a) Surveillance b) Diagnosis c) Referral d) Selection e) Non-response f) Length of stay g) Survival h) Admission diagnoses 2. Bias in estimation of exposure a) Recall b) Interviewer c) Prevarication d) Improper analysis 3. Misclassification 4. Other sources 46
  47. 47. Sources of bias - 2 1. Ascertainment and selection bias - Not peculiar to case-control, can occur in cohort studies also a) Differential Surveillance – In asymptomatic/mild diseases, cases are more likely to be detected in persons who are closely examined – Eg. OCP and endometrial cancer/phlebitis • Women taking OCPs were more thoroughly evaluated • Based on preliminary reports of OCP use and phlebitis, clinicians started looking for phlebitis in such exposed patients – Exposed cases would have a greater likelihood of being diagnosed as compared to unexposed cases – This bias can be checked by doing a stratified analysis in subgroups having equal surveillance (based on some index of medical care) or by restricting the study to time prior to publication of such finding 47
  48. 48. Sources of bias - 3 1. Ascertainment and selection bias b) Diagnosis • In conditions like cervical dysplasia, knowledge of exposure may alter the assessment • This is most likely to occur in cases of uncertain diagnosis c) Differential Referral • OR’ = b × OR; b = (s1 × s4)/(s2 × s3) – Where s1, s2, s3, s4 are the proportions selected of exposed cases, unexposed cases, exposed controls and unexposed controls resp. – A biased selection of cases will be compensated by biased selection of controls also – No bias (b = 1) when the probability of selecting an exposed case = that of an unexposed case, and likewise for controls 48
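The referral bias relation OR’ = b × OR with b = (s1 × s4)/(s2 × s3) can be checked numerically. The sketch below assumes s1 to s4 are the selection probabilities of exposed cases, unexposed cases, exposed controls and unexposed controls, in that order; the function name and all values are hypothetical.

```python
def selection_bias_factor(s1, s2, s3, s4):
    """b = (s1 * s4) / (s2 * s3), the factor relating observed OR' to the true OR."""
    return (s1 * s4) / (s2 * s3)


true_or = 2.0  # hypothetical true odds ratio

# Exposed subjects are over-selected to the same degree in cases and controls
# (s1/s2 = s3/s4 = 1.5), so the two biases cancel and b is 1.
b_compensated = selection_bias_factor(s1=0.9, s2=0.6, s3=0.3, s4=0.2)

# Exposed cases are over-selected but controls are not: OR' is inflated by 1.5.
b_uncompensated = selection_bias_factor(s1=0.9, s2=0.6, s3=0.3, s4=0.3)

print(round(b_compensated * true_or, 2), round(b_uncompensated * true_or, 2))
```

The first scenario mirrors the point that a biased selection of cases can be compensated by an equally biased selection of controls.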
  49. 49. Sources of bias - 4 1. Ascertainment and selection bias c) Differential Referral (cont.) • Eg. A study of alcohol and kidney failure, where income is assoc. with alcohol intake • A hospital only admits wealthy patients, so cases of kidney failure in this hospital will be more exposed to alcohol than patients in the gen. pop. • But if patients with other diseases also have similar income characteristics and they were taken as controls, bias won’t occur • If controls are taken from gen. pop. then we have to match/stratify on income to eliminate income as a source of selection bias 49
  50. 50. Sources of bias - 5 1. Ascertainment and selection bias d) Selection • Eg. Interviewer ‘keying’ on cases who are exposed (one particular nurse was searching out all the cases of ectopic pregnancy with IUD usage) • To avoid this, we must specify precisely and in advance the methods by which cases and controls are selected, carefully train staff, and apply quality control e) Non-Response • A worst case analysis taking all non-responding cases as unexposed and all non-responding controls as exposed will show if the non-response is likely to bias the estimates • If the exposure rates were equal between responders and non-responders, there will be no bias 50
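The worst case analysis for non-response described above can be sketched as follows: non-responding cases are counted as unexposed and non-responding controls as exposed, and the odds ratio is recomputed. The function name and all counts are hypothetical.

```python
def worst_case_odds_ratio(cases_exp, cases_unexp, cases_nonresp,
                          ctrl_exp, ctrl_unexp, ctrl_nonresp):
    """Return (OR among responders, OR under the worst-case assumption)."""
    observed = (cases_exp * ctrl_unexp) / (cases_unexp * ctrl_exp)
    # Worst case: every non-responding case is unexposed and
    # every non-responding control is exposed.
    worst = (cases_exp * ctrl_unexp) / (
        (cases_unexp + cases_nonresp) * (ctrl_exp + ctrl_nonresp))
    return observed, worst


# Hypothetical counts, for illustration only.
observed_or, worst_or = worst_case_odds_ratio(
    cases_exp=60, cases_unexp=40, cases_nonresp=10,
    ctrl_exp=30, ctrl_unexp=70, ctrl_nonresp=20)
print(round(observed_or, 2), round(worst_or, 2))
```

If the association remains above 1 even under this deliberately unfavourable assumption, non-response alone is unlikely to explain it.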
  51. 51. Sources of bias - 6 1. Ascertainment and selection bias f) Length of stay • In hospital study – incident cases sh. be selected rather than prevalent cases otherwise, – Patients who stay longer will have more probability of being selected – Cases of short duration would be under represented • We check this by stratifying the analysis on the basis of the duration b/w admission and selection g) Survival • In a situation where disease accompanied by mortality is studied only in survivors • Eg. A study in survivors of MI may reveal factors that are assoc. with surviving an MI rather than sustaining one • Unless one can justify that exposure is not related to duration/survival one sh. take only incident cases • This bias can be checked by stratifying date of onset 51
  52. 52. Sources of bias - 7 1. Ascertainment and selection bias h) Admission diagnoses • Eg. In hospital based study – assoc. b/w smoking and MI, if controls are lung cancer patients; this will underestimate the effect • To avoid this bias we must select controls with a variety of diseases which are believed to be unrelated with study exposure (neither + nor -) 2. Bias in the estimation of exposure a) Recall • Eg. A mother with malformed baby will try with more care and intensity to recall a pelvic X-ray compared to women with normal baby 52
  53. 53. Sources of bias - 8 2. Bias in the estimation of exposure a) Recall (cont.) • Sometimes, the disease itself affects memory (dementia) • This bias can be reduced by using controls with another disease who will also keep thinking of reasons for their disease • Independent verification of h/o exposure can be sought b) Interviewer • Interviewer may probe cases more intensely for histories of exposure than in controls if they know the hypothesis • Reduced by training staff, keeping staff ignorant of hypothesis (ideal but unobtainable), keeping interview time constant c) Prevarication • Subjects may have ulterior motives for deliberately overestimating or underestimating exposure • Eg. A worker who may receive disability pay may exaggerate his exposure; if it means loss of job, he may minimize it • May be overcome by several independent raters 53
  54. 54. Sources of bias - 9 2. Bias in the estimation of exposure d) Improper analysis • Unmatched analysis for a matched study 3. Misclassification – The disease/exposure status classification may be erroneous – Some controls may actually have the study disease but this is very improbable with rare diseases – The most likely source of misclassification will occur in the determination of exposure – Any measure to reduce misclassification sh. be addressed at the design stage; a pilot study will reveal many errors 4. Other sources of error – Insufficient sample size, errors of interpretation, not accounting for effect of extraneous variables 54
  55. 55. Sources of bias - 10 4. Other sources of error – Cases and controls sh. be similar with respect to factors that might have affected both the development of disease and the opportunity for past exposure – For eg. Medical conditions like HTN, DM preclude the use of OCPs, thus users of these would inherently be at a lower risk – An agent found in assoc. with study disease was prescribed due to an early manifestation of the disease – For eg. Estrogens prescribed for irregular bleeding that was the first symptom of undetected endometrial cancer. If this was the case then later diagnosis of the cancer would find an apparent assoc. with estrogen usage. 55
  56. 56. Sources of bias - 11 Summary – Before starting a study, one should list the likely sources of bias and plan the investigation and analyses so as to prevent/minimize them 56
  57. 57. Specific limitations of Case control study • Is not useful to study weak associations (OR < 1.5) • Non-participation rates are freq. high and differential for cases and controls • Differential recall bias 57
  58. 58. Applications of Case control study 1. Vaccine effectiveness 2. Evaluation of treatment and program efficacy 3. Evaluation of screening programs 4. Outbreak investigations 5. Demography 6. Genetic epidemiology 7. Occupational epidemiology 58
  59. 59. Pertussis vacc. in UK - 1 Year: Event(s) • 1906: Bordet and Gengou of the Pasteur Institute grow the pertussis bacterium in artificial media • 1912-14: Pert. vaccine used by many researchers • Next few years: Many versions of vaccine developed • 1942: Several local authorities in UK start vaccine • 1947-48: First published reports appear of irreversible brain damage after whole-cell pertussis vaccine • 1957: 85,000 cases of pertussis reported; vaccination magnified to national scale • 1975: Cases came down to 8,900 • Pertussis incidence peaks every 4 years • The peaks became smaller and smaller, the smallest was in 1974-75 • The next peak at 1978 should have been the smallest, but was it? 59
  60. 60. Pertussis vacc. in UK - 2 60
  61. 61. Pertussis vacc. in UK - 3 Year: Event(s) • 1974-75: Adverse publicity by media about the side effects of pert. vaccine; parents and doctors hesitated to give vaccine • 1976-79: National Childhood Encephalopathy Study (NCES) commissioned by the Dept. of Health and Social Security • 1974: Vaccine acceptance rate came down (from 78% in 1971) to 37% • 1977-79: An epidemic of pertussis occurs in Great Britain. > 100,000 cases and 36 deaths • 1979: Vaccine Damage Payment Act passed in Great Britain. The act provides a mechanism for government compensation to those with vaccine-associated injuries 61
  62. 62. Pertussis vacc. in UK - 4 • Findings of the NCES study; • Attributable risk – – Serious neurological disorders = 1 in 1,10,000 injs. – Persistent neurological sequelae = 1 in 3,10,000 injs. 62
  63. 63. Pertussis vacc. in UK - 5 Year: Event(s) • 1982: British Child Health and Education Study. Long-term neurologic problems are not found to be related to pertussis immunizations. • 1983: Communicable Diseases Surveillance Centre Study, or North West Thames Study, followed a large group of children after pertussis vaccination, finds no convincing evidence relating DPT vaccine to neurologic damage. • 1988: Loveday judgment in Great Britain's High Court rules that there is insufficient evidence to demonstrate that pertussis vaccine can cause permanent brain damage. Considered as "test case" meaning that other lawsuits claiming permanent neurologic effects from pertussis vaccine are effectively excluded. 63
  64. 64. Pertussis vacc. in UK - 6 • 1990 - Happy ending? 64
  65. 65. Critical Appraisal of NCES - 1 • Research question – Intended and actual • Study design – Case control – reasons for choosing – Cohort – reasons for not choosing • Case selection 65
  66. 66. Critical Appraisal of NCES - 2 • Only hospital admitted cases were selected as cases – any comments? • Control selection – Comments? 66
  67. 67. Critical Appraisal of NCES - 3 • Exposure measurement 67
  68. 68. Critical Appraisal of NCES - 4 • Results • There was no noticeable clustering in any area 68
  69. 69. Critical Appraisal of NCES - 5 • Results • 3.5% of cases and 1.7% of controls had been immunized • OR of 2.4, p value < 0.001 69
  70. 70. Critical Appraisal of NCES - 6 • Results • There was no significant association between serious neurological illness and diphtheria and tetanus vaccine • Confounders • History of fits – Is a known contraindication to immunization, including such cases will underestimate OR, – A separate analysis limited to normal children with no past history of fits gave a RR of 3.2 • Social class – Could not be controlled – But analysis in those pairs of children in which both the affected and control were of the same social class – no differences 70
  71. 71. Critical Appraisal of NCES - 7 • Causation Vs. association A) clinically distinctive B) restricted to immunized children C) closely related in time to immunization D) biologically plausible E) without alternative explanation • Attributable risk – Can this be calculated in a case control study? – Covered an entire national population (in theory represents the total incidence of serious neurological illnesses, assumption about immunization coverage) – Serious neurological disorders = 1 in 1,10,000 injs. – Persistent neurological sequelae = 1 in 3,10,000 injs. – Is this appropriate? 71
  72. 72. Thank you 72