Barbara Osimani, Problems with Evidence of Pharmaceutical Harm. King's College London, Department of Philosophy CHH - Concepts of Health Seminar 14 May 2013
  • 1. Problems with evidence for pharmaceutical harm Barbara Osimani University of Camerino KCL London, 14 May 2013
  • 2. Topics • Philosophical debate on evidence hierarchies and RCTs • Epistemological rationale underpinning evidence hierarchies and alternative approaches (Principle of total evidence) • Distinctive roles in causal assessment of intended vs. unintended effects. • Case study (acetaminophen/paracetamol side effects)
  • 3. Evidence hierarchies: best evidence for clinical decisions and health policies 1. Meta-analyses of RCTs 2. Single RCTs 3. Meta-analyses of observational studies 4. Comparative studies which are not randomized (e.g. cohort or case-control studies) 5. Reasoning about pathophysiologic mechanisms 6. Expert judgment
  • 4. The problem of confounders
  • 5. The problem of confounders If you have a big enough sample, you might discount spurious correlations by “controlling” for specific variables. For instance, you might check whether the correlation still holds when you compare a group of people who regularly exercise and take vitamin C with a group of people who also exercise but do not take vitamin C (F = flu, C = vitamin C intake, E = exercise): P(F/C & E) vs. P(F/¬C & E). If the rate of flu incidence still differs between the two groups (to a statistically significant degree), then vitamin C might be considered to make a distinctive contribution to decreasing the flu rate.
  • 6. The problem of confounders • If people who take vitamins generally also have a healthier lifestyle (they have healthier eating habits, they are less likely to smoke and they practice sport regularly), then the difference in specific health indicators (such as the frequency of infections incurred in a given period) could be due not to vitamin intake but to these concomitant factors. • This phenomenon is called self-selection bias, because the group of people who take the treatment is biased by the very fact that they choose to take it. [Diagram with nodes: Health behaviour (Exercise, Healthy diet, Non-smoker), Vitamin C intake, Flu frequency]
  • 7. Controlling for confounders [Diagram: group with Vitamin C intake + Exercise, Healthy diet, Non-smoker vs. group with ¬Vitamin C intake + Exercise, Healthy diet, Non-smoker] If the outcome difference is statistically significant, then vitamin C intake is considered to make a distinctive contribution to the reduced frequency of flu. However, what about other possible causes that we know nothing about and that could equally make vitamin C seem causal when it is not?
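The comparison P(F/C & E) vs. P(F/¬C & E) can be made concrete with a small table of counts. The sketch below uses entirely hypothetical counts (not data from any study), chosen so that exercise is associated both with vitamin C intake and with lower flu rates while vitamin C itself does nothing. The crude rates P(F/C) and P(F/¬C) differ, but the stratum-specific rates coincide. The names `COUNTS` and `flu_rate` are illustrative.

```python
from fractions import Fraction

# Hypothetical counts (illustration only): (vitamin_c, exercise) -> (flu_cases, group_size)
COUNTS = {
    (True, True): (40, 400),
    (True, False): (20, 100),
    (False, True): (10, 100),
    (False, False): (80, 400),
}


def flu_rate(vitamin_c, exercise=None):
    """Return P(F / C) if exercise is None, otherwise P(F / C & E) within one exercise stratum."""
    cases = total = 0
    for (c, e), (flu, n) in COUNTS.items():
        if c == vitamin_c and (exercise is None or e == exercise):
            cases += flu
            total += n
    return Fraction(cases, total)


# Crude comparison: vitamin C looks protective.
print(flu_rate(True), flu_rate(False))          # 3/25 9/50
# Controlling for exercise: no difference within either stratum.
print(flu_rate(True, True), flu_rate(False, True))    # 1/10 1/10
print(flu_rate(True, False), flu_rate(False, False))  # 1/5 1/5
```

Stratification removes the whole crude difference here only because, by construction, exercise is the sole confounder; for confounders nobody has measured, no such table can be built.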
  • 8. Adjusting for potential confounders is not possible for unknown confounders, i.e. for causal factors of which the researcher is unaware. Hence randomization should ensure that the causal link between health behaviour and vitamin intake is severed, by making vitamin intake independent of health behaviour and its effects.
  • 9. Putative functions of randomization • 1) balance between treatment and control group: this allows one to experimentally isolate the distinctive contribution of the cause under investigation from other prognostic factors (confounders); • 2) repeated randomization of the treatment among the subjects in the sample allows one to approach, in the limit (in the long run), the true mean difference between the treated and untreated sample populations (see Basu 1980, and Teira 2011), i.e. the true effect size; • 3) randomization as an aid against (self-)selection bias.
  • 10. Randomization works together with: 1. control (partition of the sample into treatment and control group/s) 2. intervention (treatment administration by the experimenter), and 3. double-blinding and placebo (concealment of treatment allocation from subjects and researchers).
  • 11. 1. Philosophers’ skepticism about RCTs and evidence hierarchies in general 2. Rationale underpinning the ranking approach 3. Alternative approaches 4. Justification of “lower level evidence” in different approaches 5. Distinctive advantages of alternative approaches when dealing with unintended effects
  • 12. Worrall’s critique 1) Clinical researchers never randomize forever, so RCTs do not reflect the “limiting average”; 2) there is “no sense in which we can ever know how close a particular RCT is to yielding this ‘limiting average’” (2007: 15); 3) repeated randomization is, epistemically speaking, impossible: “If a particular patient in the study receives, say, the ‘active drug’ on the first round, then since this is expected to have some effect on his or her condition, the second randomization would not be rigorously a repetition of the first. The second trial population, though consisting of the same individuals, would, in a possibly epistemically significant sense, not be the same population as took part in the initial trial” (2007: 22)
  • 13. Worrall’s critique 4) allowing sufficient “wash out” times between rounds is not a perfect safeguard against “contamination”; 5) repeated randomization is practically and ethically unfeasible; 6) randomization is only a means to the end of balancing the experimental groups, and this aim can also be reached through other tools such as deliberate matching and “haphazard” allocation; 7) strictly speaking, it is not randomization but rather the masking of treatment allocation that wards off bias due to experimenters’ or subjects’ interests and expectations (allocation and self-selection bias); 8) comparing the reliability of observational vs. randomized studies by taking the latter as the gold standard amounts to a petitio principii.
  • 14. Papineau: sampling error vs. confounders • Worrall’s worry about the disproportionate representation of some possibly confounding factor in the treatment or control arm = a worry about sampling error • Bigger samples alleviate sampling error, but they do not alleviate the worry about confounders • → Randomization alleviates confounding, but does not affect sampling error
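The claim that sample size, not randomization, governs chance imbalance can be illustrated with a small simulation. As a hypothetical setup, each subject carries a binary prognostic factor with prevalence 0.3; subjects are randomly split into two equal arms, and we record the average absolute difference in the factor's prevalence between the arms. The function name and all parameter values are illustrative assumptions.

```python
import random


def mean_imbalance(n_per_arm, prevalence=0.3, trials=500, seed=1):
    """Average |prevalence in treatment arm - prevalence in control arm| over many randomized splits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each subject independently carries the hypothetical prognostic factor.
        subjects = [rng.random() < prevalence for _ in range(2 * n_per_arm)]
        rng.shuffle(subjects)  # random allocation to the two arms
        treatment, control = subjects[:n_per_arm], subjects[n_per_arm:]
        total += abs(sum(treatment) - sum(control)) / n_per_arm
    return total / trials


for n in (10, 100, 1000):
    print(n, round(mean_imbalance(n), 3))
```

Every split is randomized, yet small arms are routinely unbalanced by chance; the typical imbalance shrinks roughly in proportion to 1/sqrt(n), which is the sense in which bigger samples, not randomization, address sampling error.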
  • 15. Papineau: equipoise = a spurious ethical justification arising from an equivocation on uncertainty • It is enough to warrant an RCT that the medical community / the reasonable doctor is not certain that T is better than ¬T: 0 < P(U(T) > U(¬T)) < 1 • But real equipoise would require that they be indifferent on the balance of the probabilities: P(T)U(T) = P(¬T)U(¬T)
  • 16. Teira’s defence: impartiality through randomization • David Teira (2011) acknowledges the methodological limitations attributed to RCTs • However, according to him, randomization “is still a warrant that the allocation was not done on purpose with a view to promoting somebody’s interests”. • Randomization serves to prevent the uncertainty related to causal inference from being exploited to the advantage of one party or the other → impartiality.
  • 17. Nancy Cartwright on RCTs: the problem of extrapolation • Cartwright (2007) details the assumptions that must be met in order to export the claim of efficacy from the sample to the target population: • at least one causally homogeneous subgroup in the target population must have the same causal structure and probability measures as at least one causally homogeneous subpopulation in the experimental sample. • Thus the evidence provided even by an ideal (i.e. perfectly internally valid) RCT can be extended to the target population only with great caution. • Randomization is also not recommended for most of the practical purposes it is supposed to serve (see also Cartwright, 2010).
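  • 18. Nancy Cartwright on RCTs: the problem of extrapolation RCTs test whether a given causal law works in a given study situation. Ex.:
$$Y(u) \stackrel{c}{=} a(u) + \beta(u)X(u) + W(u).$$
Thus, given that
$$T \stackrel{\text{def}}{=} \langle Y(u) \mid X(u) = x_t \rangle - \langle Y(u) \mid X(u) = x_c \rangle,$$
then
$$T = \langle a(u) \mid X(u) = x_t \rangle - \langle a(u) \mid X(u) = x_c \rangle + \langle \beta(u) \mid X(u) = x_t \rangle x_t - \langle \beta(u) \mid X(u) = x_c \rangle x_c + \langle W(u) \mid X(u) = x_t \rangle - \langle W(u) \mid X(u) = x_c \rangle.$$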
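  • 19. Nancy Cartwright on RCTs: the problem of extrapolation If, on grounds of random assignment, we are prepared to assume that for the units in the study X is probabilistically independent of a, β and W, then the expectations of a, β and W will be the same for any value of X:
$$\langle a(u) \mid X(u) = x_t \rangle = \langle a(u) \mid X(u) = x_c \rangle, \quad \langle W(u) \mid X(u) = x_t \rangle = \langle W(u) \mid X(u) = x_c \rangle, \quad \langle \beta(u) \mid X(u) = x_t \rangle = \langle \beta(u) \mid X(u) = x_c \rangle = \langle \beta(u) \rangle.$$
Hence the first and last pairs of terms cancel out:
$$\langle a(u) \mid X(u) = x_t \rangle - \langle a(u) \mid X(u) = x_c \rangle = 0, \qquad \langle W(u) \mid X(u) = x_t \rangle - \langle W(u) \mid X(u) = x_c \rangle = 0,$$
and
$$\langle \beta(u) \mid X(u) = x_t \rangle x_t - \langle \beta(u) \mid X(u) = x_c \rangle x_c = \langle \beta(u) \rangle (x_t - x_c).$$
So
$$T = \langle \beta(u) \rangle (x_t - x_c).$$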
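The cancellation argument can be checked numerically. This is a minimal simulation sketch under hypothetical assumptions of our own: a(u) and W(u) are standard normal, a binary support factor is present with probability 0.3, β(u) = 2 when it is present and 0 otherwise, x_t = 1 and x_c = 0, so ⟨β(u)⟩(x_t − x_c) = 0.6. Under coin-flip assignment the estimated T should be close to 0.6; under self-selection (treatment taken whenever a(u) > 0) the a-terms no longer cancel. The function name `simulate_T` is illustrative.

```python
import random
import statistics


def simulate_T(n=100_000, x_t=1.0, x_c=0.0, randomize=True, seed=0):
    """Estimate T = <Y | X = x_t> - <Y | X = x_c> for Y(u) = a(u) + beta(u) X(u) + W(u)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)          # baseline term a(u)
        support = rng.random() < 0.3     # hypothetical binary support factor
        beta = 2.0 if support else 0.0   # beta(u): X contributes only if the support factor is present
        w = rng.gauss(0.0, 1.0)          # other causes W(u)
        if randomize:
            # Coin-flip assignment: X independent of a, beta and W.
            x = x_t if rng.random() < 0.5 else x_c
        else:
            # Self-selection: units with a higher baseline take the treatment.
            x = x_t if a > 0 else x_c
        y = a + beta * x + w
        (treated if x == x_t else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)


print(round(simulate_T(), 2))                 # close to <beta>(x_t - x_c) = 0.6
print(round(simulate_T(randomize=False), 2))  # biased: a-terms do not cancel
```

Note that even the randomized estimate recovers only the sample mean ⟨β(u)⟩, not the distribution of β; that is exactly what the extrapolation problem turns on.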
  • 20. Nancy Cartwright on RCTs: the problem of extrapolation Now, in order to use $T = \langle \beta(u) \rangle (x_t - x_c)$ to predict whether a given intervention will work in your target population, you need to know: 1) whether the same law holds in the target population; 2) whether at least some subgroups have the right set of support factors.
  • 21. Nancy Cartwright on RCTs: the problem of extrapolation β does not represent a single factor but a complex function of further factors that together fix whether and how much X contributes to Y:
$$\beta = f_1(z_{11}, \ldots, z_{1n}) + \ldots + f_m(z_{m1}, \ldots, z_{mp}).$$
Hence you will get the same result in the target population only if it shares the same mean value of β, which depends on the distribution of the different values of β, i.e. of the different combinations of values of the support factors.
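A toy calculation shows how the same causal law can yield different values of T once the support factors are distributed differently. The example is hypothetical: β depends on two binary support factors z1 and z2 (X contributes only when both are present, with β = 2), z1 and z2 are assumed independent, and the trial sample and the target population differ only in the prevalence of z2. The functions `beta`, `mean_beta` and `predicted_T` are illustrative names.

```python
from fractions import Fraction
from itertools import product


def beta(z1, z2):
    """Hypothetical beta(u): X contributes to Y only when both support factors are present."""
    return Fraction(2) if (z1 and z2) else Fraction(0)


def mean_beta(p_z1, p_z2):
    """<beta> in a population where z1 and z2 are independent with the given prevalences (an assumption)."""
    total = Fraction(0)
    for z1, z2 in product((True, False), repeat=2):
        weight = (p_z1 if z1 else 1 - p_z1) * (p_z2 if z2 else 1 - p_z2)
        total += weight * beta(z1, z2)
    return total


def predicted_T(p_z1, p_z2, x_t=Fraction(1), x_c=Fraction(0)):
    """T = <beta>(x_t - x_c), meaningful only where the same causal law holds."""
    return mean_beta(p_z1, p_z2) * (x_t - x_c)


study = predicted_T(Fraction(1, 2), Fraction(3, 5))   # z2 common in the trial sample
target = predicted_T(Fraction(1, 2), Fraction(1, 5))  # z2 rarer in the target population
print(study, target)  # 3/5 1/5
```

The law is identical in both populations; only the mix of support factors changes, and the effect in the target population is a third of the trial estimate.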
  • 22. Nancy Cartwright on RCTs: the problem of extrapolation Indeed, there might be different sets of complexly interacting factors, any one of which allows X to contribute to Y. Ex.:
$$Y(u) \stackrel{c}{=} a(u) + \beta(u)X(u) + \gamma(u)X(u) - \delta(u)X(u) + W(u),$$
or, more generally:
$$Y(u) \stackrel{c}{=} C_1 + \ldots + C_n - P_1 - \ldots - P_m.$$
  • 23. Support factors for adverse reactions For instance, together with $\beta = f_1(z_{11}, \ldots, z_{1n}) + f_2(z_{21}, \ldots, z_{2p})$ the drug might produce an intended outcome Y, whereas together with $\delta = f_1(z_{11}, \ldots, z_{1n}) + f_3(z_{31}, \ldots, z_{3p})$ it might produce an undesired outcome (adverse drug reaction) Q:
$$Y(u) \stackrel{c}{=} a(u) + \beta(u)X(u) + W(u),$$
$$Q(u) \stackrel{c}{=} k(u) + \delta(u)X(u) + Z(u).$$
  • 24. Support factors for adverse reactions Indeed, the possibility of undesired outcomes might even be enhanced in the target population because of different demographic and clinical conditions (age, co-morbidity, multidrug therapy) → post-marketing surveillance.
  • 25. Evidence hierarchies do not differentiate between benefits and risks. So, when it comes to evaluating evidence, observational data on side effects tend to be discounted until they are “proved” by RCTs. • The CEBM levels of evidence, which are at pains to distinguish between different hierarchies depending on different evaluation goals (therapy, prognosis, diagnosis, economic analysis), coalesce efficacy and harm assessment in one and the same column (therapy-prevention-etiology-harm), putting meta-analyses of RCTs, followed by single RCTs, at the top of the ranking. • Similarly, the GRADE system of Guyatt and colleagues (Guyatt et al. 2011) admits the difficulties inherent in the evaluation of evidence for harm, but proposes a framework where its quality is assessed with the same criteria proposed for efficacy evaluation. In particular, evidence for harm coming, say, from observational studies is given lower weight than evidence for efficacy coming, say, from RCTs, thus biasing the overall risk-benefit assessment in favor of the drug.
  • 26. Nancy Cartwright on RCTs: the problem of extrapolation The problem with evidence guidelines: they concentrate on 1) whether X is independent of other factors in the study situation (INTERNAL VALIDITY), but they fail to appreciate the importance of: 2) how to tell that the law holds in the target population and which support factors can be assumed to be present (RELEVANCE); 3) whether the set of evidence we consider in support of H is total, i.e. no relevant facts are left out (TOTAL EVIDENCE); 4) whether the evidence speaks for the truth of H (EVIDENTIAL SUPPORT).
  • 27. Clinchers vs. vouchers • Clinchers = deductive methods of evaluating hypotheses → hypothesis testing • Vouchers = non-deductive methods, where the evidence is symptomatic of the conclusion but not sufficient for it. • RCTs belong to the clinchers: they are a statistical version of the hypothetico-deductive method, in which a statistically significant result rejects the null hypothesis that the treatment has no effect.
  • 28. Hypothetico-deductive method and modus tollens 1. Conjecture: H (vitamin C has some effect on flu). Experimental hypothesis: ¬H → ¬Δ 2. Test and observe the result: Δ 3. Infer: ¬¬H (reject ¬H). p-value = probability of observing Δ if ¬H is true; a very low probability speaks for rejecting ¬H.
  • 29. Hypothesis testing = H-D + abduction Peircean abduction: something surprising happened (let’s call it S); but if H were the case, S would no longer be surprising; hence we have good reasons to believe that H is the case. Hypothesis testing: the experimental result would be very improbable if the null hypothesis were true; hence either something very improbable happened or the null hypothesis is false → we reject the null hypothesis (on what grounds do we choose this second alternative?)
  • 30. Clinchers: Hypothesis testing (I) • The aim of hypothesis testing is to provide a means to reject hypotheses on the basis of statistical evidence. • In classical hypothesis testing, the result is expressed as the probability of observing the experimental result, or more “extreme” results in the sample space (p-value), if the treatment makes no difference (the so-called null hypothesis, H0). • For the result to be at all meaningful, it is essential that the observed difference between groups be due to the treatment and only to it, which in turn explains the insistence on excluding confounders.
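To make “the probability of the observed result, or a more extreme one, under H0” concrete, here is a minimal sketch of one classical test: a one-sided Fisher exact test for a 2×2 table (treated/control × event/no event). The slides do not name a specific test; Fisher's test is our illustrative choice because it computes this probability exactly, conditioning on the table margins. The function name and the example counts are hypothetical. (SciPy offers a ready-made `scipy.stats.fisher_exact`; the version below is written out to show the logic.)

```python
from fractions import Fraction
from math import comb


def fisher_one_sided(events_t, n_t, events_c, n_c, alternative="less"):
    """Exact one-sided p-value for H0: the treatment makes no difference.

    Under H0, with the table margins fixed, the number of events in the
    treated arm follows a hypergeometric distribution.
    """
    n = n_t + n_c
    k = events_t + events_c  # total number of events
    denom = comb(n, n_t)

    def pmf(x):
        # P(exactly x of the k events fall in the treated arm | H0)
        return Fraction(comb(k, x) * comb(n - k, n_t - x), denom)

    if alternative == "less":      # as few treated events as observed, or fewer
        support = range(0, events_t + 1)
    elif alternative == "greater":  # as many treated events as observed, or more
        support = range(events_t, min(k, n_t) + 1)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return float(sum(pmf(x) for x in support))


# Hypothetical trial: 0/5 events under treatment vs 5/5 under control.
print(fisher_one_sided(0, 5, 5, 5))  # 1/252, about 0.004
```

A small p-value says only that the data would be surprising if H0 were true; it licenses rejecting H0 in favour of a treatment effect only if confounders have been excluded, which is why the method leans so heavily on randomization.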
  • 31. Clinchers: Hypothesis testing (II) • ⇒ The more likely a method is to exclude confounders (i.e. additional contributing factors to the observed result), the more reliable the inference we base on it • ⇒ the higher the method is ranked in the hierarchy (the better the evidence); • Corollary: case reports and observational data are considered sufficient evidence for causal claims only to the extent that possible confounders can be confidently excluded.
  • 32. Clinchers: conclusive evidence without randomization • Glasziou et al. (2007), for instance, consider cases where the relation between treatment and effect is so dramatic that bias and confounding can be safely excluded even in the absence of randomization. • Howick et al. (2009) relax the requirement of a dramatic effect and reduce it to the desideratum that the effect size be greater than the combined effect of plausible confounders. • Vandenbroucke (2008) ascribes the same reliability to RCTs and observational studies when assessing harm, on the grounds that ignorance about the unexpected consequences of an intervention achieves the same lack of bias as that obtained through blinding (i.e. ignorance about who will receive the treatment).
  • 33. Vouchers (inductive methods – Bayesian epistemology) • Distinctive points between the inductive-Bayesian framework and classical hypothesis testing: • Hypothesis testing: hypotheses are formulated and then tested for rejection/acceptance. • Bayesian updating: hypotheses are assigned a probability, which is then updated in light of data. Also, evidence is interpreted in light of all possible alternative hypotheses. Probability measures specify the degree of support enjoyed by hypotheses.
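The phrase “evidence is interpreted in light of all possible alternative hypotheses” corresponds to the normalising term P(E) in Bayes' theorem, which sums over every alternative. The sketch below updates three hypothetical explanations of an observed cluster of adverse events; the hypotheses, priors and likelihoods are invented for illustration and are assumed to be exhaustive and mutually exclusive.

```python
from fractions import Fraction


def update(priors, likelihoods):
    """Bayesian update over an exhaustive set of mutually exclusive hypotheses.

    priors: {hypothesis: P(H)}; likelihoods: {hypothesis: P(E / H)}.
    Returns {hypothesis: P(H / E)}.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    p_e = sum(joint.values())  # P(E) = sum over all alternatives of P(E / H) P(H)
    return {h: j / p_e for h, j in joint.items()}


# Hypothetical alternative explanations of an observed cluster of adverse events.
priors = {"drug": Fraction(1, 10), "confounding": Fraction(3, 10), "chance": Fraction(6, 10)}
likelihoods = {"drug": Fraction(8, 10), "confounding": Fraction(4, 10), "chance": Fraction(1, 10)}
posterior = update(priors, likelihoods)
print(posterior)
```

Rather than a reject/accept verdict on a single null hypothesis, the output is a graded degree of support for each alternative, and the drug hypothesis gains support (from 1/10 to 4/13) even though it is not the most probable explanation.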
  • 34. Vouchers (abduction) • Instead of experimentally isolating the causal factor under investigation, different pieces of evidence are put together and the implication of their joint occurrence is then inferred. • Rather than filtering evidence by ranking it, this approach aims to accommodate all data in a unifying picture. It is more or less knowingly advocated by different authors: • Aronson and Hauben (2006): “In some cases other types of evidence may be more useful than a randomised controlled trial. And combining randomised trials with observational studies and case series can sometimes yield information that is not available from randomised trials alone” (my emphasis). • Howick et al. (2009) and Stegenga (2011) propose to integrate evidence hierarchies with the Bradford Hill criteria for causal inference: the Bradford Hill criteria are not meant as truth conditions for causality but rather as imperfect indicators that jointly support the hypothesis of causation.
  • 35. Bradford Hill criteria for causal assessment 1. Consistency of data within population / across populations; 2. Strength of the association; 3. Relationship in time; 4. Biological gradient; 5. Specificity; 6. Coherence of evidence; 7. Biological plausibility; 8. Reasoning by analogy; 9. Experimental evidence.
  • 36. Bradford Hill criteria for causal assessment 1. “None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us make up our minds in the fundamental question – is there any other way of explaining the set of facts before us, is there any other equally, or more, likely than cause and effect?” 2. Thus, Bradford Hill refers both to explanatory power and to likelihood as reliable grounds to justify causal judgments, and explicitly presents his approach as an alternative to hypothesis testing: 3. “No formal tests of significance can answer those questions. Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us on the likely magnitude of those effects. Beyond that, they contribute nothing to the proof of our hypothesis”.
  • 37. Epistemologies compared (method; main assumptions; justification of “lower level” evidence)
    • Hypothetico-deductive (statistical mode). Method: hypothesis testing, i.e. the likelihood of the evidence if H0 is true (p-value). Main assumptions: the investigated factor is isolated by balancing the experimental groups with respect to all other prognostic factors. Justification of “lower level” evidence: only if alternative explanations for the observed result (confounders) can be safely excluded, or if the treatment effect swamps them by a statistically significant amount.
    • Abduction. Method: connection of data in light of an explanatory hypothesis. Main assumptions: account for as much evidence as possible. Justification of “lower level” evidence: explanatory power of the hypothesis in light of the data.
    • Inductive-Bayesian. Method: Bayes’ theorem. Main assumptions: principle of total evidence; coherence. Justification of “lower level” evidence: probability of the hypothesis given the likelihood function and the prior.
  • 38. Principle of total evidence • The essential distinction between clinchers and vouchers is that the latter are guided by the idea that all relevant evidence – or as much data as possible in the case of abduction – should be taken into account in order for the inferential procedure to be valid.
  • 39. Principle of total evidence The principle of total evidence has been a topic of hot debate among philosophers such as Hempel, Carnap, Ayer, Braithwaite, and Kneale among others. Keynes (1921) traces back the origin of the principle of total evidence to Bernoulli’s maxim that: “in reckoning a probability, we must take into account all the information which we have” (Carnap, 1947: 138, footnote 10; Keynes, 1921: 313).
  • 40. Principle of total evidence – nonmonotonic logic • Induction/abduction can be characterized as an inference where the evidence does not entail the hypothesis, but only more or less strongly supports/undermines it (Ayer, 1956). • Thus, whereas in deductive inference additional evidence can neither add further support to nor disconfirm a hypothesis once conclusive evidence is available, this is not so in the case of induction.
  • 41. Principle of total evidence – nonmonotonic logic • A doctor thinks that a patient is celiac, because all the available evidence E (adverse reactions to certain foods, iron deficiency, a series of additional symptomatic phenomena) points to this diagnosis. • H → E • However, he cannot be sure that this evidence necessarily entails his diagnosis: ¬(E → H). • Thus he prescribes a series of serum tests, and they all come back negative (evidence F). • ⇒ The strong support for the diagnosis of celiac disease provided by E is “corroded” by the negative evidence F, and the doctor needs to look for a hypothesis that accounts for both E and F: for instance, a simple food intolerance.
  • 42. Probabilistic consequences of the entailment relation If H ⊨ E, then P(H/E) > P(H), provided that P(H) > 0 and P(E) < 1. Proof (from Howson and Urbach 2006):
$$P(H \mid E) = \frac{P(H \wedge E)}{P(E)} > P(H \wedge E) = P(E \mid H)\,P(H).$$
Since H entails E, $P(E \mid H) = 1$, so $P(H \mid E) > P(H)$. QED
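The result can be checked on a toy model in which propositions are sets of possible worlds and entailment is set inclusion. The worlds and their probabilities below are hypothetical; the point is that whenever H ⊆ E, P(H/E) = P(H)/P(E), which exceeds P(H) as long as P(H) > 0 and P(E) < 1.

```python
from fractions import Fraction

# A hypothetical finite probability space: four possible worlds and their probabilities.
WORLDS = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}
H = {0}     # the hypothesis is true only in world 0
E = {0, 1}  # E is true in worlds 0 and 1: every H-world is an E-world, so H entails E


def prob(event):
    """P(event) = total probability of the worlds in which it is true."""
    return sum(WORLDS[w] for w in event)


def cond(a, b):
    """P(a / b) = P(a & b) / P(b)."""
    return prob(a & b) / prob(b)


assert H <= E  # entailment as set inclusion
print(prob(H), cond(H, E))  # 1/10 1/3
```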
  • 43. Principle of total evidence – nonmonotonic logic Inconclusive evidence is used to assess the plausibility of a hypothesis and to possibly quantify it in a probabilistic fashion, so that, for instance P(H/E) = .9; but there may always be additional information F, which may lower this support, so that, for instance, P (H/E&F) = .2. Nonmonotonicity = conclusions of inductive inferences are contingent and may be invalidated by additional information (Kyburg & Teng 2001). This means that “acquired support” may get lost if additional information undermines it.
  • 44. Principle of total evidence – nonmonotonic logic
    • Deductive inference (conclusive evidence). Modus ponens: from E → H and E, infer H. No additional evidence can change the conclusion: if, in addition to E, you come to know F, you still have H as a conclusion (E → H; E, F, …; therefore H). The same holds for modus tollens: from H → E and ¬E, infer ¬H; no additional evidence would change this conclusion.
    • Inductive inference (inconclusive evidence). When E is non-conclusive evidence for H, it may well be that P(H/E) > P(H), and yet additional evidence F might reverse this inequality, leading for instance to P(H/E,F) < P(H). The bearing of this phenomenon is most evident when comparing the support the evidence provides to H and to its complement ¬H: you may have P(H/E) > P(¬H/E) and, after learning F, P(H/E,F) < P(¬H/E,F).
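The celiac example and the figures P(H/E) = .9 and P(H/E&F) = .2 can both be reproduced from a single table of counts. The counts below are hypothetical (not clinical data), with H = celiac disease, E = the symptom pattern, F = all serum tests negative. Conditioning on more evidence lowers the probability of H, something that can never happen to a conclusion deductively entailed by E.

```python
from fractions import Fraction

# Hypothetical counts among 100 patients who all show the symptom pattern E.
# Keys: (serum tests all negative F, celiac H) -> number of patients.
PATIENTS_WITH_E = {
    (True, True): 2,    # negative tests, celiac
    (True, False): 8,   # negative tests, not celiac (e.g. a simple food intolerance)
    (False, True): 88,  # tests not all negative, celiac
    (False, False): 2,  # tests not all negative, not celiac
}


def p_celiac(given_f=None):
    """P(H / E) if given_f is None; otherwise P(H / E & F) or P(H / E & not-F)."""
    rows = {k: v for k, v in PATIENTS_WITH_E.items() if given_f is None or k[0] == given_f}
    celiac = sum(v for (f, h), v in rows.items() if h)
    return Fraction(celiac, sum(rows.values()))


print(p_celiac(), p_celiac(given_f=True))  # 9/10 1/5
```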
  • 45. • Statistical hypothesis testing is an approach that avowedly follows a Popperian hypothetico-deductive method of scientific enquiry. ⇒ It does not feel compelled to address the issue of non-monotonicity: ⇒ once you have conclusive evidence E rejecting hypothesis H, any other piece of evidence becomes irrelevant. ⇒ Thus the closer the evidence gets to this deductive ideal, the better: best evidence means evidence that guarantees that the observed difference is due to the treatment and only to it = internal validity maximization. ⇒ Evidence hierarchies are grounded on the assumption that if a study has the capacity to eliminate more confounders than others, then it should trump them.
  • 46. • Trumping means that higher level evidence discards any evidence of inferior ranking and also makes it irrelevant. • Lexicographic rule: when two studies of different levels deliver contradictory findings, the one higher in the evidence hierarchy is considered more reliable and is allowed to discard the lower level one; furthermore, lower level evidence adds nothing to higher level evidence and thus can be neglected without loss of information. • More generally, the very idea of ranking, or of up- and downgrading evidence on the basis of its internal validity, is the opposite of a unifying approach that aims to account for all the available evidence. In fact, non-deductive approaches must take into account all available evidence because, no matter how strongly a piece of evidence supports a given hypothesis, defeating evidence can always turn up.
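The contrast between the lexicographic rule and a total-evidence approach can be sketched with likelihood ratios. This is an illustrative Bayesian reconstruction of our own, not a procedure proposed in the slides: each hypothetical study on a suspected adverse reaction contributes a likelihood ratio LR = P(data / harm) / P(data / no harm), and combining them by multiplication assumes the studies are conditionally independent given the hypothesis. Level numbers follow the evidence hierarchy listed earlier (1 = meta-analyses of RCTs).

```python
from fractions import Fraction
from math import prod

# Hypothetical studies on one suspected adverse reaction: (hierarchy level, likelihood ratio).
STUDIES = [
    (2, Fraction(1)),  # single RCT, too small to observe the rare reaction: uninformative
    (4, Fraction(3)),  # non-randomized comparative study pointing towards harm
    (5, Fraction(4)),  # mechanistic reasoning pointing towards harm
]


def total_evidence_odds(prior_odds, studies):
    """Use every study (assumes conditional independence of the studies given H)."""
    return prior_odds * prod(lr for _, lr in studies)


def lexicographic_odds(prior_odds, studies):
    """Keep only the highest-ranked level; lower-level studies are discarded as irrelevant."""
    top = min(level for level, _ in studies)
    return prior_odds * prod(lr for level, lr in studies if level == top)


prior_odds = Fraction(1, 20)
print(lexicographic_odds(prior_odds, STUDIES), total_evidence_odds(prior_odds, STUDIES))  # 1/20 3/5
```

Under the lexicographic rule the uninformative RCT leaves the odds of harm where they started; pooling all the evidence raises them twelvefold.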
  • 47. Harm – benefit distinction • Recent contributions by philosophers and health scientists have acknowledged the role of so-called "lower level" evidence as a valid source of information contributing to the assessment of the risk profile of medications, both on theoretical (Aronson and Hauben, 2006; Howick et al. 2009) and on empirical grounds (Benson and Hartz, 2000; Golder et al. 2011). • Nevertheless, current practices have difficulty in assigning a precise epistemic status to this kind of evidence and in amalgamating it with standard methods of hypothesis testing.
  • 48. In their comparative analysis of RCTs and observational studies, Papanikolaou et al. (2006) assert: “it may be unfair to invoke bias and confounding to discredit observational studies as a source of evidence on harms” (p. 640, my emphasis).
  • 49. 1. Different epistemologies may justify “lower level” evidence on different grounds (Vandenbroucke’s proposal to reverse rankings); 2. In the case of risk detection and assessment non-deductive epistemologies (“Vouchers”) are better suited to the purpose.
  • 50. Hierarchy reversal for risk assessment (Vandenbroucke J.P. (2008) Observational Research, Randomised Trials, and Two Views of Medical Science. PLoS Medicine, 5(3): 339-43)
    • Hierarchy of study designs for intended effects of therapy: i. Randomised controlled trials; ii. Prospective follow-up studies; iii. Retrospective follow-up studies; iv. Case-control studies; v. Anecdotal: case report and series.
    • Hierarchy of study designs for discovery and explanation: i. Anecdotal: case report and series, findings in data, literature; ii. Case-control studies; iii. Retrospective follow-up studies; iv. Prospective follow-up studies; v. Randomised controlled trials.
  • 51. Vandenbroucke’s defence of hierarchy reversal (I) 1. Methodological point: observational studies concerning adverse reactions will not suffer from confounding in the same way as observational studies of intended effects do. Selection bias is less likely to affect observational studies of adverse reactions. This is because unintended effects, qua unintended, are not known in advance, and are thus also unknown to the drug prescriber, who cannot take them into consideration and thereby bias treatment allocation. Ignorance of the possible effect = “natural masking”.
  • 52. Vandenbroucke’s defence of hierarchy reversal (II) 2. Epistemological point: Context of discovery vs. context of evaluation: Discovery is focused on explanation and hypothesis generation; Evaluation instead on hypothesis testing/confirmation. And research methods differ in the opportunities they offer with respect to either of these goals.
  • 53. Vandenbroucke’s defence of hierarchy reversal (III) • Vandenbroucke (2008) formalizes the contrast between the context of evaluation and the context of discovery in terms of different priors assigned to hypotheses of benefits and of adverse reactions. • High priors for intended effects • Low priors for unintended ones
  • 54. Vandenbroucke’s defence of hierarchy reversal (III) 1. It is the higher priors which make the results more robust, not the method (Vandenbroucke, 2008: 16-17). 2. The reason why we accept uncertain results for risks rather than for benefits is that evaluation and discovery studies are associated with different loss functions: 1. evaluation is related to the approval of health technologies and is required to assure stakeholders about their efficacy and safety, 2. whereas discovery is more related to the context of research for its own sake, which might explain why certain study designs are preferred to others in different circumstances.
  • 55. Vandenbroucke’s defence of hierarchy reversal (III) 1. Priors are quickly swamped by data 2. Stakes are not lower for detecting risks than for testing the drug benefit: adverse drug reactions might be so severe as to reverse the safety profile of the drug and determine its withdrawal.
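The first objection, that priors are quickly swamped by data, can be illustrated with the standard conjugate Beta-binomial model. Using this model is an illustrative assumption, not Vandenbroucke's own formalism: the quantity of interest is the per-patient probability of an outcome, a “high prior” is Beta(9, 1) (prior mean 0.9) and a “low prior” is Beta(1, 9) (prior mean 0.1). The data, 300 events in 1,000 patients, are hypothetical.

```python
from fractions import Fraction


def posterior_mean(alpha, beta, events, n):
    """Posterior mean of a Beta(alpha, beta) prior after observing `events` in `n` patients."""
    return Fraction(alpha + events, alpha + beta + n)


events, n = 300, 1000  # hypothetical data
high = posterior_mean(9, 1, events, n)  # "high prior": prior mean 0.9
low = posterior_mean(1, 9, events, n)   # "low prior": prior mean 0.1
print(float(high), float(low))
```

Before the data the two priors disagree by 0.8; afterwards the posterior means differ by 8/1010, less than 0.01, so with a reasonably sized body of evidence the choice of prior alone cannot carry the weight Vandenbroucke assigns to it.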
  • 56. Prior knowledge about the drug’s general capacity to produce unintended adverse reactions • The acceptability of anecdotal evidence or of uncontrolled studies for assessing risk has to do with a high prior about the general capacity of the drug to bring about side effects. • While there is total ignorance as to which specific side effects the drug might cause, there is near certainty that the drug will indeed cause side effects beyond those already detected in the pre-marketing phase. • This high prior derives from historical knowledge and past experience with pharmaceutical products, and is also strongly reflected in the regulation that introduced the notion of “development (or potential) risk”, the pharmacosurveillance system, and the precautionary principle.
  • 57. Reasons for preferring vouchers to clinchers in causal assessment for harms 1. Explicit integration of prior knowledge 2. Categorical vs. probabilistic causal assessment 3. Internal vs. external validity 4. Impartiality
  • 58. Explicit integration of prior knowledge Frequentist statistics does not allow one to incorporate priors into hypothesis evaluation. Prior knowledge may come from: 1. knowledge of the drug’s behaviour inferred analogically from same-class molecules or similar entities; 2. theory; 3. historical knowledge about the harmfulness of drugs in general.
  • 59. Categorical vs. probabilistic assessment of causality From the time a risk is not known, to the moment in which it is incontrovertibly proven to be causally associated with the drug, there is a period of evidence accumulation which constitutes a state of partial and imperfect (but continuously increasing) knowledge. In this period it cannot be claimed that there is a causal link between the drug and the detected risk; but neither can we behave as if we knew nothing about it. Still, the latter attitude is precisely the only possible policy allowed by an epistemology grounded on hypothesis rejection.
  • 60. Internal and external validity • The problem of external validity is a particularly delicate one when inferring efficacy in the real population of users (a.k.a. effectiveness) from the efficacy assessed by RCTs. • In the biological sciences, phenomena such as feedback loops (homeostasis), interactive and multiple causality, and threshold effects characterize causality in a very peculiar way (see for instance Joffe, 2011). • In the case of side effects this problem is even more striking, because the contributing factors necessary to trigger adverse drug reactions are supposed to be rare and may contribute to the side effect through a totally unexpected pathway (Hauben and Aronson 2006; Smith et al. 2012). • Thus RCTs might be unable to detect side effects not only because no or too few subjects in the sample have the necessary characteristics, but also because they are too simplistic a tool for that purpose.
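How likely is a trial sample to contain enough subjects with a rare susceptibility at all? A minimal binomial sketch, assuming subjects are sampled independently and that the susceptibility has a hypothetical prevalence p; the function name is illustrative.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independently sampled subjects carry a susceptibility of prevalence p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Hypothetical: susceptibility present in 1 in 1,000 patients, trial of 1,000 subjects.
print(round(prob_at_least(1, 1000, 0.001), 3))  # at least one susceptible subject
print(round(prob_at_least(5, 1000, 0.001), 4))  # enough susceptible subjects to notice a pattern
```

Even a trial as large as the inverse prevalence has only about a two-in-three chance of enrolling a single susceptible subject, and well under a 1% chance of enrolling five.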
  • 61. Impartiality (I) • In benefit assessment and in risk assessment, the issue of impartiality takes on opposite characteristics. • Since benefit is intended and desired, but may be counterfeited for obvious commercial interests, the most natural way to deal with bogus products is to put the claim of efficacy to the test of strict trials. • For risk, the situation is quite different. • By regimenting benefit and risk assessment under the same standards, we forget that in the case of risk the question we want to answer is not whether the drug really causes it, but whether we can safely exclude that it does.
  • 62. Impartiality (II) • The higher the expected harm with respect to the expected benefit, the less sure we need to be about the causal link in order to decide to adopt risk-preventive measures (precautionary principle). • The causal link need be no more probable than is necessary to compensate for the risk-benefit imbalance (Osimani, 2013 a,b). • The industry, on the other hand, has every interest in discounting the drug as a possible causal contributor to the side effects; thus, the stricter the standards for causal assessment, the easier it is for it to provide whitewashed drug profiles.
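The idea that the required probability of a causal link shrinks as expected harm grows relative to benefit can be sketched with a deliberately simple two-action expected-loss model. This model is an assumption of ours, not the formal treatment in Osimani (2013 a,b): taking a preventive measure forgoes a benefit B for certain, while not taking it incurs harm H only if the drug really causes the reaction (probability p), so acting minimises expected loss when p·H > B, i.e. when p exceeds B/H. Harm and benefit are assumed to be measured on the same scale; the function names are hypothetical.

```python
def action_threshold(forgone_benefit, expected_harm):
    """Probability of a causal link above which preventive measures minimise expected loss."""
    return forgone_benefit / expected_harm


def should_act(p_causal, forgone_benefit, expected_harm):
    """Act when the expected harm avoided exceeds the benefit forgone."""
    return p_causal * expected_harm > forgone_benefit


print(action_threshold(1, 10), action_threshold(1, 100))  # 0.1 0.01
print(should_act(0.15, 1, 10), should_act(0.05, 1, 10))   # True False
```

With harm ten times the benefit, a 15% probability of causation already warrants action; demanding RCT-level certainty instead would leave the decision to whoever profits from the remaining uncertainty.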
  • 63. Impartiality (III) • Teira (2011) conceptualizes impartiality as a way of dealing with uncertainty such that it cannot be exploited by any party’s private interest. • Waiting for an RCT to prove definitively that an observed risk is really associated with a suspected drug is precisely a case in which uncertainty about the causal association is exploited by the industry’s private interest. • I think this is what Papanikolaou et al. (2006) have in mind when they say that “it may be unfair to invoke bias and confounding to discredit observational studies as a source of evidence on harms”.
  • 64. Case Study: hypothesis of a causal connection between paracetamol and asthma. Asthma has increased in the United States and in Western countries over the last 3 decades: by up to 75% among adults and up to 160% among children over this period (Burr et al., 1989; Eneli et al., 2005; Ninan and Russell, 1992; Mannino et al., 1998, 2002; Seaton et al. 1994).
  • 65. Explanatory hypotheses for the asthma epidemic: 1) increased exposure to outdoor and indoor pollutants; 2) decreased exposure to bacteria and childhood illnesses during infancy (the “hygiene hypothesis”); 3) increased obesity incidence and prevalence; 4) changes in diet and antioxidant intake; 5) cytokine imbalance as a reaction to environmental allergens in early childhood, leading to lifelong T-helper type 2 (allergic) dominance over T-helper type 1 (nonallergic) reactions and thus increasing the risk of atopic disease (Eneli et al., 2005; Seaton et al. 1994; Shaheen et al. 2000).
  • 66. How suspicion fell upon paracetamol • Varner and colleagues (1998) detected a precise correspondence between the increase in asthma incidence and the increased use of paracetamol as a substitute for aspirin (following the recognition of an association between aspirin and Reye’s syndrome). • The trend levelled off in the 1990s, i.e. at a time when paracetamol had already become one of the most widespread analgesics. • Varner and colleagues’ tentative explanation, however, was that the asthma increase was due to aspirin avoidance, since aspirin may protect against asthma through inhibition of prostaglandins. • This hypothesis was soon discounted on the grounds that, had this been the case, a decrease in asthma incidence should have been observed when aspirin was first introduced (Shaheen et al. 2000). • Thus suspicion finally fell upon paracetamol itself, and subsequent investigations explicitly set out to examine the hypothesis of a causal connection between paracetamol and asthma.
  • 67. Evidence for causal association between paracetamol and asthma “Many observations suggest that the epidemiologic association between acetaminophen and asthma is causative: 1) consistency of the association across geography, culture and age; 2) strength of the association (comparative studies); 3) the dose-response relationship between paracetamol exposure and asthma; 4) the coincidence of the timing of increasing asthma prevalence and increasing paracetamol use; 5) the relationship between per-capita sales of paracetamol and asthma morbidity across countries; 6) our inability to identify any other abrupt environmental change that could explain this increase in asthma morbidity; 7) plausible mechanism: glutathione depletion in airway mucosa caused by paracetamol”. McBride JT (2011) The Association of Acetaminophen and Asthma Prevalence and Severity, Pediatrics, 128 (6).
  • 68. Consistency of the association across geography, culture and age (I)
    • Beasley et al. (2008), cross-cultural study. Objective: examine the risk of asthma, rhinoconjunctivitis and eczema in children using paracetamol. Population: 122 centers in 54 countries; 200,000 children aged 6-7 yr. Results: dose-dependent increase in the prevalence and severity of asthma; use > once per year: OR 1.61 (95% CI 1.46-1.77); use ≥ once per month: OR 3.23 (95% CI 2.91-3.60). Association identified at almost all sites regardless of geography, culture or stage of development.
    • Beasley et al. (2011), cross-cultural study. Objective: examine the risk of asthma, rhinoconjunctivitis and eczema in adolescents using paracetamol. Population: 122 centers in 54 countries; 320,000 adolescents aged 13-14 yr. Results: dose-dependent increase in the prevalence and severity of asthma; use > once per year: OR 1.43 (95% CI 1.33-1.53); use ≥ once per month: OR 2.51 (95% CI 2.33-2.70). Association identified at almost all sites regardless of geography, culture or stage of development.
    • Etminan et al. (2009), systematic review and meta-analysis of epidemiologic studies. Objective: quantify the association between acetaminophen use and the risk of asthma in children and adults. Population: 13 cross-sectional studies, 4 cohort studies and 2 case-control studies comprising 425,140 subjects. Results: pooled odds ratio (OR) for asthma among acetaminophen users 1.63 (95% CI 1.46-1.77). The risk of asthma in children was elevated among users of acetaminophen in the year before asthma diagnosis and within the first year of life (OR 1.60 [95% CI 1.48-1.74] and 1.47 [95% CI 1.36-1.56], respectively). Only one study reported the association between high acetaminophen doses and asthma in children (OR 3.23; 95% CI 2.9-3.6). Prenatal use of acetaminophen was associated with an increased risk of asthma and wheezing (OR 1.28 [95% CI 1.16-1.41] and 1.50 [95% CI 1.10-2.05], respectively).
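Pooled estimates such as the one reported by Etminan et al. are commonly obtained by weighting each study's log odds ratio by the inverse of its variance. The sketch below shows a fixed-effect version that recovers each standard error from the reported 95% CI; it illustrates the technique only and does not reproduce Etminan et al.'s analysis. The two input triples in the example are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def se_from_ci(lower: float, upper: float) -> float:
    """SE of log(OR) recovered from a reported 95% CI, assuming the CI was
    computed symmetrically on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed_effect(estimates):
    """Fixed-effect inverse-variance pooling of (OR, lower, upper) triples.

    Returns (pooled OR, lower 95% limit, upper 95% limit). A random-effects
    model would be preferable when studies are heterogeneous.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for or_, lower, upper in estimates:
        weight = 1.0 / se_from_ci(lower, upper) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(or_)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z95 * pooled_se),
            math.exp(pooled_log + Z95 * pooled_se))


if __name__ == "__main__":
    # Two hypothetical study estimates (OR, lower, upper); not data from the studies above
    print(pool_fixed_effect([(1.5, 1.1, 2.0), (1.8, 1.3, 2.5)]))
```

Pooling two identical estimates leaves the OR unchanged but narrows the interval by a factor of √2 on the log scale, which is the gain in precision that motivates meta-analysis.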
  • 69. Consistency of the association across geography, culture and age (II)
    • Amberbir et al. (2011), longitudinal birth-cohort study. Objective: investigate the independent effects of paracetamol and geohelminth infection on the incidence of wheeze and eczema in a birth cohort. Population: population-based cohort of 1,065 pregnant women from Butajira, Ethiopia. Results: paracetamol use was significantly associated with a dose-dependent increased risk of incident wheeze (adjusted OR 1.88, 95% CI 1.03-3.44, for one to three tablets, and 7.25, 95% CI 2.02-25.95, for ≥ 4 tablets in the past month at age 1, vs. never), but not with eczema.
    • Wickens et al. (2011), birth-cohort study. Objective: investigate the associations between infant and childhood paracetamol use and atopy and allergic disease at 5-6 years. Population: New Zealand; paracetamol exposure between birth and 15 months in Christchurch (n = 505) and between 5 and 6 years for all participants in Christchurch and Wellington (n = 914); outcome data collected at 6 years for all participants; logistic regression models adjusted for potential confounders. Results: paracetamol exposure before the age of 15 months was associated with atopy at 6 years (adjusted OR 3.61, 95% CI 1.33-9.77). Paracetamol exposure between 5 and 6 years showed dose-dependent associations with reported wheeze and current asthma, but no association with atopy. Compared with use 0-2 times, the adjusted ORs (95% CI) were: wheeze 1.83 (1.04-3.23) for use 3-10 times and 2.30 (1.28-4.16) for use > 10 times; current asthma 1.63 (0.92-2.89) for use 3-10 times and 2.16 (1.19-3.92) for use > 10 times; atopy 0.96 (0.59-1.56) for use 3-10 times and 1.05 (0.62-1.77) for use > 10 times.
    • McKeever et al. (2005), cross-sectional analysis. Objective: investigate the associations between the use of pain medication, particularly paracetamol, and asthma, COPD and FEV1 in adults. Population: data from the Third National Health and Nutrition Examination Survey (U.S.); participants aged 20-80 years with complete data on relevant exposures and outcomes. Results: dose-response association between paracetamol use and asthma (adjusted OR 1.20; 95% CI 1.12-1.28; p for trend 0.001).
  • 70. Consistency of the association across geography, culture and age (III)
    • Shaheen et al. (2000), case-control study. Objective: investigate whether frequent paracetamol use in humans is associated with asthma. Population: adults aged 16-49 years registered with 40 general practices in Greenwich, South London; the frequency of paracetamol and aspirin use was compared in 664 individuals with asthma and 910 without asthma. Results: after controlling for potential confounders, the OR for asthma compared with never users was 1.06 (95% CI 0.77-1.45) in infrequent users (< monthly), 1.22 (0.87-1.72) in monthly users, 1.79 (1.21-2.65) in weekly users and 2.38 (1.22-4.64) in daily users (p for trend = 0.0002). The association was present in both users and non-users of aspirin.
    • Shaheen et al. (2008), multicentre case-control study. Objective: examine whether frequent paracetamol use is associated with adult asthma across Europe. Population: the network compared 521 cases with a diagnosis of asthma and reported asthma symptoms with 507 controls with no diagnosis of asthma and no asthmatic symptoms, across 12 European centres. Results: weekly use of paracetamol, compared with less frequent use, was strongly positively associated with asthma after controlling for confounders (OR 2.87, 95% CI 1.49-5.37). No association was seen between the use of other analgesics and asthma.
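The dose-response claim can be checked roughly from the published category estimates alone. The sketch below fits an inverse-variance weighted regression of log(OR) on an ordinal dose score, using the Shaheen et al. (2000) estimates above. The 1-4 scoring is an assumption, and the method ignores the correlation between categories that share the never-user reference group (the Greenland-Longnecker method handles this properly), so it is an approximation and not the test the authors used to obtain their p-value.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def log_or_and_se(or_: float, lower: float, upper: float):
    """Log odds ratio and its SE recovered from a reported 95% CI
    (assumes the CI is symmetric on the log scale)."""
    return math.log(or_), (math.log(upper) - math.log(lower)) / (2 * Z95)


def weighted_trend(categories):
    """Inverse-variance weighted regression of log(OR) on a dose score,
    forced through the origin (reference category: score 0, log OR 0).

    `categories` is a list of (score, OR, lower, upper). Returns
    (slope per score unit, SE of slope, two-sided p-value).
    NOTE: ignores covariance between category estimates sharing the same
    reference group, so the SE is only approximate.
    """
    sxy = sxx = 0.0
    for score, or_, lo, hi in categories:
        y, se = log_or_and_se(or_, lo, hi)
        w = 1.0 / se ** 2
        sxy += w * score * y
        sxx += w * score ** 2
    slope = sxy / sxx
    slope_se = math.sqrt(1.0 / sxx)
    z = slope / slope_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return slope, slope_se, p


# Shaheen et al. (2000) category estimates; scores 1-4 are an assumed ordinal coding
shaheen_2000 = [
    (1, 1.06, 0.77, 1.45),  # infrequent (< monthly)
    (2, 1.22, 0.87, 1.72),  # monthly
    (3, 1.79, 1.21, 2.65),  # weekly
    (4, 2.38, 1.22, 4.64),  # daily
]

if __name__ == "__main__":
    slope, se, p = weighted_trend(shaheen_2000)
    print(f"OR per category step: {math.exp(slope):.2f} (approx. p = {p:.4f})")
```

The exponentiated slope reads as the approximate multiplicative increase in the odds of asthma per step up the frequency scale.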
  • 72. Comparative studies
    • Shaheen et al. (2000), case-control study. Objective: determine whether frequent paracetamol use is a risk factor for asthma. Population: adults aged 16-51 yr in South London; cases n = 720 (51% response rate), controls n = 980 (49% response rate). Results: infrequent users (< monthly): OR 1.06 (95% CI 0.77-1.45); monthly users: OR 1.22 (95% CI 0.87-1.72); weekly users: OR 1.79 (95% CI 1.21-2.65); daily users: OR 2.38 (95% CI 1.22-4.64); p for trend = 0.0002.
    • Shaheen et al. (2002), prospective cohort study. Objective: examine the relationship between prenatal paracetamol use and wheezing in offspring at 6 mo. Population: 9,400 women. Results: increased risk of wheezing before 6 mo in the offspring of frequent paracetamol users at 20-32 wk of gestation: OR 2.34 (95% CI 1.24-4.40).
    • Barr et al. (2004), prospective cohort study (Nurses’ Health Study). Objective: examine the relationship between paracetamol use and new-onset asthma. Population: 73,321 women aged 44-69 yr. Results: the risk of a new-onset asthma diagnosis increased with frequency of use (adjusted RR 1.63, 95% CI 1.11-2.39); dose dependence: p for trend = 0.006.
    • Boston University Fever Study (2002), randomized double-blind trial without placebo. Objective: compare the incidence of adverse reactions among children given paracetamol or ibuprofen. Population: 84,000 febrile children aged ≤ 12 yr, randomly assigned to paracetamol, low-dose ibuprofen or high-dose ibuprofen. Results: among 1,879 children with pre-existing asthma, outpatient visits for asthma were lower in the ibuprofen arm than in the paracetamol arm (RR 0.56, 95% CI 0.34-0.95), with dose dependence also reported; hospitalizations were nonsignificantly lower (RR 0.63, 95% CI 0.25-1.60).
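Whether a reported ratio is statistically significant can be recovered approximately from its confidence interval: on the log scale, the SE is the CI width divided by 2 × 1.96, and z = log(ratio) / SE. The sketch assumes the intervals were computed this way (log-normal approximation), which may not hold exactly for the studies above.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def p_from_ratio_ci(ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value for a ratio measure (RR or OR) from its
    95% CI, assuming the CI was computed on the log scale with a normal SE."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    z = math.log(ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Boston University Fever Study estimates for children with pre-existing asthma
    print(f"Outpatient visits: p ~ {p_from_ratio_ci(0.56, 0.34, 0.95):.3f}")
    print(f"Hospitalizations:  p ~ {p_from_ratio_ci(0.63, 0.25, 1.60):.3f}")
```

Applied to these estimates, the outpatient-visit RR gives p below 0.05 while the hospitalization RR does not, consistent with the latter being described as nonsignificant.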
  • 74. Relationship between per-capita sales of paracetamol and asthma morbidity across countries (item 5)
    • Newson et al. (2000), ecologic study. Objective: examine asthma rates and aggregate consumption of acetaminophen in 1994-95. Population: English-speaking countries in the ECRHS study. Results: for each gram increase in per-capita paracetamol sales, the prevalence of wheeze increased by 0.52% in 13-14 yr olds and by 0.26% in young adults. The prevalence of childhood wheezing in 36 countries around the world is predicted by each country’s per-capita sales of paracetamol.
  • 75. Diagram: proposed pathways linking acetaminophen to asthma. Nodes: acetaminophen; reduced glutathione in the airways; lower ability to counteract oxidative stress; tissue injury; smooth muscle contraction; bronchial hyperresponsiveness; release of pro-inflammatory mediators (leukotrienes); impaired β-receptor function; stimulation of additional inflammatory cells; lower ability to scavenge the toxic acetaminophen metabolite N-acetyl-p-benzoquinone imine (NAPQI); alteration of antigen presentation and recognition; shift from Th1 (non-allergic) to Th2 (allergic) cytokine profile; reduced IFN-γ and IL-2; antipyretic effect; reduced immune response to, and prolongation of, rhinovirus infection; cytokine storm; asthma.
  • 76. • While all these pathways are only indirectly relevant to asthma pathogenesis, their plausibility is strongly supported by experimental data at different levels (in vitro, in vivo and clinical studies). • For some authors, this evidence provides a mechanistic rationale and strengthens the support that population-level evidence lends to the causal hypothesis, to the point that no additional randomized studies are needed in order to consider acetaminophen a causal factor in asthma onset or exacerbation. • Others instead hold a conservative view and are concerned about confounding.
  • 77. VS
  • 78. Some authors are reluctant to accept such evidence as a sufficient basis for changing practice and for establishing a causal relationship between acetaminophen and asthma, on the grounds that it does not come from randomized clinical trials (Eneli et al. 2005; Allmers et al. 2009; Johnson and Ownby 2011; Karimi et al. 2006; Wickens et al. 2011; Chang et al. 2011). In particular, these authors are concerned that the acetaminophen-asthma relationship may be explained by 1) reverse causation, 2) confounding by indication or 3) preference for acetaminophen rather than ibuprofen in children at risk for asthma.
  • 79. Other authors, although less sceptical about the causal relationship, likewise require or recommend adequately powered placebo-controlled trials to establish causation (Holgate, 2011; Henderson and Shaheen, 2013).
  • 80. Martinez-Gimeno and García-Marcos (2013): “apart from tobacco smoke exposure, no other genetic or environmental factors, including genes, allergens, infections and bacterial substances, has shown the stubborn and consistent association with wheezing disorders prevalence as acetaminophen has done”. They recommend against overly liberal use of acetaminophen in children while waiting for regulatory agencies to do their part and reconsider its safety profile. Furthermore, they oppose double-blind placebo-controlled RCTs: “contrary to common claims, a placebo arm would be impractical and unethical, because it would subject participants to a substandard and unacceptable treatment during a very long time” (p. 114).
  • 81. Beasley et al. (2011): “When the study findings are considered together with other available data, there is substantive evidence that acetaminophen use in childhood may be an important risk factor for the development and/or maintenance of asthma, and that its widespread increasing use over the last 30 years may have contributed to the rising prevalence of asthma in different countries worldwide”
  • 82. McBride (2011): “The balance between the likely risks and benefits of acetaminophen has shifted for children with a history or family history of asthma. I can understand how those responsible for regulation or policy statements of professional organizations might be more comfortable waiting for incontrovertible evidence [...] At present, however, I need further studies not to prove that acetaminophen is dangerous but, rather, that it is safe. Until such evidence is forthcoming, I will recommend avoidance of acetaminophen by all children with asthma or those at risk for asthma.”
  • 83. By shifting the burden of proof, McBride assumes that, given the available evidence, the hypothesis of a causal connection between acetaminophen and asthma is stronger than that of its absence; or, at least, that given the expected harm and benefit, the probability of a causal connection is high enough to shift the balance against its use.
  • 84. The disagreement among scholars about the best course of action ultimately stems from differing epistemological views that are left implicit. Who’s right?
  • 85. Reasons for preferring vouchers to clinchers in causal assessment for harms 1. Explicit integration of prior knowledge 2. Categorical vs. probabilistic causal assessment 3. Internal vs. external validity 4. Impartiality
  • 86. Integration of prior knowledge and available evidence • Biological data point to potential inflammatory effects of acetaminophen on the airways through multiple (possibly additive) pathways. • Dismissing the causal link because of possible confounding at the epidemiological level explicitly sidesteps this evidence. • The same holds for other supporting evidence, such as the dose-response relationship found in many studies, and in general for the higher likelihood of the entire set of data under the hypothesis of causation than under its denial. • However, the prior is low: in the acetaminophen case, prior knowledge about the molecule itself weighs rather against the hypothesis of harmfulness, since it has generally been considered a harmless analgesic. • According to Martinez-Gimeno and García-Marcos (2013), this might explain the reluctance to accept this causal hypothesis. • This means that, instead of being taken into account explicitly, prior knowledge is allowed to influence the interpretation of observational evidence implicitly: the concern about confounders hides a conservative low prior for harmfulness.
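The role of the prior can be made explicit with Bayes' theorem in odds form: posterior odds = prior odds × likelihood ratio. In the sketch below the likelihood ratios for the evidence items are hypothetical, and multiplying them assumes the items are conditionally independent given each hypothesis, which real evidence rarely satisfies.

```python
import math


def posterior_probability(prior: float, likelihood_ratios) -> float:
    """P(causal | evidence) from a prior and likelihood ratios, using Bayes'
    theorem in odds form.

    Each likelihood ratio is P(item | causal) / P(item | not causal).
    Multiplying them assumes the evidence items are conditionally independent
    given each hypothesis, a strong simplifying assumption.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    posterior_odds = prior / (1.0 - prior) * math.prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical likelihood ratios for three evidence items
    # (e.g. consistency, dose-response, mechanism); illustrative values only
    lrs = [3.0, 2.0, 2.0]
    for prior in (0.05, 0.5):
        print(f"prior {prior:.2f} -> posterior {posterior_probability(prior, lrs):.2f}")
```

With these illustrative numbers, the same evidence yields a posterior of 12/31 ≈ 0.39 from a prior of 0.05 but 12/13 ≈ 0.92 from a prior of 0.5, which is why leaving the prior implicit can drive opposite conclusions from the same data.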
  • 87. Categorical vs. cumulative causal assessment • Detractors of the causal hypothesis seem to feel uncommitted until proven otherwise and call for RCTs before any action is taken. • Supporters feel challenged by the evidence already available and consider what should be thought and done on its basis. • Contrary to what one might expect, the former attitude is not neutral: its default is that there is no causal association until one is proved by RCTs, whereas the available evidence no longer warrants a categorical denial of this hypothesis.
  • 88. Internal vs. external validity • Premarketing studies (RCTs) show acetaminophen to be relatively harmless. • But extreme conservatism (together with ignoring non-experimental evidence because of its lower internal validity) ends up neglecting observational data on the robustness of the association across contexts.
  • 89. Impartiality • Dismissing the causal association between acetaminophen and asthma on the grounds that the overwhelming epidemiological evidence may be produced by confounders is a case where uncertainty about the causal connection may be exploited by interested parties (for instance, Lowe et al. 2010 and Holgate 2011 have conflicts of interest). • An overly rigid attitude towards evidence quality may run counter to the very reasons for which quality standards were introduced.