ARTICLE IN PRESS                                                Available online at                 ...
ARTICLE IN PRESS2                               M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xx...
ARTICLE IN PRESS                             M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–x...
ARTICLE IN PRESS4                               M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xx...
ARTICLE IN PRESS                             M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–x...
ARTICLE IN PRESS6                                   M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008...
ARTICLE IN PRESS                                      M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (20...
ARTICLE IN PRESS8                               M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xx...
ARTICLE IN PRESS                            M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xx...
ARTICLE IN PRESS10                                M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) ...
Upcoming SlideShare
Loading in …5

Forensic Epidemiology A Systematic Approach To Probabilistic Determinations In Disputed Matters


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Forensic Epidemiology A Systematic Approach To Probabilistic Determinations In Disputed Matters

  1. 1. ARTICLE IN PRESS Available online at J O U R N A L O F FORENSIC AND LEGAL ME D IC IN E Journal of Forensic and Legal Medicine xxx (2008) xxx–xxx Review Forensic Epidemiology: A systematic approach to probabilistic determinations in disputed matters Michael D. Freeman PhD, MPH, DC (Adjunct Associate Professor of Forensic Medicine and Epidemiology, Clinical Associate Professor) a,b,*, Annette M. Rossignol ScD (Professor) c, Michael L. Hand PhD (Professor) d a Institute of Forensic Medicine, Faculty of Health Sciences, University of Aarhus, 205 Liberty Street, Suite B, Salem, OR 97301, USA b Department of Public Health and Preventive Medicine, Oregon Health and Science University School of Medicine, USA c Department of Public Health, Oregon State University, USA d Atkinson Graduate School of Management, Willamette University, USA Received 18 March 2007; received in revised form 15 October 2007; accepted 13 December 2007Abstract Forensic medicine testimony often relies upon terms of probability to enhance the strength of the testimony. Such terms must have ademonstrably reliable and accurate basis; otherwise their use is speculative, unjustified, and potentially harmful. Forensic Epidemiologyis introduced as a framework from which probabilistic testimony can be assessed in settings in which it is either proffered or encountered.In this paper, common forensic uses of probability are reviewed, appropriate methods for presenting such testimony are proposed, andinappropriate uses of probability and epidemiologic concepts and data, as well as a logical fallacies commonly observed in forensic set-tings are presented. A previously unpublished logical fallacy, the ‘‘Prior Odds” Fallacy, is also introduced.Ó 2008 Elsevier Ltd and FFLM. All rights reserved.Keywords: Probability; Epidemiology; Forensic; Prior Odds Fallacy; Sensitivity; Specificity; Positive Predictive Value1. Introduction 73,000,000. The miniscule probability that the deaths were due to natural causes was used by the prosecution as evi- In 1999, British solicitor Sally Clark was convicted in a dence that the deaths were homicidal. Meadow was aUnited Kingdom court of murdering two of her children. well-known and often-used prosecution witness in similarBoth of the infants died within weeks of birth under cir- proceedings, having been the first to promulgate the con-cumstances originally diagnosed by some experts as sudden cept of Munchausen Syndrome by Proxy (MSbP) in whichinfant death syndrome (SIDS) or cot death.1 An important a parent injures or sickens a child as a means of procuringwitness for the prosecution was the prominent pediatrician medical attention.2 Meadow’s Law, a heuristic attributedSir Roy Meadow, who testified that the probability of two to Meadow that pertains to multiple cot deaths in families,cot deaths in one family was exceedingly remote; about 1 in states that unless otherwise proven, one death is tragic, two is suspicious, and three is murder.3 * In the Clark case, the estimate of 1 in 73,000,000 was Corresponding author. Address: Institute of Forensic Medicine, derived from squaring the observed risk of a single cotFaculty of Health Sciences, University of Aarhus, 205 Liberty Street,Suite B, Salem, OR 97301, USA. Tel.: +1 503 586 0127; fax: +1 503 586 death in an affluent non-smoking family; estimated at 10192. in 8500. Meadow’s testimony created a furor among statis- E-mail address: (M.D. Freeman). ticians, with the president of the Royal Statistical Society1752-928X/$ - see front matter Ó 2008 Elsevier Ltd and FFLM. All rights reserved.doi:10.1016/j.jflm.2007.12.009 Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  2. 2. ARTICLE IN PRESS2 M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxxwriting an open letter of complaint to the Lord Chancel- appropriate expert to address a claim that the patientlor,4 and the British Medical Journal publishing an edito- would have died within 5 years, on a more likely thanrial concerning the error of allowing non-qualified experts not basis?to testify regarding unsound statistically based arguments.5 Any expert opinion that addresses the probability, risk,The primary complaint with Meadow’s testimony was that incidence, or prevalence of an event occurring or not occur-from a statistical perspective, he treated the specific risk to ring in an individual or a population is an opinion thatthe Clark family for cot death like they were a randomly must have a foundation in valid epidemiologic conceptsselected family with no predilection for the event in ques- and data. Epidemiology is most simply defined as the scien-tion. In other words, Meadow considered the one family tific study or analysis of populations having similar diseasein 8500 with a cot death to simply be unlucky, with no bio- or injury characteristics. The proper application of epide-logical or environmental diathesis for the condition, rather miologic concepts and data to forensic issues is the practicelike the probability that one will roll a six with a die. Thus, of Forensic Epidemiology. The term was first introducedaccording to Meadow, the second death was no more likely by Loue in 19997 and later adopted by the US Centersthan the first, just as a second six is no more likely than the for Disease Control and Prevention (CDC) in 2003 as afirst, and the probability of each can be multiplied to arrive narrowly focused Public Health Law Program moduleat the probability of both. designed to aid with the investigation of acts of bioterror- The most obvious problem with this thinking was that it ism.8 The subject matter covered by the term Forensic Epi-presumed that all possible biologic and environmental risk demiology has since been expanded to cover the multitudevariables for cot death have been investigated and are of areas in which epidemiologic terms, concepts, and dataknown, and that none were present in the Clark family; may be applied in a forensic venue.9an over simplification to the point of deception. Theapproach could be compared to the superficially convinc- 2. Forensic Epidemiologying claim that an injury by lightning strike is a randomevent, and thus a second such injury would be no more The practical application of Forensic Epidemiologylikely than the first. In reality, an individual who works (FE) concerns both the recognition of the improper useon a golf course in a geographic area with frequent thun- of epidemiologic concepts and data as well as the use ofderstorms is much more prone to a lightning strike injury such testimony to help prove the assertion of one side orthan is an office worker who resides in an area where such another. For this reason, this paper is organized in threestorms are rare. sections; the first will help the reader identify the most com- The guilty verdict was appealed, based in part on the mon scenarios in which experts (both epidemiologic andproblems with Meadow’s statistical claims. The conviction non-epidemiologic) use terms and concepts of epidemiol-eventually was overturned when it was found that a wit- ogy in forensic venues. The next section describes appropri-ness had failed to reveal evidence that one of the chil- ate uses of epidemiologic concepts and data in forensicdren’s deaths may have been associated with a venues. The final part of the paper is devoted to commonStaphylococcus aureus infection, among other problems.6 fallacies associated with epidemiologic and probabilisticThe public attention brought to the Clark case also testimony in forensic venues. Because population-basedfocused attention on the lack of validated criteria neces- inferences are often-used to support variety of expert opin-sary for a diagnosis of MSbP, with the result that several ions, the tenets of FE, as presented in this paper, are direc-murder convictions that had resulted from Meadow’s tes- ted at all experts who give opinions regarding medicolegaltimony were reviewed. issues in forensic venues. The tragedy of the Clark case raises the question of howsuch a chain of events could have occurred; how could tes- 2.1. Common testimony types involving FE conceptstimony that is fundamentally flawed be used to imprison aninnocent person? An issue of even greater concern is how Probability serves as the basis for many decisions peopleoften similar superficially convincing testimony is used make on a daily basis. When one purchases a new DVDeffectively in criminal and civil proceedings, resulting in player and opts not to also purchase an extended warranty,an unjust verdict. What is apparent from the many different one is evaluating, consciously or subconsciously, a series ofexperts from a variety of disciplines who gave opinions probabilities. Many factors – how much the unit costs,regarding the faulty testimony in the Clark case, is that one’s prior experience with DVD player failure, and hownot only is there uncertainty regarding what constitutes long one typically keeps electronics before replacing themvalid testimony on some issues, but that there is also a with an updated model – affect the decision that it is lessdegree of uncertainty as to who should be setting the stan- than likely that the expenditure on the warranty is justified.dards for such testimony. As an example, while it is not dif- If one is given some additional knowledge, for example,ficult to determine that the relevant expert to testify in a that one out of every three DVD players will require amedical malpractice claim against an oncologist whose costly repair within a year after the manufacturer’s war-patient died from hypovolemic shock while receiving autol- ranty expires, then one can assess the risk one is takingogous stem cell therapy is another oncologist, who is the in not purchasing the warranty. Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  3. 3. ARTICLE IN PRESS M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxx 3 In a similar fashion, data and terms of probability may Risk of an occurrence or condition. Typically given as abe used to sway judge and jury fact finders in a forensic set- proportion or percentage, this is the most commonlyting, because assigning a weight to an opinion is a common used and abused epidemiologic concept in forensic testi-method of strengthening testimony. These opinions affect mony. Risk may be expressed in absolute terms, e.g. thehow a fact finder perceives issues such as causation, negli- risk of dying in a motor vehicle crash in a given year isgence, and injury severity and prognosis that dictate trial approximately 1 in 6500,13 or as a relative risk (typicallyoutcomes. presented as a ratio, but also as a difference), such as the lifetime risk of dying in a car crash is more than 23,0002.1.1. A ‘‘reasonable probability” times greater than dying from a snake bite.3 Misunder- The use of probabilistic language is inescapable in the standing is rife in such claims, however. For example,forensic setting, since the standard for admitting expert while it is reasonable to conclude that one is significantlytestimony is that it is rendered as being ‘‘more probable less likely to die from a snake bite than in a traffic colli-or likely than not” or as a ‘‘reasonable probability” or sion, this does not mean that handling a venomous‘‘reasonable medical probability”; all relatively inter- snake is safer than driving a car. The average person’schangeable terms.10 In some jurisdictions, this standard exposure to a venomous snake, in terms of duration,for expert opinion is assigned a value that must be may be more than 23,000 times less than their exposureexceeded before the testimony is admissible; the expert to a motor vehicle; thus the incidence of snake bite deathmust be ‘‘more than 50% certain” that the opinion is cor- may be significantly higher per unit of time of exposurerect. Using probabilistic language for such testimony is than that of motor vehicle death per exposure for thesomewhat of a mischaracterization of an internal process same unit of time.of the expert, who has opined that he is more certain thannot that his opinion is accurate or true, regardless of the Opinions involving risk often rely upon probabilisticmethods used to arrive at the opinion. There is no way language, and this in turn may lead to a lack of specificity.objectively to weigh all of the processes that make up For example, it is reasonable to opine that not wearing asuch a standard, because experts potentially are influenced seatbelt increases the risk of ejection in the event of a roll-by many different factors that may cause them to favor a over crash, an important determination in some forensicparticular opinion. One example of an exception, in which venues, as it may indicate contributory negligence of thethe decision processes of the expert can be externally scru- occupant to his or her own injuries for failure to wear atinized, is when a data set is described and an opinion is seatbelt. On the other hand, if the occupant was not wear-rendered that a particular outcome lies inside or outside ing a seatbelt, and was not ejected but still fatally injured,of an error range or confidence interval bracketing the the presence or absence of a seatbelt is a significantly smal-average of the data set. In such an instance, an opinion ler factor for injury frequency and severity, particularly ifthat it is a reasonable probability that an outcome would there is a great deal of vehicle roof crush that may havenot take place, for instance, the failure of a medical resulted in severe head and neck injury to a properly posi-device, could be reached based upon the application of tioned and restrained occupant.14 Quantification of the dif-valid and relevant epidemiologic/statistical tenets to the ference in risk of injury between the two scenarios would berelevant data. important in helping a fact finder (judge or jury) determine Probabilistic opinions may be expressed in terms of the: whether the lack of a seatbelt was a significant factor in the case in question. Incidence of an occurrence or condition. This is The following are a sample of common forensic opin- expressed as a rate, with a number of affected persons ions that state or imply probability (NB – the validity of per some denominator. For example, the rate of traffic the opinions is not addressed): crash fatalities in the United States in 2004 was 1.45 per 100,000,000 vehicle miles traveled, 14.59 per It is more likely that the patient will need future surgery as 100,000 persons, 18 per 100,000 registered vehicles, a result of the injury and 21.54 per 100,000 licensed drivers.11 Although – This is a prediction of an event that has not yet the numerator and denominator are different for each occurred. It implies that the future incidence of sur- estimate they all represent the same total number of gery for those who have the injury is higher than deaths for 2004. for those who do not. Such claims do not necessarily Prevalence of an occurrence or condition. Prevalence is have to be based upon published epidemiologic data, essentially a cross-section of a population at a given because they can be a statement of clinical experience, point in time. It is expressed as a proportion or percent- based on one or both of two observations: that age, such as the prevalence of cancer in the United King- patients with the injury in question go on to have dom is approximately 2%, meaning that one out of every the surgery more often than patients who do not, or 50 live persons has a diagnosis of cancer in the UK.12 that an disproportionately large percentage of Prevalence can be estimated from the incidence and sur- patients who have the surgery have a history of the vival rate of a condition. injury. Similar claims, of what has been observed Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  4. 4. ARTICLE IN PRESS4 M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxx and thus is possible or plausible, are the converse of tims of assault is significantly higher.15 As a hypo- statements of what is impossible, with regard to the thetical example, it might be said that only 1% of level of substantiating data needed for a valid conclu- infant fatalities that result from an unintentional sion. As an example, if one wanted to determine if injury result in retinal hemorrhage, whereas 75% of any red headed subjects were included in a group of confirmed cases of shaken baby syndrome have the 100, a sample size of one could establish the fact, if same finding. Thus, a finding of retinal hemorrhage a read head was included in the sample. Conversely, is at least suggestive of homicide, absent any other if one wanted to establish that there were no red evidence. What is less clear is how such evidence heads in the group one would have to examine the would be presented when there is a smaller difference hair color of every group member. The level of proof between the two prevalence estimates, e.g. 30% versus required to validate a claim of possibility or plausibil- 50%, and at what point the difference becomes insig- ity is significantly less than what is required to estab- nificant from a evidentiary perspective. This is further lish impossibility or implausibility. discussed in Section 2.2.3. If the occupant had worn his seatbelt the injury would not have occurred – The statement implies that the risk of injury for sim- 2.2. Principles of applied Forensic Epidemiology ilar crashes with similar occupants who are restrained is 0. Unlike the previous opinion, such a statement This section of the paper focuses on how FE is used to implies a basis in data gathered from large samples formulate or substantiate an opinion, and the principles of crashes and occupants that are similar to the case governing such an application. FE is of little use in describ- in question, as it implies impossibility of an outcome ing something that has already occurred and been not just within the realm of the expert’s experience observed; this is the job of the clinician. However, when but for all restrained occupants exposed to the same there are questions regarding causation of injury, or multi- type of collision. ple potential causes, or unknown outcomes, the probability that one cause played a greater role than another must be The disc herniation was not caused by the fall because the weighed, and this often requires the interpretation of data patient did not have immediate acute pain, something that derived from epidemiologic study. An example would be would have been expected with a traumatic disc herniation a crash-related head injury associated with a multiple – This is a statement of prevalence for a certain condi- impact collision scenario, including both frontal and near tion at a certain point in time; it refers to the status of side impacts, with the forensic question of which impact all patients with a traumatic disc herniation shortly caused the injury. An FE approach would consist of eval- after the trauma has occurred. The claim forces an uating the probability of a head injury for a near side inference that 100% of such patients will have pain impact in which the occupant’s head is highly likely to sus- immediately following injury. It would be unusual, tain a high acceleration contact with an unyielding struc- if not unheard of, to find a population sample that ture such as the B-pillar, in comparison with a frontal would allow for such a definitive and broadly sweep- crash scenario in which peak head acceleration is typically ing inference. In some cases, however, there may be a lower. The basis for the opinion would have to come from physiologic reason for such a statement, as for exam- analysis of real world data, as illustrated in the example in ple, when the presence or absence of evidence of hem- Fig. 1. The analysis may already exist in the literature or it orrhage is used to determine whether injury may have may need to be conducted de novo for the purposes of the occurred pre or post-mortem. forensic investigation. The probabilistic data can not Retinal hemorrhage in an infant is reliable indicator of 100 shaken baby syndrome 83.9 Risk of MAIS 3 + Injuries Percentage Risk – This claim sounds like an estimation of point preva- 80 lence, but in fact it is a statement of prevalence ratio 60 47.8 38.9 (also known as odds) as it implies a comparison of 40 19.9 the finding of retinal hemorrhage in violent assault 20 versus some other trauma. Because many shaken 0 baby syndrome homicide prosecutions are defended Near Side Far Side Front al Rear-end with the assertion that the injuries resulted from a fall Crash mode with a Delta-V of 30 mph or some other precipitating unintentional trauma, the Fig. 1. Adapted from [Augenstein J, Perdeck E, Bowen J, Stratton J, claim implies that the incidence of retinal hemorrhage Singer M, Horton T, Rao A. Injuries in near-side collisions. In: in infants that have sustained an unintentionally self- Proceedings of the 43rd annual conference of the association for inflicted injury is very small – at or near zero, and advancement of automotive medicine; 1999. p. 139–58]. (MAIS 3+ refers that the incidence of retinal hemorrhage in infant vic- to injuries rated as serious or greater on the Abbreviated Injury Scale.) Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  5. 5. ARTICLE IN PRESS M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxx 5conflict with the real evidence; for example, if the medical the outcome cannot postdate the exposure by a time per-evidence clearly showed an injury to the occupant’s face iod that is clinically considered to be too long or toothen this would be given more weight that the probabilistic short to relate the two. This determination is highlyevidence. The FE conclusion should support, or be sup- dependent upon the specifics of any case. For example,ported by, the evidence in the case. benzene exposure to the skin will cause symptoms of Another application of FE is in the ‘‘what if” scenario in irritation that are apparent within 1 day, however,which an undisputed outcome is compared to a theoretical changes to the blood system may not be apparent foroutcome if a predicate action or event had been different. months. A crash-related injury to the spine may not beFor example, in a case of alleged medical negligence, in apparent for a day or two or even a week, but will notwhich an encapsulated ovarian granuloma tumor has been be completely latent for 2 months prior to causing symp-incompletely removed, the effect might be that the tumor is toms. On the other hand, an injury that causes acuteadvanced from a FIGO Stage I (contained within the symptoms to the spine may mask or overlap with otherovary) to a Stage II (spread to the pelvis). With adequate symptoms resulting from, for example, a concomitantdata, and based on the alternate ‘‘what if” scenario of shoulder injury. Such an injury may not be apparentthe surgeon performing the procedure completely, the 5- for some time following the original traumatic episode.year survival probability for the actual stage of the tumor The determination of etiology is typically made based(II) can be compared to that of the stage of the tumor on clinical judgment on a case-by-case basis, rather thanhad the alleged negligence not taken place (I).16 from clearly delineated guidelines or principles. FE also has demonstrated utility in the criminal prose- 3. There must not be any likely alternative explanations forcution of alcohol-associated vehicular homicides in which the symptoms. The term ‘‘likely” is of critical impor-some or all occupants were ejected, and there was dispute tance, as, for example, it is not sufficient to simply pointas to who was driving; the decedent or an ejected, as well out that a patient with back pain following trauma isas intoxicated, survivor.17 This application of FE, called obese, that obesity is related to back pain, and thus itinjury pattern analysis, evaluates injury patterns associated is more likely that the obesity rather than the traumawith various occupant positions, and relies upon probabi- caused the back pain. For an alternative etiologic expla-listic weighting of observed injury distribution and nature nation to be considered more likely than an allegedversus expected anatomic distribution and pattern of injury exposure it must be both biologically plausible and havebased upon observational epidemiologic study of similar a stronger temporal relationship to symptom onset thantypes of collisions.18 the alleged exposure. If plausibility is present and tem- porality is relatively comparable, then two exposures2.2.1. Causation can be compared by examining the dose–response of Standards for epidemiologic determinations of cause each exposure. This term typically refers to the magni-and effect were first laid out in a systematic fashion by Hill tude and intensity of each exposure, but for the purposesin 1969.19 Hill outlined nine criteria by which determina- of FE may also refer to outcome risk. An example of ations of causation could be made when there is substantial comparison of dose–response is seen in the above exam-epidemiologic evidence linking a disease or injury with an ple comparing the probability of head injury for a nearexposure, e.g. smoking and lung cancer. The criteria have side impact collision versus a frontal impact. Both aresince been modified and distilled by others20–23 but they biologically plausible mechanisms of head injury andall comprise three basic elements: both occurred at the same time; however, the near side impact scenario has an established higher head injury1. There must be a biologically plausible link between the risk. exposure and the outcome. Traumatic loading and bony fracture would be a straightforward example of a plau- It is common practice for clinicians, rather than epidem- sible link. An example of an implausible link would be iologists, to make determinations of causation in individual trauma and leukemia. Plausibility is a very low thresh- patients. A clinician’s causal determination incorporates old that can be overcome with relatively weak evidence, the patient’s history and the results of examinations and such as from small observational studies (case studies or tests with the clinician’s experience and training to arrive case series with small numbers of subjects), or from the at a conclusion regarding causality. Such determinations, results of well-designed experiments with many subjects. however, may not violate any of the three basic elements Analogy also is a valid method of establishing plausibil- of causation. ity; if forceful loading from one type of trauma can cause an injury, than forceful loading from another type 2.2.2. Strength of evidence of trauma may be a plausible cause of the same kind of In 1993, the United States Supreme Court issued an injury. opinion in a case called Daubert v. Merrell Dow Pharma-2. There must be a temporal relationship between the ceuticals Inc. This case set new standards for evidentiary exposure and the outcome. Quite obviously, the out- hearings in the United States, in which the judge acts as come cannot pre-exist the exposure. Less obviously, a gatekeeper for proposed scientific testimony.24 The case Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  6. 6. ARTICLE IN PRESS6 M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxxconcerned the alleged teratogenic effects of the drug Ben- rality and likely alternative explanations are primarily clin-dectin, used primarily for pregnancy-associated morning ical determinations and thus largely unaffected by Daubert.sickness. The plaintiff in the case had brought forth evi-dence from a variety of experts who cited in vitro, animal, 2.2.3. Sensitivity, Specificity, and Positive Predictive Valueand chemical studies as a basis for their collective opinion These are fundamental epidemiologic concepts that arethat Bendectin caused the birth defects that were the sub- critical to appropriate weighting of testing results. Theseject of the lawsuit. In response, the defense produced an concepts are equally important for understanding the pre-epidemiologist expert who presented an analysis of epide- cision or reliability of a particular opinion based upon themiologic (observational) studies of women who had used interpretation of a fact in evidence using an established setthe drug, and opined that there was no relationship of criteria. For any test or criterion there are at least fourbetween the use of Bendectin and birth defects. A lower possible results: true positive (TP), in which the test cor-court had ruled that the experimental evidence presented rectly identifies the guilty (or liable) party, true negativeby the plaintiff was insufficient to establish causation in (TN), in which the test correctly identifies the innocentlight of the epidemiologic evidence of the defendant. When party, and false positive (FP) and false negative (FN), inthe case reached the Supreme Court the Justices ruled in which the test incorrectly identifies the innocent as guiltyfavor of the defendant, affirming the ruling of the lower or the guilty as innocent, respectively. In a criminal forensiccourts and establishing a new set of criteria for the admis- setting, the sensitivity of a test indicates the percentage ofsibility of expert scientific testimony. The Daubert decision guilty defendants the test correctly identifies as guilty,helped to highlight the use and misuse of forensic scientific and is calculated by dividing the true positives (the cor-evidence to establish or question causation. In this deci- rectly identified guilty defendants) by all of the guiltysion, a causal relationship suggested or refuted by an ani- defendants, included those incorrectly identified as inno-mal or cadaveric study is insufficient proof for cent (TP + FN). The specificity of the test indicates the per-establishing the etiology of an injury or disease when there centage of innocent defendants that the test will correctlyis contradictory observational evidence. The latter includes identify, and is calculated by dividing all of the correctlyclinical determinations of causation that do not violate the identified innocent defendants (the true negatives) by allthree basic elements of causation noted above. Fig. 2 illus- of the innocent defendants (TN + FP). The most impor-trates the hierarchy of evidence strength in terms of its util- tant parameter of a test or criterion that may be used inity in establishing causality. The Daubert decision solely a forensic venue is its Positive Predictive Value (also calledaddresses evidence that is intended to support or refute Predictive Value Positive), as this value indicates how oftenthe first element of causation, biologic plausibility. Tempo- the test is correct when it indicates guilt. As such, PPV measures the potential for harm when a particular test is used as an isolated index of guilt, as it also demonstrates the proportion of innocent defendants incorrectly identified as guilty. Table 1 is matrix that illustrates these measures. Observational Studies With For purposes of illustration, an example of the utility of Control Groups Positive Predictive Value can be made with the claim made Epidemiologic High earlier in this paper, that retinal hemorrhage (RH) is a reli- studies Uncontrolled Case Series able indicator of shaken baby syndrome (SBS). For the fol- lowing example it is assumed that the claim that RH is Case Studies ‘‘reliable” equates to a sensitivity of 90% and a specificity of 75%; that is, RH is present in 90 out of 100 cases of Strength of Evidence SBS, and 75% cases of non-SBS death will not have RH. Clinical Determinations of Causation If one were to present these statistics as support for the opinion that a pediatric death resulted from SBS because Volunteer studies RH was present it would likely be given a great deal of weight in a forensic setting. If the contrasting defense the- ory is that the fatal injury and RH resulted from a fall Cadaver tests instead of a violent assault, the Positive Predictive Value Experimental studies (PPV) of RH as an indicator of SBS can help a jury deter- Animal studies mine the weight they should give to the evidence. In order to calculate PPV, however, it is necessary to know more Computer models about the data underlying the sensitivity and specificity cal- culations. If, for example, there are 200 cases of SBS deaths Theory Low annually, and this results in 180 (90%) with findings of RH, and there are 1000 cases of non-SBS head injury annually,Fig. 2. Hierarchical depiction of evidence strength for causal determina- with only 250 (25%) with findings of RH, then the PPV istions modeled from the Daubert decision. only 42% (see Table 2), and a determination of SBS based Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  7. 7. ARTICLE IN PRESS M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxx 7Table 12 Â 2 matrix illustrating the relationship between the Sensitivity, Specificity, and Positive Predictive Value of a test for guilt Criterion + ÀGuilt Yes True Guilty (TP) False Innocent (FN) All Guilty (TP + FP) Positive Predictive Value TP/(TP + FP) No False Guilty (FN) True Innocent (TN) All Innocent Negative Predictive Value TN/ (FN + TN) (FN + TN) All + tests (TP + FN) All À tests (FP + TN) All cases (TP + FP + FN + TN) Sensitivity TP/ Specificity TN/ (TP + FN) (FP + TN)TP and FP are True Positive and False Positive and TN and FN are True Negative and False Negative.upon the finding of RH alone would be improper (NB: the The following example illustrates the fallacy: a woman isfigures used in the above example are solely for involved in a rear impact collision that results in minimalillustration). damage to her vehicle, and is subsequently diagnosed with a permanent spine injury by her doctor. The insurer2.3. Common forensic fallacies involving epidemiologic defending the case hires a doctor who examines the patientconcepts and opines that the majority of crash injuries recover spon- taneously within a matter of months following a crash with2.3.1. Prior Odds Fallacy minimal damage, and therefore it is highly improbable that The Prior Odds Fallacy, described for the first time in the signs and symptoms of permanent injury are related tothis paper, is related to the Prosecutor’s Fallacy, in which the collision in question. The Prior Odds Fallacy was com-the pre-event or predictive odds associated with a piece mitted in the example when the pre-crash or ‘‘prior odds”of evidence are presented to the jury as a gauge of guilt.25 of contracting a permanent injury (say 1 in 20 or 0.05)An example of the Prosecutor’s Fallacy is as follows: a rare was used suggest a correspondingly high probability (19blood type, present in only 1% of the population and out of 20, or 0.95) that the original doctor’s determinationmatching that of a suspect is found at a crime scene. The of causation was in error. The fallacy occurs due to the factprosecutor uses this evidence to suggest that there is a that there is no relationship between the probability of99% probability that the suspect is guilty. The reason the injury in the general population exposed to minimal dam-inference is a fallacy is that while the matching blood type age crashes (0.05) and the frequency of clinician error inis suggestive of guilt, the 99% figure is unrelated to the cer- determining causality in patients that have been exposedtainty of guilt. For example, in a city of 1,000,000 there to minimal damage crashes (unknown, but unlikely to bewould be 10,000 people with the same blood type as the 0.95). The pre-event probability of an occurrence is not asuspect. Or, the crime may have taken place in an ethni- valid measure of whether the occurrence took place; eithercally homogenous community where 90% of the local den- it did or it did not (0.0 or 1.0). As a simple example, deathsizens have the rare blood type. resulting from plane crashes are exceedingly rare, however, In contrast with the Prosecutor’s Fallacy, the Prior a pathologist’s clinical observations of a decedent followingOdds Fallacy, typically offered in injury litigation settings a plane crash would not be considered to be in erroras evidence against or for causality, is not suggestive of because the death was unlikely to have occurred.either. The Prior Odds Fallacy is seen when the low prob- The fallacy can be further illustrated with the example ofability of an occurrence, e.g. possessing a winning lottery the roll of a six-sided die. The probability that a six will beticket, is used to cast doubt on the accuracy of the observa- rolled is 1 in 6 (17%), and the probability that somethingtion of the occurrence. other than a six will be rolled is 5 in 6 (83%). In this example,Table 22 Â 2 table illustrating the Positive Predictive Value of retinal hemorrhage presence as a gauge of guilt Retinal hemorrhage Present AbsentSBS Yes True Positive (TP) 180 False Negative (FP) 20 All SBS + (TP + FP) Positive Predictive Value TP/ 200 (TP + FN) 42% No False Positive (FN) 250 True Negative (TN) 750 All SBS À (FN + TN) Negative Predictive Value TN/ 1000 (FP + TN) 25% All cases with RH All cases without RH All deaths (TP + FP + FN + TN) 1200 (TP + FN) 430 (FP + TN) 770 Sensitivity TP/(TP + FP) Specificity TN/(FN + TN) 75% Prevalence of SBS in all deaths (TP + FP)/(TP + FP + FN + TN) 90% 17% Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  8. 8. ARTICLE IN PRESS8 M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxxthe result of the roll is recorded by a hypothetical machine 5%, 10%, or 40% for similar crashes, diagnostic error isthat has been found to have an error rate of one in 50 obser- independent of pre-event probability of injury.vations, so that the roll is misidentified in 2% of cases. ThePrior Odds Fallacy would occur if a 6 was rolled and subse- 2.3.2. Fallacies contributing to lower prior odds estimatesquently identified as such by the machine (with a 2% error All of the following fallacies are used to establish anrate), but it was asserted that there was an 83% probability erroneous pre-event probability that would then be appliedthat the result was something other than a 6 (83% probabil- to causation via the Prior Odds Fallacy:ity the machine is wrong). The error rate of clinical determinations of causality in (a) Non-representative sample fallacy. Results observedminimal damage crash injuries is not known, but it is not in one side of a population distribution curve cannotlikely to be very large. Such determinations would depend be used to argue thatthe other side does not exist or isupon the observation of the three criteria of causation rare. An example is seen in the human subject crashdescribed earlier in this paper. Conversely, the error rate testing literature in which some authors have con-resulting from the introduction of evidence that invokes cluded that there is a crash speed injury thresholdthe Prior Odds Fallacy, if a judge or jury determination below which injury is unlikely or impossible in theis based upon such evidence, would be calculated as general population26,27 with the intent that thefollows: thresholds be applied in medicolegal settings as a lit- mus test for causation.28 The results of testing of thePrior Odds Fallacy error rate ¼ 1 À Ec ; hardiest members of the population who are notwhere Ec is the actual error rate in clinician observations of injured until they are exposed to high crash forcescausation. Although there are no published data on such (arrow ‘‘A” in Fig. 3) cannot be used to exclude theerrors, for the purposes of this paper the error rate is as- existence of, or in any other way define the distribu-signed a value of 0.05, theoretically taking into account tion of the members of the population who arecases in which false patient or physician attribution of cau- injured when exposed to low crash forces (arrowsation has occurred. Thus, using the values stated above, ‘‘B” in Fig. 3). The use of volunteer crash tests tothe rate error in fact finder determinations when the Prior establish an injury threshold in the general popula-Odds Fallacy is accepted as evidence is 1 À 0.05 = 0.95 or tion has been criticized as unscientific, as the studies95%. underlying the proffered forensic opinion suffer from In criminal cases the only fact finder determination is the (1) inadequate study numbers of subjects, vehicles,guilt or innocence of the accused. This scenario differs from and crash conditions (contributing to random varia-civil litigation, in which the fact finder must determine (a) tion), (2) non-representative study samples (crash testwhether the defendant acted negligently, and if so then volunteers cannot be said to represent the full(b) whether the act of negligence is causally related to the spectrum of the motoring public with regard to injuryalleged injuries, and if so, then (c) the amount of damages susceptibility), and (3) they are conducted under non-to be awarded. The Prior Odds Fallacy is primarily direc- representative conditions (crash tests are designed toted at the causation determination in civil litigation. Fur- minimize participant injury risk).29 The fallacy alsother, of the three causation elements identified earlier in occurs when in vitro, ex vivo, animal model, computerthis paper (biologic plausibility, temporality, and the lack model, and other surrogates are used as a basis forof a likely alternative explanation), the Prior Odds Fallacy establishing or questioning directed mainly at biologic plausibility, relying upon the (b) Appeal to statistical authority. Juries are more likelyimplication that a low prior odds (e.g., only one in 20 will to be convinced of the validity of testimony when itbe injured) is an indicator of implausibility. In fact, a very is supported with a reference to statistics or statisticallow level of scientific or clinical evidence is required toassert a plausible biologic association between a noxious Injury Susceptibilityexposure and an injury outcome, as plausibility is eitherpresent or not, regardless of degree. Thus, assertions oflow frequency of association between an exposure and an Proportion Injuredoutcome are irrelevant to biologic plausibility. The Prior Odds Fallacy also occurs in plaintiff experttestimony that intended to support causality in civil litiga-tion. An example is sometimes seen in crash injury cases, in B Awhich photographs of extensive vehicle damage are used toelicit testimony that the degree and extent of injuryobserved in a patient are consistent with the crash, imply- LOW Crash Force HIGHing a low likelihood of error in the observation of causalassociation between the subject crash and the diagnosed Fig. 3. Theoretical normal distribution of the general population byinjuries. Regardless of whether the frequency of injury is degree of crash severity required to cause injury. Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009
  9. 9. ARTICLE IN PRESS M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxx 9 language, regardless of the appropriateness of the sta- disc herniation has ever been observed in volunteer tistical reference. An extreme example was seen in the crash testing it is impossible to herniate a disc under Sally Clark case in which probability of innocence similar circumstances. was given in very precise terms (1 in 73 million), how- (d) Straw man fallacy. This fallacy stems from the use of ever, non-specific probabilistic language tends to unvalidated constructs or inappropriate proxies for accomplish the same goal. The claim that an opinion causal mechanisms. A good example is seen in the lit- is ‘‘highly likely” or ‘‘highly unlikely” rather than erature regarding whiplash injury; some authors have merely ‘‘likely” or ‘‘unlikely” is an example weighting compared peak accelerations recorded at the head an opinion without precision; unless the terms are during activities such as sneezing30 or skipping rope31 defined and data are presented to substantiate the to the accelerations observed in volunteer crash test- claims, such opinion weighting is speculative and ing. The fallacy occurs when peak head acceleration potentially harmful. The same is true for the use of is used as a proxy for injury risk, so that the improper ‘‘reasonable certainty” versus ‘‘reasonable probabil- conclusion is drawn that skipping rope and crash- ity”; with no substantiating data the former is no related trauma have the same injury potential. more than an attempt to bolster the persuasiveness Because whiplash injury occurrence is associated with of an opinion with misleading language. Caution numerous variables aside from peak acceleration, must be used when assessing for this fallacy; it does such as gender, occupant bracing, vehicle and seat not occur when a clinician makes a claim based on variations, inter alia,28 selecting a variable that is only clinical experience. For example, the claim that. ‘‘it loosely correlated with injury occurrence as an index is highly unlikely the surgery will result in a substan- of the probability of injury presence lies at the heart tial impairment” is a reasonable conclusion for a sur- of this fallacy. geon to draw regarding one of his own patients, when the claim is based upon his experience performing the surgery and observation the results of the surgery. In 2.3.3. Base rate fallacy contrast, it would not be appropriate for a clinician This fallacy has been described by others as applying to to claim that her clinical experience allows her to a variety of scenarios but is frequently overlooked in foren- draw the conclusion that certain injuries are highly sic medicine testimony.32 This fallacy occurs when the base unlikely to require treatment, because many patients rate of a finding or occurrence in a relevant comparison may leave this doctor’s care and seek care elsewhere, population is erroneously overlooked while the prevalence unbeknownst to her. In the two examples above, the of the same finding in the target population is used as evi- former claim is based upon a reasonable sample of dence in favor of one side or another. An example was pre- the population to which the claim is to be extrapo- sented previously in this paper, in the shaken baby lated (patients who have surgery). The latter claim syndrome (SBS) example. While it is important for the fact is less valid because the patients seen by any one cli- finder to be made aware that retinal hemorrhage is present nician may be non-representative of the general in 90% of SBS cases at post-mortem examination, it is injured patient population (i.e. specialist practices), equally important to know the prevalence of the same con- and thus limit the extrapolability of the clinician’s dition in all of the relevant non-violent assault injury mech- experiences. anisms as well, as discussed earlier in this paper in Section(c) Impossibility fallacy. This occurs when an expert 2.2.3. opines that a causal relationship is impossible; the claim implies both 100% clinical observation error 3. Conclusions rate (as a result of the Prior Odds Fallacy) and a nearly census of population-based data from which The 19th century essayist and novelist Charles Dudley to infer the claim of 0 incidence. The fallacy also Warner (1829–1900) is credited with the quote ‘‘Everyone occurs when an expert claims that a causal relation- complains about the weather but no one does anything ship is always present, implying zero uncertainty in about it”. In some ways, the quote is apropos for the wide- the opinion. The easiest way to understand this fal- spread but unsystematic use of probability in forensic med- lacy is to picture a box full of 100 rubber balls that icine, in that everyone uses it but not everyone understands can be either red or black. A statement that there is it. The purpose of this paper, in which the concept and a red ball in the box (red ball possible) could be ver- some of the applications of Forensic Epidemiology have ified with a sample of only one ball, if the ball was been introduced, is to fill a void that presently exists in red. In contrast, a statement that there are no red forensic medicine with the addition of a general heading balls in the box (red ball impossible) would require under which the proper and improper forensic use of prob- an examination of every ball in the box. The claim ability is systematically described. As demonstrated by the may incorporate other fallacies as well, such as tragedy of the Sally Clark case, there is little doubt that the Non-representative sample fallacy, e.g. because no use of probability in forensic medicine is in need ofPlease cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med(2008), doi:10.1016/j.jflm.2007.12.009
  10. 10. ARTICLE IN PRESS10 M.D. Freeman et al. / Journal of Forensic and Legal Medicine xxx (2008) xxx–xxxstandardization; there is a high potential for continued 11. [accessed 11-21-2006].harm and injustice if nothing is done in this regard. 12. [accessed 11-21-2006]. Better and more explicit heuristics are needed to 13. [accessed 11-21-2006].describe and implement the concepts introduced in this 14. Herbst B, Forrest S, Orton T, Meyer SE, Sances Jr A, Kumaresan S.paper for the wide variety of circumstances encountered The effect of roof strength on reducing occupant injury in forensic medicine. A few recommendations are as Biomed Sci Instrum 2005;41:97–103.follows: 15. Wyszynski ME. Shaken baby syndrome: identification, intervention, and prevention. Clin Excell Nurse Pract 1999;3(5):262–7. 16. Petignat P, de Weck D, Goffin F, Vlastos G, Obrist R, Luthi JC. Be alert for the language of probability or epidemiology Long-term survival of patients with apparent early-stage (FIGO I–II) in forensic opinions. epithelial ovarian cancer: a population-based study. Gynecol Obstet When epidemiologic data are referenced as a basis for Invest 2006;63(3):132–6 [Epub ahead of print]. an opinion, evaluate the propriety of their use. Are the 17. Freeman MD, Nelson C. Injury pattern analysis as a means of driver identification in a vehicular homicide; a case study. Forensic Examiner sample population and circumstances sufficiently similar 2004;13(1):24–8. to allow for extrapolation to the facts in the present 18. Freeman MD. Injury pattern analysis as a means of driver determi- case? nation in a vehicular homicide investigation. In: Proceedings of 16th When in doubt regarding causal determinations, return Nordic conference on forensic medicine; 2006. p. 38–9. to the three essential elements of causation: biologic 19. Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965;5:295–300. plausibility, temporality, and lack of likely alternative 20. Stephens MD. The diagnosis of adverse medical events associated with explanation. drug treatment. Adv Drug React Ac Poison Rev 1987;6:1–35. When a clinical outcome is known, be aware of the 21. Miller FW, Hess HV, Clauw DJ, Hertzman PA, Pincus T, Silver RM, potential for Prior Odds and other fallacies. et al. Approaches for identifying and defining environmentally If a test or criterion is set as an evidentiary standard, associated rheumatic disorders. Arth Rheum 2000;43(2):243–9. 22. McLean SA, William DA, Clauw DJ. Fibromyalgia after motor determine if the Specificity, Sensitivity, and Positive Pre- vehicle collision: evidence and implications. Traffic Injury Prev dictive Value is known or can be determined for the test 2005;6:97–104. or criterion. Use these tools to help determine the real 23. Evans AS. Causation and disease: a chronological journey. The utility of the test or criterion in a forensic setting. Thomas Parran Lecture. Am J Epidemiol 1978;108(4):249–58. 24. Daubert v. Merrell Dow Pharmaceuticals (92–102), 509 U.S. 579; 1993. 25. Thompson WC, Schumann EL. Interpretation of statistical evidenceReferences in criminal trials; the Prosecutor’s Fallacy and the Defense Attorney’s Fallacy. Law Hum Behav 1987;11(3):167–87. 1. Bacon CJ. The case of Sally Clark. J R Soc Med 2003;96:105. 26. McConnell WE, Howard RP, Poppel JV, et al. Human head and neck 2. Meadow R. Munchausen syndrome by proxy. The hinterland of child kinematic after low velocity rear-end impacts: understanding whip- abuse. Lancet 1977;2(8033):343–5. lash. In: Proceedings of the 39th stapp car crash conference, #952724; 3. Meadow R, editor. ABC of child abuse. London, UK: BMJ Publishing 1995. p. 215–38. Group; 1989. 27. Szabo TJ, Welcher JB, Anderson RD, et al. Human occupant 4. [accessed kinematic response to low speed rear-end impacts. SAE tech paper 02-02-2007]. series 940532; 1994. p. 23–35. 5. Watkins SJ. Conviction by mathematical error? BMJ 2000;320:2–3. 28. Castro WHM, Schilgen M, Meyer S, Weber M, Peuker C, Wortler K. 6. [accessed 02-02- Do ‘‘whiplash injuries” occur in low-speed rear impacts? Eur Spine J 2007]. 1997;6:366–75. 7. Loue S. Forensic Epidemiology: A comprehensive guide for legal and 29. Freeman MD, Croft AC, Rossignol AM, Weaver DS, Reiser M. A epidemiology professionals. Carbondale: Southern Illinois University review and methodologic critique of the literature refuting whiplash Press; 1999. syndrome. Spine 1999;24(1):86–98. 8. [accessed 11- 30. Allen ME, Weir-Jones I, Motiuk DR, et al. Acceleration perturbations 27-2006]. of daily living: a comparison to ‘whiplash’. Spine 1994;19(11):1285–90. 9. Rossignol AM. Principles and practice of epidemiology; an engaged 31. Rosenbluth W, Hicks L. Evaluating low-speed rear-end impact approach. New York, NY: The McGraw-Hill Companies; 2005. p. severity and resultant occupant stress parameters. J Forensic Sci 224–5. 1994;39(6):1393–424.10. Mossman D. Interpreting clinical evidence of malingering: a Bayesian 32. Fantino E. Behavior-analytic approaches to decision making. Behav perspective. J Am Acad Psychiatr Law 2000;28(3):293–302. Process 2004;66(3):279–88. Please cite this article in press as: Freeman MD et al. Forensic Epidemiology: A systematic approach to ... J Forensic Legal Med (2008), doi:10.1016/j.jflm.2007.12.009