The general WoE Question is as big as “Science,”
but the application to Chemical Regulation entails
some special features:
• The regulatory process cannot sustain pure science's suspension of judgment until ultimate resolution. A basis for actions is needed.
  Skepticism and consideration of alternatives are part of the scientific method
  Diversity in interpretations among scientists
• The regulatory process needs "findings," and judgment is delegated to particular people tasked with representing the larger body of informed scientific opinion.
  So, whose judgments and how they are justified become key
How did we get to this juncture?
• Older, "rules‐based" frameworks (e.g., EPA 1986)
  Presume relevance
  Main question: Reliability of observation
  But increasing MoA understanding and examples of species‐specificity, dose‐limitation
• Newer, "judgment‐based" frameworks (e.g., EPA 2005)
  Guidance on "factors" or "considerations"
  Main question: "Sufficiency" of evidence for conclusions
  But how to justify conclusions? Hold to objective standards?
  Weed (2005) critique of loose use of "WoE"
• NRC review of EPA Formaldehyde
  "Roadmap" stressing systematic processes
  Need for "methodology" for WoE judgments
• NRC review of IRIS Process
New Developments Affecting WoE Applications:
• New regulatory programs requiring assessments (e.g., REACH, GHS)
• New kinds of toxicity concerns (e.g., Endocrine Disruption, Mixtures)
• Increasing sophistication of mode‐of‐action understanding (e.g., AOPs, Gene‐Environment, Species‐Specific Responses)
• New kinds of Toxicity Testing (e.g., Gene‐expression microarrays, new in vitro tests, the 3Rs)
• Rise of Systematic Review mandates and methods
What is the "Weight‐of‐Evidence" Problem?
(also known as "Evidence Integration")

As a metaphor
As a method:
  Use all the data
  Systematic evaluation
  Aim at objective procedures that lay out the process of scientific professional judgment

Question: In view of incomplete and contradictory evidence, how compelling is the case for the existence and potential magnitude of risk?
Questions Requiring a WoE Approach:
• Hazard Characterization
  Whether a hazard?
  For what endpoints and conditions of exposure?
• Proposed Modes‐of‐Action
  In studies with observed effects
  Relevance of these to humans
• Alternative Dose‐Response Approaches
• Departure from Defaults
…AND any question where empirical support for interpretation is at issue.

When asking "How to do WoE?", distinguish:
1. The general epistemological question
2. The application to a specific kind of question
The Need for a Framework
• Fostering good practice and sound application
• Encouraging consistency from case to case
• Setting out and supporting the assumptions and procedures needed to fill common data gaps or carry out inferential processes
• Explaining the method, its rationale, and the basis for its judgments to the affected public
Goal
• Review prominent WoE frameworks
• Evaluate existing frameworks to come to insights about their methods, rationales, utility, and limitations
• Propose key phases to the WoE process
  Describe how different frameworks approach WoE within these phases
• Discuss key aspects, best practices, and practices to avoid in weighing evidence

• Document variety of approaches
• Seek commonalities, but examine differences
• Look for best practices to emulate
• Draw insights into new approaches by examining past attempts
Survey
• NAS "Roadmap" recommendation
• 50+ frameworks
  Information in online supplement to paper
  "Scored" for features in common and different
• White Paper, then Workshop Discussion
• Not reviews or evaluations, but a source of insight into how WoE structures try to meet challenges
Appendix – Framework Summaries
• Developed by whom and when
• Stated purpose
• Main application of its approach to WoE
  Particularly for integration and evaluation of data
• Any notable features that distinguish it from other frameworks
WoE "Frameworks" Aimed at Specific Evaluations
• Guidance‐like, procedural: specified operations and structured evaluations based on stated rules
• Aim at capturing principles of valid scientific inference in rules that apply to the question at hand
• Rules become standards that analysts can be held to
• Aim at objective, operational analysis independent of the judge
• Often with lists of "principles" or "considerations"
• Challenge: Automating "judgment"
  Too prescriptive: lose credibility, become conventionalized
  Too unstructured: lose warrant; question whose judgment?
Framework Phases
• Phase 1 – Define Causal Question and Develop Criteria for Study Selection
• Phase 2 – Develop and Apply Criteria for Review of Individual Studies
• Phase 3 – Integrate and Evaluate Evidence
• Phase 4 – Draw Conclusions Based on Inferences
(A schematic sketch of these phases follows.)
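As a purely illustrative sketch – all names here are hypothetical, not from the talk – the four phases can be read as an auditable pipeline in which each conclusion is traceable to logged inputs and criteria:

```python
from dataclasses import dataclass, field

# Hypothetical sketch (names invented): the four phases as explicit,
# auditable steps so that each conclusion can be traced to its inputs.
@dataclass
class WoEAssessment:
    causal_question: str                        # Phase 1: stated upfront
    inclusion_criteria: list                    # Phase 1: selection rules
    studies: list = field(default_factory=list)
    log: list = field(default_factory=list)
    summary: dict = field(default_factory=dict)

    def phase2_review(self, quality_fn):
        """Apply the same review criteria to every study; record, don't discard."""
        for s in self.studies:
            s["quality"] = quality_fn(s)
            self.log.append(f"reviewed {s['id']}: quality={s['quality']}")

    def phase3_integrate(self, integrate_fn):
        """Combine all lines of evidence -- positive, null, and mechanistic."""
        self.summary = integrate_fn(self.studies)
        self.log.append("integrated evidence across lines")

    def phase4_conclude(self):
        """Conclusions cite the integration and the trace, not bare judgment."""
        return {"question": self.causal_question,
                "basis": self.summary,
                "trace": self.log}
```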
Phase 1 Best Practices
• Define the causal question or hypothesis upfront. In particular, if several hypotheses are in question, articulating each is key.
• Problem formulation can be useful for this step.
• Plan literature search strategies and study inclusion/exclusion criteria (see the sketch below).
• Select and organize data in a way that facilitates application to Phase 2.
• Do not exclude any studies at this point in the analysis based on quality or on whether a study reports negative or null findings.
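As an illustration only – a minimal sketch with hypothetical study fields and criteria – inclusion/exclusion rules can be made explicit and logged, so that every screening decision is recorded rather than implicit, and no study is dropped for reporting null results:

```python
# Hypothetical sketch: explicit, logged inclusion/exclusion screening.
CRITERIA = [
    ("has_exposure_data", lambda s: s["exposure_measured"]),
    ("relevant_endpoint", lambda s: s["endpoint"] in {"liver", "kidney"}),
    ("original_research", lambda s: not s["is_review"]),
]

def screen(studies):
    included, excluded = [], []
    for s in studies:
        failed = [name for name, test in CRITERIA if not test(s)]
        # Record *why* a study is excluded; never drop silently, and
        # never screen on the direction of results (positive vs. null).
        (excluded if failed else included).append({**s, "failed": failed})
    return included, excluded
```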
Desiderata for Systematic Evidence Assembly:
• Systematic Approach to Literature Inclusion
  Inclusion/exclusion criteria
  Not just positive or "featured" outcomes of studies
• Systematic and Consistent Review of Studies
  Established procedure, tabulation
• Evaluations of Study Strengths and Weaknesses
  Study design, power, confounders, potential problems
  BUT – what to do with "lesser" studies? Omit? Down‐weight? Interpret? (One option is sketched below.)
• An Established Process for Evaluation and Combination into Inferences
  Rigorous review of studies alone is not enough
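For "lesser" studies, one option besides omission is down-weighting. A minimal sketch, with invented effect estimates and quality scores, of quality-adjusted inverse-variance pooling (one possible convention, not a method prescribed here):

```python
# Hypothetical sketch: pool effect estimates, down-weighting lower-quality
# studies instead of omitting them.  weight = quality / variance.
studies = [
    {"id": "A", "effect": 1.8, "se": 0.4, "quality": 1.0},  # strong study
    {"id": "B", "effect": 0.9, "se": 0.6, "quality": 0.5},  # "lesser" study
    {"id": "C", "effect": 1.2, "se": 0.5, "quality": 0.8},
]

weights = [s["quality"] / s["se"] ** 2 for s in studies]
pooled = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
print(f"pooled effect = {pooled:.2f}")  # every study contributes, per its weight
```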
Phase 2 Best Practices
• Assess individual study quality
  Consider study design, confounders, bias/chance, strengths and weaknesses, replicability, reliability, relevance, adequacy, statistical methods, etc.
• Categorize studies based on quality (see the sketch below)
  Do not eliminate any studies from the analysis simply based on weaknesses unless there is a fatal flaw
• Assess individual study results
  Consider strength of association, internal consistency, biological plausibility, temporality, and dose‐response.
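A minimal sketch, with invented criteria and thresholds, of categorizing rather than eliminating: every study receives a quality tier, and only explicitly defined fatal flaws remove a study from the analysis:

```python
# Hypothetical sketch: tier studies by quality; exclude only for fatal flaws.
FATAL_FLAWS = {"no_control_group", "fabricated_data"}

def categorize(study):
    if FATAL_FLAWS & set(study["flaws"]):
        return "excluded (fatal flaw)"
    score = sum([
        study["randomized"],
        study["blinded"],
        study["adequate_power"],
        not study["flaws"],          # no non-fatal flaws noted
    ])
    return {4: "Tier 1", 3: "Tier 2"}.get(score, "Tier 3")

study = {"randomized": True, "blinded": False,
         "adequate_power": True, "flaws": ["short follow-up"]}
print(categorize(study))  # -> "Tier 3"  (kept in the analysis, down-tiered)
```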
Systematic Presentation and Review of Relevant Data
• Not just positive results from positive studies
  Also null results from the same and other studies
  Selection/omission criteria explicit
• Consistent evaluation criteria
  Design soundness, rigor, statistical power
  Reliability (aka "internal validity")
  o According to standards of the field
  o According to needs of the application
• Relevance (aka "external validity")
  o …largely a question of interpretation, so intermediate between Phase 1 and Phase 2
• Other "relevant" data
  Historical controls, understanding of endpoints and MoA, basis for understanding biology, similar agents, etc.
Framework Phases
• Phase 1 – Define Causal Question and Develop Criteria for Study Selection
• Phase 2 – Develop and Apply Criteria for Review of Individual Studies
• Phase 3 – Integrate and Evaluate Evidence
• Phase 4 – Draw Conclusions Based on Inferences
General Kinds of Evidence
• Observed toxicity process that represents an instance of a more general one that would operate in parallel in the target population
• Observed biological perturbation or effect that represents a candidate element of a possible MoA that might operate in the target population
• Evidence by correlation of the study outcome with the target‐population toxicity of concern in other cases
• Evidence by analogy with other similar cases
INTEGRATION: Two Kinds of Inferences from Multiple Studies
• Multiple observations of the thing of interest itself
  e.g., multiple epidemiologic studies; Evidence‐Based Medicine on studies of treatment efficacy
  Main question is consistency and reliable observation
  "Weight" from methodologically and statistically reliable measurements
• Indirect evidence of related or relevant phenomena in other systems
  e.g., animal bioassays, MoA information
  Main question is relevance and how to generalize
  Need to integrate across evidence that is relevant in different ways
  "Weight" from support of relevance arguments
The Span of Generalization
• We observe particular instances, but what makes them relevant is the potential for generalization – that other settings (including the target population) might have similar causal processes.
• What is the span of generalization? What are its limits? Assessing this is part of the WoE.
Rules‐Based Systems
• Defined process and criteria; decision trees (a toy tree is sketched below)
• Operational; objective evaluations; available data dictate the path
• Data quality screens or ratings
• "Wisdom" captured in algorithmic structure; appropriate interpretation built into rules
• Primary output is a decision – characterization of uncertainty is secondary
BUT –
• "Built‐in wisdom" may be inadequate or faulty
• Key decision‐points may beg the question ("When data are sufficient to conclude that…")
• Tends to ossify conventional wisdom; rules become reasons for interpretation
• Deals poorly with ambiguity, alternative viable interpretations, uncertainty characterization
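A toy sketch, with invented rules and thresholds, of the rules-based style: interpretation is frozen into an explicit decision tree, which makes each step objective and reproducible but also hard-codes whatever "wisdom" the rules embody:

```python
# Hypothetical sketch of a rules-based decision tree: objective and
# reproducible, but its interpretive "wisdom" is fixed in the branches.
def classify(evidence):
    if evidence["positive_human_studies"] >= 2:
        return "known hazard"
    if evidence["positive_animal_studies"] >= 2:
        if evidence["moa_relevant_to_humans"]:
            return "probable hazard"
        return "not classifiable"     # rule silently disposes of ambiguity
    return "inadequate evidence"

print(classify({"positive_human_studies": 0,
                "positive_animal_studies": 3,
                "moa_relevant_to_humans": False}))
# -> "not classifiable": the tree yields an answer, but the uncertainty
#    behind the MoA-relevance call is not characterized anywhere.
```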
Evidence‐Based Toxicology
• Analogy to "Evidence‐Based Medicine" – seek an empirically demonstrable and scientifically rigorous basis for opinions (rather than authority, precedent, common practice)
• Rigorous definition of the set of relevant data and demonstrable findings
• Eschews assumptions, "defaults," and plausibilities
• Output is a supported declaration of sufficiency of evidence for a dependable conclusion of causality – or, if not, declaration as "undecided"
• Very transparent and objective in methods and justification of findings
BUT –
• Requires clear‐cut findings and consensus
• Does not characterize uncertainty or differentiate plausibility among unproven assertions – so does not support decision‐making under uncertainty
Expert Judgment Elicitation
• Structured querying of acknowledged experts about their interpretations and beliefs about the likely truth
• Panel of experts to represent diversity of scientific opinion
• May include elicited distributions of beliefs in alternative interpretations (see the sketch below)
• Output is not a decision, but a characterization of the distribution of expert opinion
BUT –
• Choice of experts and framing of queries influence the outcome
• Records judgments, not their bases, so derivation of conclusions is obscure – does not question whether/how opinions are justified
• Transparent in procedure, but not in findings
• Does the distribution of opinion really reflect "scientific uncertainty"?
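A minimal sketch, with invented elicited numbers, of pooling belief distributions using an equal-weight linear opinion pool (one common aggregation convention, not one prescribed in the talk); note that the pooled numbers summarize opinion, not the justification behind it:

```python
# Hypothetical sketch: equal-weight linear opinion pool over elicited
# probabilities that each interpretation of the evidence is correct.
interpretations = ["causal", "confounded", "chance"]
elicited = {                       # P(interpretation) from each expert
    "expert_1": [0.6, 0.3, 0.1],
    "expert_2": [0.2, 0.5, 0.3],
    "expert_3": [0.4, 0.4, 0.2],
}

n = len(elicited)
pooled = [sum(p[i] for p in elicited.values()) / n
          for i in range(len(interpretations))]
for name, p in zip(interpretations, pooled):
    print(f"P({name}) = {p:.2f}")   # the pool records *what* experts
                                    # believe, not *why* they believe it
```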
Mode of Action / Human Relevance Framework
[Table from Cohen SM et al. 2004. Toxicol. Sci. 78:181 – MoA/human‐relevance analysis tabulating key events (e.g., mammary tumors) against "Yes / No / ?" determinations]
Bradford Hill Postulates
[Portrait: Sir Austin Bradford Hill, 1897–1991]
Strength of Association: An observed risk is less likely to be due to chance, bias, or other factors if it is large and precise.
Consistency: Are the findings reproducible within and across studies?
Specificity: Is one disease specific to a particular agent, or is one agent specific to a particular disease?
Temporality: Does exposure precede the occurrence of disease?
Exposure-Response: Risks must be compared across exposure groups.
Biological Plausibility: Are the available data sufficient to allow a scientifically defensible determination of causation?
Coherence: Does all of the evidence fit together in a consistent manner?
Experiment: Is the association altered by an experiment of preventive action?
Analogy: Is there evidence for a similar effect with a different agent?
Consideration of Alternative Hypotheses: Are other explanations likely?
A. Bradford Hill (1965) Proc. Roy. Soc. Medicine 58:295.
"EXTENDED" Bradford Hill Criteria
(The Hill criteria restated – Strength of Association, Consistency, Specificity, Temporality, Exposure-Response, Biological Plausibility, Coherence, Experiment, Analogy, Consideration of Alternative Hypotheses – with extensions:)
• MoA, Key Event Steps
• Animal relevance via common MoA
• General impact of Animal and MoA information on Epidemiology interpretation
A. Bradford Hill (1965) Proc. Roy. Soc. Medicine 58:295.
Phase 3 Best Practices
• Evaluate what types of data are being considered and what makes these data evidence.
• Assess data relevant to MoA, human relevance, and dose‐response.
• Evaluate negative, null, and positive results.
• Integrate these data across all lines of evidence, so that interpretation of one will inform interpretation of another.
• Ask: if the proposed causative process were true, what other observable consequences should it have, and are these in fact seen?
Phase 3 Best Practices (continued)
• Note assumptions, especially when they are ad hoc in that they are introduced to explain some phenomenon already seen.
• Evaluate, compare, and contrast alternative explanations of the same sets of results.
• Present conclusions (in text, tables, and figures) not just as the result of judgments but with their context of reasons for coming to them and choosing them over competitors.
• Recognize that applying specific study results to address a more general causation question is an exercise in generalization.
• Based on the results of the WoE evaluation, identify data gaps and data needs, and propose next steps.
Guidelines for Integrating Evidence
• Evaluate what types of data are being considered and what makes these data evidence.
• Ask, if the proposed causative process were true, what other observable consequences should it have, and are these seen?
• Note assumptions, especially when they are "ad hoc."
• Evaluate alternative explanations of results.
• Present conclusions with their context of reasons for coming to them and choosing them over competitors.
• Recognize that applying study results to address a more general causation question is an exercise in generalization.
Phase 4 Best Practices
• Use the results and inferences presented in the final step of Phase 3 to clearly communicate the logic and reasoning for drawing conclusions about risk or causation based on those inferences.
Applications of Hypothesis‐Based Weight of Evidence (HBWoE)
• Chlorpyrifos neurodevelopmental toxicity
  Prueitt, RL; Goodman, JE; Bailey, LA; Rhomberg, LR. 2011. Crit. Rev. Toxicol. 41(10):822‐903.
  Goodman, JE; Prueitt, RL; Rhomberg, LR. 2012. Dose Response 11(2):207‐219.
• Methanol carcinogenicity
  Bailey, LA; Prueitt, RL; Rhomberg, LR. 2012. Regul. Toxicol. Pharmacol. 62:278‐291.
• Dioxins thyroid hormone perturbation
  Goodman, JE; Kerper, LE; Petito Boyce, C; Prueitt, RL; Rhomberg, LR. 2010. Regul. Toxicol. Pharmacol. 58(1):79‐99.
• Formaldehyde as a leukemogen
  Rhomberg, LR; Bailey, LA; Goodman, JE; Hamade, AK; Mayfield, DB. 2011. Crit. Rev. Toxicol. 41(7):555‐621.
• Naphthalene carcinogenicity
  Rhomberg, LR; Bailey, LA; Goodman, JE. 2010. Crit. Rev. Toxicol. 40(8):671‐696.
• Methyl methacrylate nasal toxicity
  Pemberton, M; Bailey, EA; Rhomberg, LR. 2013. Regul. Toxicol. Pharmacol. 66(2):217‐233.
• Toluene Diisocyanate carcinogenicity
  Goodman, JE; Prueitt, RL; Rhomberg, LR. 2013. Crit. Rev. Toxicol. 43(5):391‐435.
Articulate an Hypothesis
What is the proposed basis for inferring that a particular phenomenon seen in studies of a chemical's effects will also happen in target populations?
• May be a hypothesized MoA – or general ("animals predict humans")
• A generalization, not just an extrapolation (should apply to all cases within its realm)
• What manifestations of the hypothesis are expected and not expected? Check against all actual observed results (a tallying sketch follows below).
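A minimal sketch, with hypothetical endpoints, of tallying an hypothesis's expected and unexpected manifestations against all observed results: confirmed predictions add weight; mismatches demand an account:

```python
# Hypothetical sketch: tally an hypothesis's predictions against all
# observed results -- confirmations add weight, misses demand explanation.
predictions = {                    # what the MoA hypothesis expects
    "tumors_in_male_rats": True,
    "tumors_in_female_rats": True,
    "tumors_in_mice": False,       # MoA held to be rat-specific
    "key_event_cell_proliferation": True,
}
observed = {
    "tumors_in_male_rats": True,
    "tumors_in_female_rats": False,
    "tumors_in_mice": False,
    "key_event_cell_proliferation": True,
}

confirmed = [k for k in predictions if predictions[k] == observed[k]]
refuted   = [k for k in predictions if predictions[k] != observed[k]]
print("confirmed:", confirmed)
print("needs an account (possible refutation):", refuted)
```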
For Observed Outcomes that are Candidates for "Evidence"
• Why we think they happened where they did.
• Why we think they didn't happen where they didn't.
• Why we think the "did‐happen" factors would also apply to the target population.
  Might apply? Probably apply? Known to apply?
• Are there discrepant observations, and if so, how do we account for them?
• Are our "whys":
  Observable underlying causes?
  Reasonable guesses based on wider knowledge and other cases?
  Ad hoc assumptions without evidence, needed to explain otherwise puzzling phenomena?
Evaluate How Compelling the Case is for the Proposed Basis of Human Risk in View of:
• "Predictions" of hypotheses that are confirmed in the observations
  More weight to "risky" and specific predictions
  Less weight when subsidiary assumptions or explanations are needed
  Both positive and negative predictions!
• Apparent refutations (counterexamples)
  Failure to repeat a result across studies
  Non‐responding sexes or species
  Unpredicted but clearly relevant phenomena
• An hypothesis can often be reconciled with apparent refutations by either modifying it or adding subsidiary assumptions – but this entails a weight "penalty"
Relative Credence in Competing "Accounts"
• "Account" = an articulated set of proposed explanations for the set of observations
• Relevant Causation – but also chance, error, confounding factors, general‐knowledge possibilities, plausible assumptions, assertions of irrelevance, and "unknown reasons"

Certain Findings Indicate Target‐Population Risk:
• reasoning why
• how contradictions are resolved
• why assumptions are reasonable

Those Findings Do Not Indicate Target‐Population Risk:
• reasoning why not
• how findings are otherwise explained
• why assumptions are reasonable
Can we measure the weights?

Sir Austin Bradford Hill on the Hill Criteria:
"…the fundamental question – is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?"
A. Bradford Hill (1965) Proc. Roy. Soc. Medicine 58:295.

"set of facts" =
• all the epi (+ and –)
• mode of action
• animal studies
• other potential explanations
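One way to formalize Hill's question – an editorial sketch, not a method stated in the talk – is as posterior odds between a causal account A1 and an alternative account A2, where each ad hoc subsidiary assumption pays its "penalty" through lower prior odds:

```latex
\[
\underbrace{\frac{P(A_1 \mid \text{facts})}{P(A_2 \mid \text{facts})}}_{\text{relative credence}}
\;=\;
\underbrace{\frac{P(\text{facts} \mid A_1)}{P(\text{facts} \mid A_2)}}_{\text{how well each account explains the facts}}
\times
\underbrace{\frac{P(A_1)}{P(A_2)}}_{\text{prior odds, penalized for ad hoc assumptions}}
\]
```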
Advantages of Hypothesis‐Based WoE
• Shows which endpoints are the most compelling
• Makes reasoning explicit – shows both strengths and weaknesses
• Allows debate about particular data and interpretations
• Frames WoE classifications as scientific statements (falsifiable, points to tests)
• Does not blend all endpoints together in an uninterpretable, generalized "risk" statement
• Informs the QRA process
Thank You for Your Attention
Weighing Evidence and Assessing Uncertainties: Where have we been, Where are we going?
Lorenz R. Rhomberg, PhD, ATS
Gradient, Cambridge, Massachusetts, USA lrhomberg@gradientcorp.com
In this talk, I characterize the challenges in the process of weighing scientific evidence about toxicity, outline the needs of a regulatory process to do so by following a set of established
procedures, review some of the processes that have been used and the questions about their adequacy that have been raised, and compare different strategies for structuring weight‐of‐
evidence inquiry. Finally, I propose some approaches that may achieve the twin aims of flexibility in the face of diverse scientific evidence and sufficient structure to ensure that
consistency, rigor, and justification of conclusions can be documented.
When bringing to bear scientific evidence to support decisions about potential health effects of chemical exposures in foods, in the environment, and in the workplace, there are
inevitable limits to what the available data can demonstrate directly. Scientific judgments about the existence and nature of causal processes of toxicity need to be made while
contending with all the data gaps, extrapolations, inconsistencies, and shortcomings among the available studies. There is need to characterize not only what conclusions can reasonably
be drawn, but also the degree of confidence in them, noting different interpretations that might also be considered. In pure science, an iterative process of hypothesizing general
explanations and seeking critical tests of them in further experiments is pursued, with continued skepticism toward and testing of tentative conclusions being the "scientific method." In
the regulatory context, decisions to take (or to forgo) actions must be made, and the judgments about whether the interpretation of evidence is sufficiently robust to support such
decisions are delegated to a limited set of assessors who must make judgments and defend their legitimacy to interested stakeholders and the public. To ensure consistency in standards of
evidence to support conclusions, and to communicate the judgment process and its justifications, a variety of risk assessment frameworks – procedures for gathering, interpreting, and
drawing conclusions from available evidence – have been put in place and used by various governmental and international organizations.
In recent years, the sufficiency of some of these evidence‐evaluation frameworks, and their ability to make sound, well justified, and well communicated judgments, has been questioned.
This stems in part from deeper understanding of underlying modes of toxic action (and their diversity and differences among different experimental animal strains and humans, and at
different exposure levels), exposing the limits of earlier assumptions about toxicity processes being parallel in test systems and in humans. In part, it is due to an increasing number of
examples in which existing evaluation frameworks seem to miss important scientific considerations that have been revealed by deeper probing of underlying biology. New kinds of
testing data, in particular, high‐throughput in vitro testing and gene‐expression arrays, have opened new avenues for characterizing toxicity pathways (and not just traditional apical
endpoints) and pose challenges to traditional methods.
Critiques by high‐level review panels of several key regulatory assessments have found insufficient explanation of the basis for weight‐of‐evidence judgments. The advent of evidence‐
based medicine as a means for evaluating clinical efficacy of alternative treatments has provided a model for how a more systematic and rigorous process might provide better and more
objective justifications for judgments. In consequence, a great deal of recent attention has focused on how the weight‐of‐evidence process can and should be reformed, and I review
some of the activity that has been undertaken by regulatory and scientific bodies at the national and international level.
Progress has been made on instituting systematic review processes for identifying relevant studies, objectively evaluating strengths and weaknesses, making inclusion/exclusion decisions,
and tabulating results. I review some of these and, while noting the benefits, also argue that this by itself goes only so far in resolving the challenges, since the relevance of studies, how
interpretations of them interact, and how they do or do not support overarching hypotheses about the basis for possible toxicity still need to be considered, and a process to do so
systematically is challenging to define.
Insights into the challenges and means to address them can be gained by examining the differing strategies that have been employed in constructing evaluation frameworks. I summarize
results from a recent review and workshop doing this. One strategy, a rules‐based or algorithmic approach, aims to build a decision‐tree process that embodies the interpretive wisdom
of the field, such that each decision can be made objectively, and conclusions are justified by how the decision‐tree process disposes of the data at hand. The advantage is objectivity, but
the shortcoming is that the interpretive wisdom needs to be built into the algorithm, which may be faulty, become out of date, or be unable to accommodate novel kinds of evidence. An
alternative strategy is to be more unstructured but to rely on expert judgment from a set of appropriately chosen scientists who then explain the basis for their judgments. The advantage
is flexibility and, possibly, extra scientific insight, but the shortcoming is that the choice of experts becomes controversial, the justifications are articulated after the fact and are keyed to
judgments already made, and the process can lack transparency. The conclusions are justified by asserting the expertise of the judges. A process analogous to evidence‐based medicine
can be rigorous and transparent, but it does not easily deal with evidence that is not direct observation of the question of interest itself – that is, it emphasizes consistency of repeated
observations, but does not handle inference across datasets very well.
I end by presenting my own approach of Hypothesis‐Based Weight of Evidence, which seeks to gain the advantages of others while avoiding the disadvantages. It stresses constructing
competing sets of tentative explanations for all of the relevant study outcomes, where explanations invoking a common causal toxicity process can be compared for plausibility and
dependence on assumptions to an alternative set of possible explanations that denies the tested agent's toxicity and explains outcomes by alternative means, such as chance,
confounding, and variable operation of non‐agent‐related causes in different test systems.