Refer to Example 4.10,
Solution
Bayes' theorem deals with the role of new information in revising probability
estimates. The theorem assumes that the probability of a hypothesis (the posterior probability) is
a function of new evidence (the likelihood) and previous knowledge (prior probability). The
theorem is named after Thomas Bayes (1702–1761), a nonconformist minister who had an
interest in mathematics. The basis of the theorem is contained in an essay published in the
Philosophical Transactions of the Royal Society of London in 1763. Bayes' theorem is a logical consequence of the product rule of probability, which states that the probability (P) of two events A and B both occurring, written P(A,B), is equal to the conditional probability of one event occurring given that the other has already occurred, P(A|B), multiplied by the probability of the other event occurring, P(B). The derivation of the theorem is as follows:

P(A,B) = P(A|B)×P(B) = P(B|A)×P(A)

Thus: P(A|B) = P(B|A)×P(A)/P(B).

Bayes' theorem has been frequently used in
the areas of diagnostic testing and in the determination of genetic predisposition. For example, suppose one wants to know the probability that a person with a particular genetic profile (B) will develop a particular tumour type (A), that is, P(A|B). Previous knowledge leads to the assumption that the probability that any individual will develop the specific tumour, P(A), is 0.1 and that the probability that an individual has the particular genetic profile, P(B), is 0.2. New evidence establishes that the probability that an individual with the tumour has the genetic profile of interest, P(B|A), is 0.5. Thus:

P(A|B) = P(B|A)×P(A)/P(B) = 0.5×0.1/0.2 = 0.25
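
A minimal Python sketch of this calculation; the numbers are the illustrative values above, not clinical estimates, and the function name is chosen only for this example:

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Illustrative values from the genetic-profile example above
p_tumour = 0.1                 # P(A): prior probability of developing the tumour
p_profile = 0.2                # P(B): probability of having the genetic profile
p_profile_given_tumour = 0.5   # P(B|A): profile frequency among tumour cases

print(bayes_posterior(p_tumour, p_profile_given_tumour, p_profile))  # 0.25
```
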
The adoption of Bayes' theorem has led to the development of Bayesian methods for data analysis. Bayesian methods have been
defined as "the explicit use of external evidence in the design, monitoring, analysis,
interpretation and reporting" of studies (Spiegelhalter, 1999). The Bayesian approach to data
analysis allows consideration of all possible sources of evidence in the determination of the
posterior probability of an event. It is argued that this approach has more relevance to decision
making than classical statistical inference, as it focuses on the transformation from initial
knowledge to final opinion rather than on providing the "correct" inference. In addition to its
practical use in probability analysis, Bayes' theorem can be used as a normative model to assess
how well people use empirical information to update the probability that a hypothesis is true.

1. Conditional Probabilities and Bayes' Theorem

The probability of a hypothesis H conditional on
a given body of data E is the ratio of the unconditional probability of the conjunction of the
hypothesis with the data to the unconditional probability of the data alone.

(1.1) Definition. The probability of H conditional on E is defined as PE(H) = P(H & E)/P(E), provided that both terms of this ratio exist and P(E) > 0.[1]

To illustrate, suppose J. Doe is a randomly chosen American
who was alive on January 1, 2000. According to the United States Centers for Disease Control,
roughly 2.4 million of the 275 million Americans alive on that date died during the 2000
calendar year. Among the approximately 16.6 million senior citizens (age 75 or greater) about
1.36 million died. The unconditional probability of the hypothesis that our J. Doe died during
2000, H, is just the population-wide mortality rate P(H) = 2.4M/275M = 0.00873. To find the
probability of J. Doe's death conditional on the information, E, that he or she was a senior
citizen, we divide the probability that he or she was a senior who died, P(H & E) = 1.36M/275M
= 0.00495, by the probability that he or she was a senior citizen, P(E) = 16.6M/275M = 0.06036.
Thus, the probability of J. Doe's death given that he or she was a senior is PE(H) = P(H &
E)/P(E) = 0.00495/0.06036 = 0.082. Notice how the size of the total population factors out of
this equation, so that PE(H) is just the proportion of seniors who died. One should contrast this
quantity, which gives the mortality rate among senior citizens, with the "inverse" probability of
E conditional on H, PH(E) = P(H & E)/P(H) = 0.00495/0.00873 = 0.57, which is the proportion
of deaths in the total population that occurred among seniors.

Here are some straightforward consequences of (1.1):

Probability. PE is a probability function.[2]
Logical Consequence. If E entails H, then PE(H) = 1.
Preservation of Certainties. If P(H) = 1, then PE(H) = 1.
Mixing. P(H) = P(E)PE(H) + P(~E)P~E(H).[3]
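
A short Python sketch of the J. Doe calculation, using the population figures quoted above (in millions) and checking the Mixing identity numerically; the variable names are illustrative only:

```python
# Conditional probabilities for the J. Doe mortality example (figures in millions,
# as quoted above: 275M Americans, 2.4M deaths, 16.6M seniors, 1.36M senior deaths).
population = 275.0
deaths = 2.4
seniors = 16.6
senior_deaths = 1.36

p_H = deaths / population               # P(H): died during 2000, ~0.00873
p_E = seniors / population              # P(E): was a senior citizen, ~0.06036
p_H_and_E = senior_deaths / population  # P(H & E): senior who died, ~0.00495

p_H_given_E = p_H_and_E / p_E           # PE(H): mortality rate among seniors
p_E_given_H = p_H_and_E / p_H           # PH(E): share of deaths among seniors

# Mixing: P(H) = P(E)PE(H) + P(~E)P~E(H)
p_H_given_notE = (deaths - senior_deaths) / (population - seniors)
assert abs(p_H - (p_E * p_H_given_E + (1 - p_E) * p_H_given_notE)) < 1e-12

print(round(p_H_given_E, 3), round(p_E_given_H, 2))  # 0.082 0.57
```
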
The most important fact about conditional probabilities is undoubtedly Bayes' Theorem, whose significance was first appreciated by the British cleric
Thomas Bayes in his posthumously published masterwork, "An Essay Toward Solving a
Problem in the Doctrine of Chances" (Bayes 1764). Bayes' Theorem relates the "direct"
probability of a hypothesis conditional on a given body of data, PE(H), to the "inverse"
probability of the data conditional on the hypothesis, PH(E).

(1.2) Bayes' Theorem. PE(H) = [P(H)/P(E)] PH(E)

In an unfortunate, but now unavoidable, choice of terminology, statisticians
refer to the inverse probability PH(E) as the "likelihood" of H on E. It expresses the degree to
which the hypothesis predicts the data given the background information codified in the
probability P. In the example discussed above, the condition that J. Doe died during 2000 is a
fairly strong predictor of senior citizenship. Indeed, the equation PH(E) = 0.57 tells us that 57%
of the total deaths occurred among seniors that year. Bayes' theorem lets us use this information
to compute the "direct" probability of J. Doe dying given that he or she was a senior citizen. We
do this by multiplying the "prediction term" PH(E) by the ratio of the total number of deaths in
the population to the number of senior citizens in the population, P(H)/P(E) = 2.4M/16.6M =
0.144. The result is PE(H) = 0.57 × 0.144 = 0.082, just as expected.
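
A quick numeric check of (1.2) in Python, using the same figures as above (a sketch; the factor of one million cancels as before):

```python
# (1.2): PE(H) = [P(H)/P(E)] * PH(E), same figures (in millions) as above
p_H, p_E = 2.4 / 275, 16.6 / 275
p_E_given_H = 1.36 / 2.4                 # PH(E), ~0.57
p_H_given_E = (p_H / p_E) * p_E_given_H  # P(H)/P(E) is ~0.144
print(round(p_H_given_E, 3))             # 0.082
```
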
Though a mathematical triviality, Bayes' Theorem is of great value in calculating conditional probabilities because
inverse probabilities are typically both easier to ascertain and less subjective than direct
probabilities. People with different views about the unconditional probabilities of E and H often
disagree about E's value as an indicator of H. Even so, they can agree about the degree to which
the hypothesis predicts the data if they know any of the following intersubjectively available
facts: (a) E's objective probability given H, (b) the frequency with which events like E will
occur if H is true, or (c) the fact that H logically entails E. Scientists often design experiments so
that likelihoods can be known in one of these "objective" ways. Bayes' Theorem then ensures
that any dispute about the significance of the experimental results can be traced to "subjective"
disagreements about the unconditional probabilities of H and E. When both PH(E) and P~H(E)
are known, an experimenter need not even know E's probability to determine a value for PE(H) using Bayes' Theorem.

(1.3) Bayes' Theorem (2nd form).[4] PE(H) = P(H)PH(E) / [P(H)PH(E) + P(~H)P~H(E)]

In this guise Bayes' theorem is particularly useful for inferring causes from
their effects since it is often fairly easy to discern the probability of an effect given the presence
or absence of a putative cause. For instance, physicians often screen for diseases of known
prevalence using diagnostic tests of recognized sensitivity and specificity. The sensitivity of a
test, its "true positive" rate, is the fraction of times that patients with the disease test positive for
it. The test's specificity, its "true negative" rate, is the proportion of healthy patients who test
negative. If we let H be the event of a given patient having the disease, and E be the event of her
testing positive for it, then the test's sensitivity and specificity are given by the likelihoods
PH(E) and P~H(~E), respectively, and the "baseline" prevalence of the disease in the
population is P(H). Given these inputs about the effects of the disease on the outcome of the test,
one can use (1.3) to determine the probability of disease given a positive test. For a more detailed
illustration of this process, see Example 1 in the Supplementary Document "Examples, Tables, and Proof Sketches".
2. Special Forms of Bayes' Theorem

Bayes' Theorem can be expressed in a variety of forms that are useful for different purposes. One version employs what Rudolf
Carnap called the relevance quotient or probability ratio (Carnap 1962, 466). This is the factor
PR(H, E) = PE(H)/P(H) by which H's unconditional probability must be multiplied to get its
probability conditional on E. Bayes' Theorem is equivalent to a simple symmetry principle for
probability ratios.

(1.4) Probability Ratio Rule. PR(H, E) = PR(E, H)

The term on the right
provides one measure of the degree to which H predicts E. If we think of P(E) as expressing the
"baseline" predictability of E given the background information codified in P, and of PH(E) as
E's predictability when H is added to this background, then PR(E, H) captures the degree to
which knowing H makes E more or less predictable relative to the baseline: PR(E, H) = 0 means
that H categorically predicts ~E; PR(E, H) = 1 means that adding H does not alter the baseline
prediction at all; PR(E, H) = 1/P(E) means that H categorically predicts E. Since P(E) = PT(E), where T is any truth of logic, we can think of (1.4) as telling us that the probability of a
hypothesis conditional on a body of data is equal to the unconditional probability of the
hypothesis multiplied by the degree to which the hypothesis surpasses a tautology as a predictor
of the data. In our J. Doe example, PR(H, E) is obtained by comparing the predictability of
senior status given that J. Doe died in 2000 to its predictability given no information whatever
about his or her mortality. Dividing the former "prediction term" by the latter yields PR(H, E) =
PH(E)/P(E) = 0.57/0.06036 = 9.44. Thus, as a predictor of senior status in 2000, knowing that J.
Doe died is more than nine times better than not knowing whether she lived or died. Another
useful form of Bayes' Theorem is the Odds Rule. In the jargon of bookies, the "odds" of a
hypothesis is its probability divided by the probability of its negation: O(H) = P(H)/P(~H). So,
for example, a racehorse whose odds of winning a particular race are 7-to-5 has a 7/12 chance of
winning and a 5/12 chance of losing. To understand the difference between odds and
probabilities it helps to think of probabilities as fractions of the distance between the probability
of a contradiction and that of a tautology, so that P(H) = p means that H is p times as likely to be
true as a tautology. In contrast, writing O(H) = [P(H) - P(F)]/[P(T) - P(H)] (where F is some
logical contradiction) makes it clear that O(H) expresses this same quantity as the ratio of the
amount by which H's probability exceeds that of a contradiction to the amount by which it is
exceeded by that of a tautology. Thus, the difference between "probability talk" and "odds
talk" corresponds to the difference between saying "we are two thirds of the way there" and
saying "we have gone twice as far as we have yet to go." The analogue of the probability ratio
is the odds ratio OR(H, E) = OE(H)/O(H), the factor by which H's unconditional odds must be
multiplied to obtain its odds conditional on E. Bayes' Theorem is equivalent to the following
fact about odds ratios:

(1.5) Odds Ratio Rule. OR(H, E) = PH(E)/P~H(E)

Notice the similarity
between (1.4) and (1.5). While each employs a different way of expressing probabilities, each
shows how its expression for H's probability conditional on E can be obtained by multiplying its
expression for H's unconditional probability by a factor involving inverse probabilities. The
quantity LR(H, E) = PH(E)/P~H(E) that appears in (1.5) is the likelihood ratio of H given E. In
testing situations like the one described in Example 1, the likelihood ratio is the test's true
positive rate divided by its false positive rate: LR = sensitivity/(1 - specificity). As with the
probability ratio, we can construe the likelihood ratio as a measure of the degree to which H
predicts E. Instead of comparing E's probability given H with its unconditional probability,
however, we now compare it with its probability conditional on ~H. LR(H, E) is thus the degree
to which the hypothesis surpasses its negation as a predictor of the data. Once more, Bayes'
Theorem tells us how to factor conditional probabilities into unconditional probabilities and
measures of predictive power. The odds of a hypothesis conditional on a body of data is equal to
the unconditional odds of the hypothesis multiplied by the degree to which it surpasses its
negation as a predictor of the data. In our running J. Doe example, LR(H, E) is obtained by
comparing the predictability of senior status given that J. Doe died in 2000 to its predictability
given that he or she lived out the year. Dividing the former "prediction term" by the latter yields
LR(H, E) = PH(E)/P~H(E) = 0.57/0.056 ≈ 10.1. Thus, as a predictor of senior status in 2000, knowing that J. Doe died is more than ten times better than knowing that he or she lived. The
similarities between the "probability ratio" and "odds ratio" versions of Bayes' Theorem can
be developed further if we express H's probability as a multiple of the probability of some other
hypothesis H* using the relative probability function B(H, H*) = P(H)/P(H*). It should be clear
that B generalizes both P and O since P(H) = B(H, T) and O(H) = B(H, ~H). By comparing the
conditional and unconditional values of B we obtain the Bayes' Factor: BR(H, H*; E) = BE(H,
H*)/B(H, H*) = [PE(H)/PE(H*)]/ [P(H)/P(H*)]. We can also generalize the likelihood ratio by
setting LR(H, H*; E) = PH(E)/PH*(E). This compares E's predictability on the basis of H with
its predictability on the basis of H*. We can use these two quantities to formulate an even more
general form of Bayes' Theorem.
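
To close, a small Python sketch of these special forms for the running J. Doe example (same figures as above, in millions). It verifies (1.4) and (1.5) numerically and shows that the Bayes factor with H* = ~H reduces to the odds ratio; small differences from the values quoted in the text are due to rounding of intermediate figures there:

```python
population, deaths, seniors, senior_deaths = 275.0, 2.4, 16.6, 1.36

p_H = deaths / population               # P(H): died in 2000
p_E = seniors / population              # P(E): senior citizen
p_H_and_E = senior_deaths / population  # P(H & E)

p_H_given_E = p_H_and_E / p_E           # PE(H)
p_E_given_H = p_H_and_E / p_H           # PH(E)
p_E_given_notH = (seniors - senior_deaths) / (population - deaths)  # P~H(E)

# (1.4) Probability Ratio Rule: PR(H, E) = PR(E, H)
pr_HE = p_H_given_E / p_H
pr_EH = p_E_given_H / p_E
assert abs(pr_HE - pr_EH) < 1e-9        # ~9.4 both ways

# (1.5) Odds Ratio Rule: OR(H, E) = PH(E)/P~H(E), the likelihood ratio
def odds(p: float) -> float:
    return p / (1 - p)

or_HE = odds(p_H_given_E) / odds(p_H)
lr_HE = p_E_given_H / p_E_given_notH
assert abs(or_HE - lr_HE) < 1e-9        # ~10.1 both ways

# Bayes factor BR(H, H*; E) with H* = ~H equals the odds ratio, since B(H, ~H) = O(H)
br_H_notH = (p_H_given_E / (1 - p_H_given_E)) / (p_H / (1 - p_H))
assert abs(br_H_notH - or_HE) < 1e-9

print(round(pr_HE, 2), round(lr_HE, 2))  # 9.39 10.14
```
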