2. Why Learn Probability?
• Statistics, data mining, and machine learning are all concerned with
collecting and analyzing data.
• Using fancy tools like neural nets, boosting and support vector machines
without understanding statistics is like doing brain surgery before knowing
how to use a band-aid.
• Prediction, classification, clustering, and estimation are all special cases
of statistical inference.
• Data analysis, machine learning, and data mining are various names
given to the practice of statistical inference, depending on the context.
– Typeset by FoilTEX – 1
3. Probability
[Figure: diagram linking "Data generating process" and "Observed data", labeled "Inference and Data mining"]
• Probability theory is the formal language of uncertainty, which is the
basis of statistical inference.
• The basic problem that we study in probability is: Given a data generating
process, what are the properties of the outcomes?
• The basic problem of statistical inference is the inverse of probability:
Given the outcomes, what can we say about the process that generated
the data?
4. Introduction
• Any realistic model of a real-world phenomenon must take into account
the possibility of randomness.
• Quantities of interest usually cannot be predicted in advance but exhibit an
inherent variation that should be taken into account by the model.
• Probability model: a model that allows for this variation by being probabilistic in nature.
5. Sample Space and Events
• Assume an experiment whose outcome is not predictable in advance
• Suppose that the set of all possible outcomes is known
• This set is known as the sample space of the experiment, denoted by S
Example 1. If the experiment consists of the flipping of a coin,
S = {Head, Tail}
Example 2. If the experiment consists of tossing a die, then the sample
space is
6. S = {1, 2, 3, 4, 5, 6}
• Any subset E of the sample space S is known as an event.
Example 3. If E = {Head} then E is the event that a head appears on
the flip of the coin.
Example 4. If E = {Tail} then E is the event that a tail appears.
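The definitions above can be mirrored with Python sets. This is only an illustrative sketch: the variable names and the extra `even_event` are assumptions introduced here, not part of the slides.

```python
# Sample spaces from Examples 1 and 2, represented as Python sets.
coin_space = {"Head", "Tail"}
die_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space.
head_event = {"Head"}    # Example 3
tail_event = {"Tail"}    # Example 4
even_event = {2, 4, 6}   # hypothetical extra event: the die shows an even number


def is_event(event, sample_space):
    """Return True if `event` is a subset of `sample_space`."""
    return event <= sample_space


print(is_event(head_event, coin_space))  # True
print(is_event(even_event, die_space))   # True
print(is_event({7}, die_space))          # False: 7 is not a possible outcome
```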
7. Probabilities Defined on Events
Definition 1. Given an experiment with sample space S, for each event
E of the sample space S, P (E) is the probability of the event E and it
satisfies the following three conditions:
Axiom 1 0 ≤ P (E) ≤ 1
Axiom 2 P (S) = 1
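Axiom 3 For any sequence of events E1, E2, . . . that are mutually exclusive, that is, En ∩ Em = ∅ whenever n ≠ m,
\[ P\Big(\bigcup_{n=1}^{\infty} E_n\Big) = \sum_{n=1}^{\infty} P(E_n) \]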
8. Example 5. In the coin tossing example, if we assume that a head is
equally likely to appear as a tail, then we would have
\[ P(\{H\}) = P(\{T\}) = \frac{1}{2} \]
On the other hand, if we had a biased coin and felt that a head was
twice as likely to appear as a tail, then we would have
\[ P(\{H\}) = \frac{2}{3}, \quad P(\{T\}) = \frac{1}{3} \]
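As a rough check, the two coin models of Example 5 can be tested against the axioms of Definition 1. The sketch below is an assumption-laden illustration: the helper names are made up here, and because the sample space is finite it only checks additivity over pairs of disjoint events rather than the countable version stated in Axiom 3.

```python
from fractions import Fraction
from itertools import chain, combinations


def all_events(sample_space):
    """Yield every subset of a finite sample space, i.e. every event."""
    items = list(sample_space)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def prob(event, weights):
    """P(E) computed as the sum of the outcome probabilities in E."""
    return sum((weights[o] for o in event), Fraction(0))


def satisfies_axioms(weights):
    """Check Axioms 1 and 2 and pairwise (finite) additivity."""
    space = frozenset(weights)
    events = [frozenset(e) for e in all_events(space)]
    axiom1 = all(0 <= prob(e, weights) <= 1 for e in events)
    axiom2 = prob(space, weights) == 1
    # Finite stand-in for Axiom 3: additivity over disjoint pairs of events.
    axiom3 = all(
        prob(a | b, weights) == prob(a, weights) + prob(b, weights)
        for a in events
        for b in events
        if not (a & b)
    )
    return axiom1 and axiom2 and axiom3


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(2, 3), "T": Fraction(1, 3)}
print(satisfies_axioms(fair), satisfies_axioms(biased))  # True True
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.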
9. Conditional Probabilities
• Suppose that we toss two dice and observe that the first die is a 4. What is
the probability that the sum of the two dice equals 6?
• Let E and F denote respectively the event that the sum of the dice is 6
and the event that the first die is 4
• If the event F occurs, then in order for E to occur the actual outcome
must be a point in both E and F, that is, it must be in EF (the intersection of E and F).
• Once F has occurred, F becomes our new sample space; hence the
probability that the event EF occurs will equal the probability of EF
relative to the probability of F
10. Definition 2. If P(F) > 0 then the conditional probability of E given F
is
\[ P(E \mid F) = \frac{P(EF)}{P(F)} \]
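The two-dice question that motivates this definition can be worked out by enumeration. The following is a minimal sketch, assuming both dice are fair so that all 36 ordered outcomes are equally likely; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die), assumed equally likely.
outcomes = set(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


def cond_prob(e, f):
    """P(E | F) = P(EF) / P(F), defined only when P(F) > 0."""
    if prob(f) == 0:
        raise ValueError("P(F) must be positive")
    return prob(e & f) / prob(f)


E = {o for o in outcomes if sum(o) == 6}   # the sum of the dice is 6
F = {o for o in outcomes if o[0] == 4}     # the first die is 4

print(cond_prob(E, F))  # 1/6
```

Here EF contains the single outcome (4, 2), so P(EF) = 1/36 and P(F) = 6/36, which gives 1/6.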
Example 6. Suppose cards numbered 1 through 10 are placed in a hat,
mixed up, and then one of the cards is drawn. If we are told that the number
on the drawn card is at least 5, then what is the conditional probability that
it is 10?
Example 7. Suppose that each of 3 men at a party throws his hat into
the center of the room. The hats are first mixed up and then each man
randomly selects a hat. What is the probability that none of the three men
selects his own hat?
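One way to check answers to Examples 6 and 7 is brute-force enumeration. The sketch below assumes every card is equally likely to be drawn and every assignment of hats to men is equally likely; the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Example 6: cards 1..10, conditioning on "the card is at least 5".
cards = set(range(1, 11))
at_least_5 = {c for c in cards if c >= 5}
is_10 = {10}
# With equally likely outcomes, P(E | F) = |EF| / |F|.
p_example6 = Fraction(len(is_10 & at_least_5), len(at_least_5))
print(p_example6)  # 1/6

# Example 7: man i receives hat assignment[i]; hat i belongs to man i.
men = range(3)
assignments = list(permutations(men))
no_match = [a for a in assignments if all(a[i] != i for i in men)]
p_example7 = Fraction(len(no_match), len(assignments))
print(p_example7)  # 1/3
```

Two of the six possible assignments leave every man with someone else's hat, which gives 1/3.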
11. Independent Events
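Definition 3. Two events E and F are said to be independent if
\[ P(EF) = P(E)P(F) \]
A set of events {Ai : i ∈ S} (here S is an index set, not the sample space) is independent if
\[ P\Big(\bigcap_{i \in \Omega} A_i\Big) = \prod_{i \in \Omega} P(A_i) \]
for every finite subset Ω of S.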
12. Example 8. Suppose we toss two fair dice. Let E1 denote the event that
the sum of the dice is six and F denote the event that the first die equals
four. Then
\[ P(E_1 F) = P(\{(4, 2)\}) = \frac{1}{36} \]
while
\[ P(E_1)P(F) = \frac{5}{36} \cdot \frac{1}{6} = \frac{5}{216} \]
and hence E1 and F are not independent. Why? Let E2 be the event
that the sum of the dice equals seven. Is E2 independent of F ?
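The comparison in Example 8 can be reproduced by enumerating the 36 outcomes of two fair dice. This is a sketch under that equally-likely assumption, with illustrative helper names; it also evaluates the E2 case, so skip it if you want to work that question out by hand first.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of two fair dice, assumed equally likely.
outcomes = set(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


def independent(a, b):
    """Definition 3 for two events: P(AB) == P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


F = {o for o in outcomes if o[0] == 4}      # the first die equals four
E1 = {o for o in outcomes if sum(o) == 6}   # the sum is six
E2 = {o for o in outcomes if sum(o) == 7}   # the sum is seven

print(prob(E1 & F), prob(E1) * prob(F))  # 1/36 5/216
print(independent(E1, F))                # False
print(independent(E2, F))                # True
```

Intuitively, a sum of six rules out a first die of 6, so knowing the first die changes the chance of E1, whereas a sum of seven remains possible whatever the first die shows.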
13. Bayes’ Formula
Let E and F be events. We may express E as
\[ E = EF \cup EF^c, \quad \text{in which } F^c = S - F, \]
for in order for a point to be in E, it must either be in both E and F, or be in E but not in F.
Since EF and EF^c are mutually exclusive, we have
\begin{align}
P(E) &= P(EF) + P(EF^c) \tag{1} \\
&= P(E \mid F)P(F) + P(E \mid F^c)P(F^c) \tag{2} \\
&= P(E \mid F)P(F) + P(E \mid F^c)(1 - P(F)) \tag{3}
\end{align}
14. Example 9. Consider two urns. The first contains 2 white and 7 black
balls, and the second contains 5 white and 6 black balls. We flip a fair
coin and then draw a ball from the first urn or the second urn depending on
whether the outcome was heads or tails. What is the conditional probability
that the outcome of the toss was heads given that a white ball was selected?
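Example 9 can be worked through numerically with the two-event form of the formula. The sketch below assumes that heads selects the first urn and tails the second, as the order of the wording suggests, and that each ball in the chosen urn is equally likely to be drawn; the variable names are illustrative.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)                # fair coin
p_white_given_heads = Fraction(2, 9)    # first urn: 2 white out of 9 balls
p_white_given_tails = Fraction(5, 11)   # second urn: 5 white out of 11 balls

# P(W) = P(W | H) P(H) + P(W | H^c) (1 - P(H))
p_white = (
    p_white_given_heads * p_heads
    + p_white_given_tails * (1 - p_heads)
)

# P(H | W) = P(HW) / P(W) = P(W | H) P(H) / P(W)
p_heads_given_white = p_white_given_heads * p_heads / p_white

print(p_white)              # 67/198
print(p_heads_given_white)  # 22/67
```

Since the first urn has the smaller share of white balls, seeing a white ball makes heads less likely than its prior of 1/2.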
15. Bayes’ Formula: General case
• Suppose F1, F2, . . . , Fn are mutually exclusive events such that $\bigcup_{i=1}^{n} F_i = S$.
• Exactly one of the events F1, F2, . . . , Fn will occur.
• By writing $E = \bigcup_{i=1}^{n} EF_i$ and using the fact that the events EFi, i = 1, . . . , n, are
mutually exclusive, we obtain
\begin{align}
P(E) &= \sum_{i=1}^{n} P(EF_i) \tag{4} \\
&= \sum_{i=1}^{n} P(E \mid F_i)P(F_i) \tag{5}
\end{align}
16. Suppose E has occurred and we are interested in determining which one
of the Fj also occurred. From Equation 4 we have that
\begin{align}
P(F_j \mid E) &= \frac{P(EF_j)}{P(E)} \tag{6} \\
&= \frac{P(E \mid F_j)P(F_j)}{\sum_{i=1}^{n} P(E \mid F_i)P(F_i)} \tag{7}
\end{align}
This equation is also known as Bayes’ formula.
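The general formula translates directly into a short function. This is a minimal sketch rather than a reference implementation: `bayes_posterior` and its argument names are hypothetical, and the usage reuses the urn numbers from Example 9 with F1 = heads and F2 = tails.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return [P(F_j | E) for each j].

    priors[i] is P(F_i) and likelihoods[i] is P(E | F_i), where the F_i
    are mutually exclusive and their union is the whole sample space.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if sum(priors) != 1:
        raise ValueError("the priors P(F_i) must sum to 1")
    # Joint probabilities P(E F_i) = P(E | F_i) P(F_i).
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    total = sum(joint)  # P(E), the sum of P(E | F_i) P(F_i) over all i
    if total == 0:
        raise ValueError("P(E) is zero, so P(F_j | E) is undefined")
    return [j / total for j in joint]


# Usage with the urn numbers from Example 9.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(2, 9), Fraction(5, 11)]
posterior = bayes_posterior(priors, likelihoods)
print(posterior)  # [Fraction(22, 67), Fraction(45, 67)]
```

The posterior probabilities always sum to 1, because the denominator is exactly the sum of the numerators over all j.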