Course Overview
Statistics for Data Science
CSE357 - Fall 2021
Statistics for Data Science
Statistics - methods for evaluating hypotheses in the light of empirical facts
(Stanford Encyclopedia of Philosophy, 2014)
Data Science - a field focused on using statistical, scientific, and computational
techniques to gain insights from data.
[Figure: diagram labeled Computation, Statistics, Science]
Approximately equal:
Data Science ≈ Data Mining ≈ Analytics ≈ Quantitative Science
Highly related:
Data Science, Big Data, Machine Learning, Artificial Intelligence
Statistics for Data Science
Statistical methods for gaining knowledge and insights from data.
-- designed for those already proficient in programming (i.e. computing)
A pathway to knowledge about…
… what was, (past)
… what is, (present)
… what is likely (future, the full population)
Why?!?
Jobs
Decisions
Truth / Meaning in Life
The answer to the "ultimate question of life, the universe, and everything" (Adams)
In other words, so you can go on Twitter and say
"The data say …"
"I did my research."
… and change no one's mind but at least understand it better yourself.
Course Website
https://www3.cs.stonybrook.edu/~has/CSE357/index.html
Probability
What is Probability?
Examples
(1) outcome of flipping a coin
(2) amount of snowfall
(3) mentioning "happy"
(4) mentioning "happy" a lot
The chance that something will happen.
Given infinite observations of an event, the proportion of observations where a
given outcome happens.
Strength of belief that something is true.
“Mathematical language for quantifying uncertainty” - Wasserman
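The long-run-frequency reading of probability can be illustrated with a small simulation. This is only a sketch: the helper name `relative_frequency`, the fixed seed, and the choice of a fair coin are assumptions made for illustration, not part of the course material.

```python
import random


def relative_frequency(n_flips, p_heads=0.5, seed=0):
    """Proportion of heads in n_flips simulated flips of a coin with P(heads) = p_heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips


if __name__ == "__main__":
    # As the number of observations grows, the proportion of heads
    # tends to settle near the underlying probability.
    for n in (10, 1_000, 100_000):
        print(n, relative_frequency(n))
```

With more flips the printed proportion should drift toward 0.5, which is the "proportion of observations" notion of probability; the "strength of belief" notion is not something a simulation like this captures.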
Probability (review)
Ω : Sample Space, set of all outcomes of a random experiment
A : Event (A ⊆ Ω), collection of possible outcomes of an experiment
P(A): Probability of event A, P is a function: events→ℝ
P is a probability measure, if and only if
(1) P(Ω) = 1
(2) P(A) ≥ 0 , for all A
(3) If A_1, A_2, … are disjoint events then:
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
Some Properties:
If B ⊆ A then P(A) ≥ P(B)
P(A ⋃ B) ≤ P(A) + P(B)
P(A ⋂ B) ≤ min(P(A), P(B))
P(¬A) = P(Ω / A) = 1 - P(A)
/ is set difference
P(A ⋂ B) will be notated as P(A, B)
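These properties can be checked numerically on a small finite sample space. The sketch below assumes one roll of a fair six-sided die with equally likely outcomes; the events `A`, `B`, `C` and the helper `prob` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an assumed toy example).
OMEGA = frozenset(range(1, 7))


def prob(event):
    """P(event) when all outcomes in OMEGA are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))


A = frozenset({2, 4, 6})  # the roll is even
B = frozenset({2, 4})     # the roll is 2 or 4, so B is a subset of A
C = frozenset({1, 2})     # the roll is at most 2

assert prob(OMEGA) == 1
assert prob(A) >= prob(B)                    # B ⊆ A implies P(A) ≥ P(B)
assert prob(A | C) <= prob(A) + prob(C)      # P(A ⋃ C) ≤ P(A) + P(C)
assert prob(A & C) <= min(prob(A), prob(C))  # P(A ⋂ C) ≤ min(P(A), P(C))
assert prob(OMEGA - A) == 1 - prob(A)        # P(¬A) = 1 - P(A)
```

Exact fractions are used instead of floats so the equalities hold without rounding error.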
Independence
Two Events: A and B
Does knowing something about A tell us whether B happens (and vice versa)?
(1) A: first flip of a fair coin; B: second flip of the same fair coin
(2) A: whether or not the first word is “happy”;
B: whether or not the second word is “birthday”
Two events, A and B, are independent iff P(A, B) = P(A)P(B)
Does dependence imply causality?
Disjoint Sets vs. Independent Events
Independence: Two events, A and B, are independent iff P(A,B) = P(A)P(B)
Disjoint Sets: If two events, A and B, come from disjoint sets, then
P(A,B) = 0
Does independence imply disjoint? No
Proof: A counterexample: A: flip of fair coin A is heads,
B: flip of fair coin B is heads;
independence tells us P(A, B) = P(A)P(B) = .25,
but disjoint would require P(A, B) = 0
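The counterexample can be checked by enumerating the four equally likely outcomes of two fair coin flips. This is a minimal sketch; the event `T` (coin A is tails) is an extra, hypothetical event added to show the opposite direction as well.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping two fair coins, e.g. ('H', 'T').
OMEGA = set(product("HT", repeat=2))


def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(OMEGA))


A = {w for w in OMEGA if w[0] == "H"}  # coin A is heads
B = {w for w in OMEGA if w[1] == "H"}  # coin B is heads
T = {w for w in OMEGA if w[0] == "T"}  # coin A is tails (disjoint from A)

# A and B are independent but not disjoint.
a_b_independent = prob(A & B) == prob(A) * prob(B)
a_b_disjoint = len(A & B) == 0

# A and T are disjoint but not independent: P(A, T) = 0 while P(A)P(T) = 1/4.
a_t_disjoint = len(A & T) == 0
a_t_independent = prob(A & T) == prob(A) * prob(T)
```

In this toy space, disjoint events with nonzero probability are dependent: knowing that A happened tells us T did not.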
Probability (Review)
Conditional Probability
P(A|B) = P(A, B) / P(B)
H: mention “happy” in message, m
B: mention “birthday” in message, m
P(H) = .01, P(B) = .001, P(H, B) = .0005
P(H|B) = P(H, B) / P(B) = .0005 / .001 = .50
H1: first flip of a fair coin is heads
H2: second flip of the same coin is heads
P(H2) = 0.5 P(H1) = 0.5 P(H2, H1) = 0.25
P(H2|H1) = 0.5
Two events, A and B, are independent iff P(A, B) = P(A)P(B)
P(A, B) = P(A)P(B) iff P(A|B) = P(A) (when P(B) > 0)
Interpretation of Independence:
Observing B has no effect on the probability of A.
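The two worked examples (happy/birthday and the two coin flips) can be reproduced with a few lines of code. This is a sketch using exact fractions; the function name `conditional` is a hypothetical helper, and the numbers are the ones given in the examples.

```python
from fractions import Fraction


def conditional(p_joint, p_given):
    """P(A|B) = P(A, B) / P(B); undefined when P(B) = 0."""
    if p_given == 0:
        raise ValueError("P(B) must be positive")
    return p_joint / p_given


# "happy" / "birthday" example: P(H) = .01, P(B) = .001, P(H, B) = .0005
p_h, p_b, p_hb = Fraction(1, 100), Fraction(1, 1000), Fraction(5, 10000)
p_h_given_b = conditional(p_hb, p_b)
h_b_independent = p_h_given_b == p_h  # False: seeing "birthday" changes P(H)

# Two flips of a fair coin: P(H1) = P(H2) = 0.5, P(H2, H1) = 0.25
p_h1, p_h2, p_h2h1 = Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)
p_h2_given_h1 = conditional(p_h2h1, p_h1)
flips_independent = p_h2_given_h1 == p_h2  # True: the first flip tells us nothing
```

Comparing P(A|B) with P(A) is the same test as comparing P(A, B) with P(A)P(B), as long as P(B) is positive.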
Why Probability?
A formality to make sense of the world.
(1) To quantify uncertainty
Should we believe something or not? Is it a meaningful difference?
(2) To be able to generalize from one situation or point in time to another.
Can we rely on some information? What is the chance Y happens?
(3) To organize data into meaningful groups or “dimensions”
Where does X belong? What words are similar to X?
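Probabilities over >2 events...
Independence:
A_1, A_2, …, A_n are independent iff, for every subset J ⊆ {1, …, n},
$P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i)$
Conditional Probability:
$P(A_1, A_2, \ldots, A_{n-1} \mid A_n) = \frac{P(A_1, A_2, \ldots, A_n)}{P(A_n)}$
just think of multiple events happening as a single event:
Z = A_1, A_2, …, A_{n-1} = A_1 ⋂ A_2 ⋂ … ⋂ A_{n-1}
then P(Z|A_n) = P(Z, A_n) / P(A_n)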
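For more than two events, independence has to hold for every sub-collection of the events, not only for pairs. The sketch below uses a standard toy example (two fair coin flips with a third event "the flips match") to show events that are pairwise independent yet not independent as a group; the helper `mutually_independent` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

OMEGA = set(product("HT", repeat=2))  # two fair coin flips


def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(OMEGA))


def mutually_independent(events):
    """True iff P(intersection) equals the product of probabilities for every sub-collection."""
    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            joint = set.intersection(*subset)
            if prob(joint) != prod(prob(e) for e in subset):
                return False
    return True


A = {w for w in OMEGA if w[0] == "H"}   # first flip is heads
B = {w for w in OMEGA if w[1] == "H"}   # second flip is heads
C = {w for w in OMEGA if w[0] == w[1]}  # both flips show the same face

pairwise = all(mutually_independent([x, y]) for x, y in combinations([A, B, C], 2))
all_three = mutually_independent([A, B, C])
```

Here every pair passes the product test, but P(A, B, C) = 1/4 while P(A)P(B)P(C) = 1/8, so the three events are not independent together.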
Conditional Probabilities are Fundamental to Data Science
for example
Machine Learning: Most modern deep learning techniques try to estimate
P(outcome | data)
Causal inference: Does treatment cause outcome?
P(outcome | treatment) ≠ P(outcome) *
*also requires random sampling of treatment conditions
Conditional Independence
A and B are conditionally independent, given C, iff
P(A, B | C) = P(A|C)P(B|C)
Equivalently, P(A|B,C) = P(A|C)
Interpretation: Once we know C, then B doesn’t tell us anything useful about A.
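A small joint distribution makes the definition concrete. The setup below is an assumed toy model, not from the slides: pick one of two coins at random (C), then flip it twice (A: first flip heads, B: second flip heads). The flips are built to be independent once the coin is known, so the check mainly shows how the quantities in the definition are computed and that A and B are still dependent when C is unknown.

```python
from fractions import Fraction
from itertools import product

P_COIN = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}    # P(C), assumed
P_HEADS = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}  # P(heads | C), assumed


def flip_prob(coin, face):
    """Probability of one flip showing `face` for the given coin."""
    return P_HEADS[coin] if face == "H" else 1 - P_HEADS[coin]


# Joint distribution over (coin, first flip, second flip).
JOINT = {
    (coin, f1, f2): P_COIN[coin] * flip_prob(coin, f1) * flip_prob(coin, f2)
    for coin, f1, f2 in product(P_COIN, "HT", "HT")
}


def prob(pred):
    """Total probability of the outcomes satisfying `pred`."""
    return sum((p for outcome, p in JOINT.items() if pred(outcome)), Fraction(0))


def cond_indep_given(coin):
    """Check P(A, B | C) == P(A | C) P(B | C) for C = coin."""
    p_c = prob(lambda o: o[0] == coin)
    p_ab_c = prob(lambda o: o[0] == coin and o[1] == "H" and o[2] == "H") / p_c
    p_a_c = prob(lambda o: o[0] == coin and o[1] == "H") / p_c
    p_b_c = prob(lambda o: o[0] == coin and o[2] == "H") / p_c
    return p_ab_c == p_a_c * p_b_c


p_a = prob(lambda o: o[1] == "H")
p_b = prob(lambda o: o[2] == "H")
p_ab = prob(lambda o: o[1] == "H" and o[2] == "H")
marginally_independent = p_ab == p_a * p_b  # False: a head hints at the biased coin
```

Without knowing the coin, a first head makes the biased coin more plausible and so raises the chance of a second head; once the coin is known, the first flip adds nothing.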
Bayes Theorem - Lite
GOAL: Relate (1) P(A|B) to (2) P(B|A)
Let’s try:
(3) P(A|B) = P(A,B) / P(B), def. of conditional probability on (1)
(4) P(B|A) = P(B,A) / P(A) = P(A,B) / P(A), def. of conditional probability on (2); symmetry of set intersection
(5) P(B|A)P(A) = P(A,B), algebra on (4) ← known as “Multiplication Rule”
(6) P(A|B) = (P(B|A)P(A)) / P(B), Substitute P(A,B) from (5) into (3)
Why?
We often want to know P(A|B) but we are only given P(B|A) and P(A).
Example: You want to know if an email is likely spam given a word appearing in it: P(spam | word). However, you only have a dataset of words and spam: P(word | spam) and you can look up the frequency of spam emails in general to get P(spam) as well as the frequency of "word" in general for P(word).
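The spam example can be sketched in code. The three input probabilities below are made-up numbers chosen only so the arithmetic is easy to follow; they do not come from any dataset, and `bayes` is a hypothetical helper name.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) P(A) / P(B), step (6) of the derivation."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs (assumptions for illustration only):
p_word_given_spam = Fraction(1, 5)  # P(word | spam)
p_spam = Fraction(3, 10)            # P(spam)
p_word = Fraction(2, 25)            # P(word)

p_spam_given_word = bayes(p_word_given_spam, p_spam, p_word)
print(p_spam_given_word)
```

Under these assumed numbers the word is fairly rare overall but common in spam, so seeing it raises the probability of spam well above the base rate P(spam).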
Bayes Theorem - Heavy (with multiple events partitioning Ω)
GOAL: Relate P(A_i|B) to P(B|A_i), for all i = 1 ... k, where A_1 ... A_k partition Ω
First: Law of Total Probability
partition: A_1 ⋃ A_2 ⋃ … ⋃ A_k = Ω
P(A_i, A_j) = 0, for all i ≠ j
[Figure: regions labeled A_1, A_2, A_3, …, A_k]
When both of these conditions are true, we say "A_1, …, A_k partition Ω"
law of total probability: If A_1 ... A_k partition Ω, then for any event, B:
$P(B) = \sum_{i=1}^{k} P(B \mid A_i) P(A_i)$
Law of Total Probability and Bayes Theorem
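Let’s try:
(1) P(A_i|B) = P(A_i, B) / P(B)
(2) P(A_i, B) / P(B) = P(B|A_i) P(A_i) / P(B), by multiplication rule: P(A,B) = P(B|A)P(A)
but in practice, we might not know P(B)
(3) $\frac{P(B \mid A_i) P(A_i)}{P(B)} = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j) P(A_j)}$, by law of total probability
Thus, Bayes Rule, in practice:
$P(A_i \mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j) P(A_j)}$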
Example:
https://www.youtube.com/watch?v=R13BD8qKeTg
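When P(B) is not known directly, the law of total probability supplies the denominator of Bayes rule. The sketch below assumes a partition of three hypothetical events with made-up priors P(A_i) and likelihoods P(B|A_i); the function names are illustrative, not from the course.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(B|A_i) P(A_i), for A_1 ... A_k partitioning the sample space."""
    return sum((l * p for l, p in zip(likelihoods, priors)), Fraction(0))


def posteriors(priors, likelihoods):
    """P(A_i|B) for every i, with P(B) computed by the law of total probability."""
    if sum(priors) != 1:
        raise ValueError("priors over a partition must sum to 1")
    p_b = total_probability(priors, likelihoods)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return [l * p / p_b for l, p in zip(likelihoods, priors)]


# Hypothetical numbers (assumptions for illustration only).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]        # P(A_1), P(A_2), P(A_3)
likelihoods = [Fraction(1, 100), Fraction(1, 20), Fraction(1, 5)]  # P(B|A_1), P(B|A_2), P(B|A_3)

p_b = total_probability(priors, likelihoods)
post = posteriors(priors, likelihoods)
print(p_b, post)
```

Because the A_i partition the sample space, the posteriors sum to 1; here the event with the smallest prior ends up with the largest posterior because B is far more likely under it.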
Probability Review:
● What constitutes a probability measure?
● Independence
● Conditional probability
● Conditional independence
● How to derive Bayes Theorem
● Multiplication Rule
● Partition of Sample Space
● Law of Total Probability
● Bayes Theorem in Practice

More Related Content

Similar to CSE357 fa21 (1) Course Intro and Probability 8-26.pdf

PTSP PPT.pdf
PTSP PPT.pdfPTSP PPT.pdf
PTSP PPT.pdf
goutamkrsahoo
 
probability and statistics
probability and statisticsprobability and statistics
probability and statistics
zain393885
 
Basic Concept Of Probability
Basic Concept Of ProbabilityBasic Concept Of Probability
Basic Concept Of Probability
guest45a926
 
lec2_CS540_handouts.pdf
lec2_CS540_handouts.pdflec2_CS540_handouts.pdf
lec2_CS540_handouts.pdf
ZineddineALICHE1
 
1 Probability Please read sections 3.1 – 3.3 in your .docx
 1 Probability   Please read sections 3.1 – 3.3 in your .docx 1 Probability   Please read sections 3.1 – 3.3 in your .docx
1 Probability Please read sections 3.1 – 3.3 in your .docx
aryan532920
 
Probability concepts for Data Analytics
Probability concepts for Data AnalyticsProbability concepts for Data Analytics
Probability concepts for Data Analytics
SSaudia
 
Basic concept of probability
Basic concept of probabilityBasic concept of probability
Basic concept of probability
Ikhlas Rahman
 
Statistical computing 1
Statistical computing 1Statistical computing 1
Statistical computing 1
Padma Metta
 
Chap03 probability
Chap03 probabilityChap03 probability
Chap03 probability
Judianto Nugroho
 
Lecture 01
Lecture 01Lecture 01
Lecture 01
guest7a67e60
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
Suvrat Mishra
 
Probability-06.pdf
Probability-06.pdfProbability-06.pdf
Probability-06.pdf
AtikaAbdulhayee
 
Tp4 probability
Tp4 probabilityTp4 probability
Tp4 probability
Ishara .S. Saranapala
 
4 probability
4 probability4 probability
4 probability
Sana Marwaha
 
1-Probability-Conditional-Bayes.pdf
1-Probability-Conditional-Bayes.pdf1-Probability-Conditional-Bayes.pdf
1-Probability-Conditional-Bayes.pdf
KrushangDilipbhaiPar
 
Making probability easy!!!
Making probability easy!!!Making probability easy!!!
Making probability easy!!!
GAURAV SAHA
 
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECUnit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
sundarKanagaraj1
 
S244 10 Probability.ppt
S244 10 Probability.pptS244 10 Probability.ppt
S244 10 Probability.ppt
HimanshuSharma617324
 
Probability.pptx
Probability.pptxProbability.pptx
Probability.pptx
GABBARSINGH699271
 
Reliability-Engineering.pdf
Reliability-Engineering.pdfReliability-Engineering.pdf
Reliability-Engineering.pdf
BakiyalakshmiR1
 

Similar to CSE357 fa21 (1) Course Intro and Probability 8-26.pdf (20)

PTSP PPT.pdf
PTSP PPT.pdfPTSP PPT.pdf
PTSP PPT.pdf
 
probability and statistics
probability and statisticsprobability and statistics
probability and statistics
 
Basic Concept Of Probability
Basic Concept Of ProbabilityBasic Concept Of Probability
Basic Concept Of Probability
 
lec2_CS540_handouts.pdf
lec2_CS540_handouts.pdflec2_CS540_handouts.pdf
lec2_CS540_handouts.pdf
 
1 Probability Please read sections 3.1 – 3.3 in your .docx
 1 Probability   Please read sections 3.1 – 3.3 in your .docx 1 Probability   Please read sections 3.1 – 3.3 in your .docx
1 Probability Please read sections 3.1 – 3.3 in your .docx
 
Probability concepts for Data Analytics
Probability concepts for Data AnalyticsProbability concepts for Data Analytics
Probability concepts for Data Analytics
 
Basic concept of probability
Basic concept of probabilityBasic concept of probability
Basic concept of probability
 
Statistical computing 1
Statistical computing 1Statistical computing 1
Statistical computing 1
 
Chap03 probability
Chap03 probabilityChap03 probability
Chap03 probability
 
Lecture 01
Lecture 01Lecture 01
Lecture 01
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Probability-06.pdf
Probability-06.pdfProbability-06.pdf
Probability-06.pdf
 
Tp4 probability
Tp4 probabilityTp4 probability
Tp4 probability
 
4 probability
4 probability4 probability
4 probability
 
1-Probability-Conditional-Bayes.pdf
1-Probability-Conditional-Bayes.pdf1-Probability-Conditional-Bayes.pdf
1-Probability-Conditional-Bayes.pdf
 
Making probability easy!!!
Making probability easy!!!Making probability easy!!!
Making probability easy!!!
 
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECUnit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
 
S244 10 Probability.ppt
S244 10 Probability.pptS244 10 Probability.ppt
S244 10 Probability.ppt
 
Probability.pptx
Probability.pptxProbability.pptx
Probability.pptx
 
Reliability-Engineering.pdf
Reliability-Engineering.pdfReliability-Engineering.pdf
Reliability-Engineering.pdf
 

Recently uploaded

Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Undress Baby
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 

Recently uploaded (20)

Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 

CSE357 fa21 (1) Course Intro and Probability 8-26.pdf

  • 1. Course Overview Statistics for Data Science CSE357 - Fall 2021
  • 4. Statistics for Data Science: Statistics - methods for evaluating hypotheses in the light of empirical facts (Stanford Encyclopedia of Philosophy, 2014). Data Science - a field focused on using statistical, scientific, and computational techniques to gain insights from data. [Diagram labels: Computation, Statistics, Science]
  • 6. Statistics for Data Science: Statistics - methods for evaluating hypotheses in the light of empirical facts (Stanford Encyclopedia of Philosophy, 2014). Data Science - a field focused on using statistical, scientific, and computational techniques to gain insights from data. Approximately equal: Data Science ≈ Data Mining ≈ Analytics ≈ Quantitative Science. Highly related: Data Science, Big Data, Machine Learning, Artificial Intelligence.
  • 9. Statistics for Data Science: Statistical methods for gaining knowledge and insights from data, designed for those already proficient in programming (i.e., computing). A pathway to knowledge about… … what was (past), … what is (present), … what is likely (future, the full population). Why?!?
  • 12. Statistics for Data Science: Statistical methods for gaining knowledge and insights from data, designed for those already proficient in programming (i.e., computing). A pathway to knowledge about… … what was (past), … what is (present), … what is likely (future). Why?!? Jobs; Decisions; Truth / Meaning in Life: the answer to the "ultimate question of life, the universe, and everything" (Adams).
  • 13. In other words, so you can go on Twitter and say "The data say …" "I did my research." … and change no one's mind but at least understand it better yourself.
  • 15. Probability Statistics for Data Science CSE357 - Fall 2021
  • 17. What is Probability? Examples: (1) outcome of flipping a coin; (2) amount of snowfall; (3) mentioning "happy"; (4) mentioning "happy" a lot.
  • 19. What is Probability? The chance that something will happen. Given infinite observations of an event, the proportion of observations where a given outcome happens. Strength of belief that something is true. “Mathematical language for quantifying uncertainty” (Wasserman).
  • 24. Probability (review): Ω: sample space, the set of all outcomes of a random experiment. A: event (A ⊆ Ω), a collection of possible outcomes of an experiment. P(A): probability of event A; P is a function: events → ℝ. P is a probability measure if and only if (1) P(Ω) = 1; (2) P(A) ≥ 0 for all A; (3) if A1, A2, … are disjoint events, then P(A1 ⋃ A2 ⋃ …) = P(A1) + P(A2) + … Examples: (1) outcome of flipping a coin; (2) amount of snowfall; (3) mentioning "happy"; (4) mentioning "happy" a lot.
  • 25. Probability (review): Some properties: if B ⊆ A then P(A) ≥ P(B); P(A ⋃ B) ≤ P(A) + P(B); P(A ⋂ B) ≤ min(P(A), P(B)); P(¬A) = P(Ω \ A) = 1 - P(A), where \ is set difference. P(A ⋂ B) will be notated as P(A, B). Examples: (1) outcome of flipping a coin; (2) amount of snowfall; (3) mentioning "happy"; (4) mentioning "happy" a lot.
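The axioms and properties above can be checked numerically on a small finite sample space. The sketch below assumes a toy experiment that is not on the slides (two flips of a fair coin with equally likely outcomes); the names `omega`, `prob`, `A`, and `B` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed toy sample space: two flips of a fair coin, all outcomes equally likely.
omega = set(product("HT", repeat=2))

def prob(event):
    """P(event) for an event given as a subset of omega."""
    return Fraction(len(event & omega), len(omega))

A = {o for o in omega if o[0] == "H"}  # first flip is heads
B = {("H", "H")}                       # both flips are heads, so B is a subset of A

print(prob(omega) == 1)                            # axiom (1)
print(all(prob({o}) >= 0 for o in omega))          # axiom (2) on single outcomes
print(prob(A) >= prob(B))                          # B ⊆ A implies P(A) ≥ P(B)
print(prob(A | B) <= prob(A) + prob(B))            # union bound
print(prob(A & B) <= min(prob(A), prob(B)))        # intersection bound
print(prob(omega - A) == 1 - prob(A))              # complement rule
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.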
  • 29. Independence: Two events, A and B. Does knowing something about A tell us whether B happens (and vice versa)? (1) A: first flip of a fair coin; B: second flip of the same fair coin. (2) A: whether or not the first word is “happy”; B: whether or not the second word is “birthday”. Two events, A and B, are independent iff P(A, B) = P(A)P(B). Does dependence imply causality?
  • 30. Disjoint Sets vs. Independent Events: Independence: Two events, A and B, are independent iff P(A,B) = P(A)P(B). Disjoint Sets: If two events, A and B, come from disjoint sets, then P(A,B) = 0.
  • 33. Disjoint Sets vs. Independent Events: Independence: … iff P(A,B) = P(A)P(B). Disjoint Sets: If two events, A and B, come from disjoint sets, then P(A,B) = 0. Does independence imply disjoint? No. Proof by counterexample: A: flip of fair coin A is heads; B: flip of fair coin B is heads. Independence tells us P(A,B) = P(A)P(B) = .25, but disjointness would require P(A,B) = 0.
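The counterexample can be sketched in code by enumerating the four outcomes of flipping two fair coins. The helper names `independent` and `disjoint` are assumptions made for this illustration. The second check adds a contrasting case that is not on the slide: heads and tails of the same coin are disjoint but not independent.

```python
from fractions import Fraction
from itertools import product

# Outcomes are pairs (coin A, coin B), all equally likely.
omega = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def independent(a, b):
    """True iff P(A, B) = P(A)P(B)."""
    return prob(a & b) == prob(a) * prob(b)

def disjoint(a, b):
    """True iff the events share no outcome, so P(A, B) = 0."""
    return not (a & b)

a_heads = {o for o in omega if o[0] == "H"}
b_heads = {o for o in omega if o[1] == "H"}
a_tails = omega - a_heads

print(independent(a_heads, b_heads), disjoint(a_heads, b_heads))  # True False
print(independent(a_heads, a_tails), disjoint(a_heads, a_tails))  # False True
```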
  • 36. Probability (Review): Conditional Probability: P(A|B) = P(A, B) / P(B). H: mention “happy” in message, m; B: mention “birthday” in message, m. P(H) = .01, P(B) = .001, P(H, B) = .0005, so P(H|B) = .0005 / .001 = .50. H1: first flip of a fair coin is heads; H2: second flip of the same coin is heads. P(H2) = 0.5, P(H1) = 0.5, P(H2, H1) = 0.25, so P(H2|H1) = 0.25 / 0.5 = 0.5.
  • 38. Probability (Review): Conditional Probability: P(A|B) = P(A, B) / P(B). Two events, A and B, are independent iff P(A, B) = P(A)P(B). P(A, B) = P(A)P(B) iff P(A|B) = P(A). Interpretation of Independence: Observing B has no effect on the probability of A. H1: first flip of a fair coin is heads; H2: second flip of the same coin is heads. P(H2) = 0.5, P(H1) = 0.5, P(H2, H1) = 0.25, so P(H2|H1) = 0.5 = P(H2).
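As a quick check of the two worked examples, the sketch below applies the definition P(A|B) = P(A, B) / P(B) to the numbers given on the slides. The function name `conditional` is an assumption for illustration. Decimal strings are passed to `Fraction` so that the arithmetic stays exact.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """Return P(A|B) = P(A, B) / P(B); undefined when P(B) = 0."""
    if p_given == 0:
        raise ZeroDivisionError("P(B) must be positive")
    return p_joint / p_given

# "happy" / "birthday" example, values from the slide.
p_h, p_b, p_hb = Fraction("0.01"), Fraction("0.001"), Fraction("0.0005")
p_h_given_b = conditional(p_hb, p_b)
print(p_h_given_b)          # 1/2
print(p_h_given_b == p_h)   # False: observing "birthday" changes P("happy")

# Two flips of a fair coin, values from the slide.
p_h1, p_h2, p_h2_h1 = Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)
p_h2_given_h1 = conditional(p_h2_h1, p_h1)
print(p_h2_given_h1 == p_h2)  # True: the flips are independent
```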
  • 40. Why Probability? A formality to make sense of the world. (1) To quantify uncertainty: Should we believe something or not? Is it a meaningful difference? (2) To be able to generalize from one situation or point in time to another: Can we rely on some information? What is the chance Y happens? (3) To organize data into meaningful groups or “dimensions”: Where does X belong? What words are similar to X?
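  • 43. Probabilities over >2 events: Independence: A1, A2, …, An are independent iff, for every subset of these events, the probability of their intersection equals the product of their probabilities; in particular, P(A1, A2, …, An) = P(A1)P(A2)⋯P(An). Conditional Probability: just think of multiple events happening as a single event: Z = A1, A2, …, Am-1 = A1 ⋂ A2 ⋂ … ⋂ Am-1, then P(Z|An) = P(Z, An) / P(An).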
  • 44. Conditional Probabilities are Fundamental to Data Science. For example: Machine Learning: most modern deep learning techniques try to estimate P(outcome | data). Causal inference: does treatment cause outcome? P(outcome | treatment) ≠ P(outcome)* (*also requires random sampling of treatment conditions).
  • 45. Conditional Independence: A and B are conditionally independent, given C, iff P(A, B | C) = P(A|C)P(B|C). Equivalently, P(A|B,C) = P(A|C). Interpretation: Once we know C, then B doesn’t tell us anything useful about A.
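A small sketch can show that events may be conditionally independent given C yet dependent overall. The scenario is an assumption, not from the slides: a coin is picked at random (fair, or biased with P(heads) = 9/10) and flipped twice. A is "first flip is heads", B is "second flip is heads", and C is the coin type.

```python
from fractions import Fraction
from itertools import product

# Assumed setup: pick a coin type at random, then flip that coin twice.
p_coin = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}

# Joint distribution over (A, B, C): first flip heads?, second flip heads?, coin type.
joint = {}
for c, a, b in product(p_coin, (True, False), (True, False)):
    p_a = p_heads[c] if a else 1 - p_heads[c]
    p_b = p_heads[c] if b else 1 - p_heads[c]
    joint[(a, b, c)] = p_coin[c] * p_a * p_b

def prob(pred):
    """Sum the joint probability over outcomes (a, b, c) satisfying pred."""
    return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))

def cond_indep_given(c0):
    """Check P(A, B | C=c0) == P(A | C=c0) P(B | C=c0)."""
    p_c = prob(lambda a, b, c: c == c0)
    p_ab = prob(lambda a, b, c: a and b and c == c0) / p_c
    p_a = prob(lambda a, b, c: a and c == c0) / p_c
    p_b = prob(lambda a, b, c: b and c == c0) / p_c
    return p_ab == p_a * p_b

conditionally_independent = all(cond_indep_given(c0) for c0 in p_coin)
marginally_independent = (
    prob(lambda a, b, c: a and b)
    == prob(lambda a, b, c: a) * prob(lambda a, b, c: b)
)
print(conditionally_independent, marginally_independent)  # True False
```

Without knowing the coin type, a first heads makes the biased coin more plausible and so raises the chance of a second heads; once the coin type is known, the first flip adds nothing.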
  • 52. Bayes Theorem - Lite: GOAL: Relate (1) P(A|B) to (2) P(B|A). Let’s try: (3) P(A|B) = P(A,B) / P(B), by the definition of conditional probability on (1). (4) P(B|A) = P(B,A) / P(A) = P(A,B) / P(A), by the definition of conditional probability on (2) and the symmetry of set intersection. (5) P(B|A)P(A) = P(A,B), by algebra on (4); this is known as the “Multiplication Rule”. (6) P(A|B) = P(B|A)P(A) / P(B), substituting P(A,B) from (5) into (3). Why? We often want to know P(A|B) but are only given P(B|A), P(A), and P(B). Example: You want to know if an email is likely spam given a word appearing in it: P(spam | word). However, you only have a dataset of words and spam: P(word | spam), and you can look up the frequency of spam emails in general to get P(spam) as well as the frequency of "word" in general for P(word).
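Step (6) applied to the spam example might look like the sketch below. The three input probabilities are made-up values chosen only for illustration (the slides give no numbers), and `bayes` is a hypothetical helper name.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) P(A) / P(B), step (6) of the derivation."""
    if p_b == 0:
        raise ZeroDivisionError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Assumed, illustrative values (not from the slides).
p_word_given_spam = Fraction(1, 5)   # P(word | spam)
p_spam = Fraction(3, 10)             # P(spam)
p_word = Fraction(1, 10)             # P(word)

p_spam_given_word = bayes(p_word_given_spam, p_spam, p_word)
print(p_spam_given_word)  # 3/5
```

With these assumed inputs, seeing the word doubles the probability of spam from 3/10 to 3/5.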
  • 53. Bayes Theorem - Heavy (with multiple events partitioning Ω): GOAL: Relate P(Ai|B) to P(B|Ai), for all i = 1 ... k, where A1 ... Ak partition Ω.
  • 55. First: Law of Total Probability: GOAL: Relate P(Ai|B) to P(B|Ai), for all i = 1 ... k, where A1 ... Ak partition Ω. Partition: (a) A1 ⋃ A2 ⋃ … ⋃ Ak = Ω; (b) P(Ai, Aj) = 0, for all i ≠ j. When both of these conditions are true, we say "A1, …, Ak partition Ω". [Diagram: Ω divided into regions A1, A2, A3, ..., Ak]
  • 56. First: Law of Total Probability: GOAL: Relate P(Ai|B) to P(B|Ai), for all i = 1 ... k, where A1 ... Ak partition Ω. Partition: (a) A1 ⋃ A2 ⋃ … ⋃ Ak = Ω; (b) P(Ai, Aj) = 0, for all i ≠ j. Law of total probability: If A1 ... Ak partition Ω, then for any event B: P(B) = Σi P(B|Ai)P(Ai), summing over i = 1 ... k. [Diagram: Ω divided into regions A1, A2, A3, ..., Ak]
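The law of total probability is a weighted sum, which the sketch below computes for an assumed three-way partition. The priors P(Ai) and likelihoods P(B|Ai) are made-up numbers, and `total_probability` is a hypothetical helper name.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(B) = sum_i P(B|Ai) P(Ai), where the Ai partition the sample space."""
    if sum(priors.values()) != 1:
        raise ValueError("priors of a partition must sum to 1")
    return sum(likelihoods[a] * priors[a] for a in priors)

# Assumed, illustrative values (not from the slides).
priors = {"A1": Fraction(1, 2), "A2": Fraction(3, 10), "A3": Fraction(1, 5)}
likelihoods = {"A1": Fraction(1, 10), "A2": Fraction(1, 5), "A3": Fraction(1, 2)}

p_b = total_probability(priors, likelihoods)
print(p_b)  # 21/100
```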
  • 63. Law of Total Probability and Bayes Theorem: GOAL: Relate P(Ai|B) to P(B|Ai), for all i = 1 ... k, where A1 ... Ak partition Ω. Let’s try: (1) P(Ai|B) = P(Ai,B) / P(B). (2) P(Ai,B) / P(B) = P(B|Ai)P(Ai) / P(B), by the multiplication rule, P(A,B) = P(B|A)P(A); but in practice, we might not know P(B). (3) P(B|Ai)P(Ai) / P(B) = P(B|Ai)P(Ai) / (Σj P(B|Aj)P(Aj)), by the law of total probability. Thus, P(Ai|B) = P(B|Ai)P(Ai) / Σj P(B|Aj)P(Aj), summing over j = 1 ... k (Bayes Rule, in practice). Example: https://www.youtube.com/watch?v=R13BD8qKeTg
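Bayes rule in this form needs only the priors P(Ai) and the likelihoods P(B|Ai), because the denominator is rebuilt from them. A minimal sketch follows, assuming a two-event partition {spam, ham} with made-up numbers; `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(Ai|B) for every Ai in a partition, given P(Ai) and P(B|Ai)."""
    # Denominator: P(B) by the law of total probability.
    evidence = sum(likelihoods[a] * priors[a] for a in priors)
    if evidence == 0:
        raise ZeroDivisionError("P(B) must be positive")
    return {a: likelihoods[a] * priors[a] / evidence for a in priors}

# Assumed, illustrative values (not from the slides).
priors = {"spam": Fraction(3, 10), "ham": Fraction(7, 10)}        # P(Ai)
likelihoods = {"spam": Fraction(1, 5), "ham": Fraction(1, 35)}    # P(word | Ai)

post = posterior(priors, likelihoods)
print(post["spam"], post["ham"])  # 3/4 1/4
```

The posteriors sum to 1 because the same evidence term normalizes every entry.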
  • 64. Probability Review: ● What constitutes a probability measure? ● Independence ● Conditional probability ● Conditional independence ● How to derive Bayes Theorem ● Multiplication Rule ● Partition of Sample Space ● Law of Total Probability ● Bayes Theorem in Practice