CSC446: Pattern Recognition (LN4)

Lecture 4: Bayesian Decision Theory (Part 1)

  1. Bayesian Decision Theory
     Prof. Dr. Mostafa Gadal-Haqq
     Faculty of Computer & Information Sciences, Computer Science Department, AIN SHAMS UNIVERSITY
     CSC446: Pattern Recognition (Pattern Classifications, Ch. 2: Sec. 2.1 to Sec. 2.3)
  2. 2.1 Bayesian Decision Theory
     • Bayesian Decision Theory is based on quantifying the trade-offs between various classification decisions using probabilities and the costs that accompany such decisions.
     • It assumes that the decision problem is posed in probabilistic terms and that all of the relevant probability values are known.
  3. 2.1 Bayesian Decision Theory
     • Back to the Fish Sorting Machine:
       – ω = a random variable (state of nature) = {ω1, ω2}
         • For example: ω1 = sea bass, and ω2 = salmon.
         • P(ω1) = the prior (a priori probability) that the coming fish is sea bass.
         • P(ω2) = the prior (a priori probability) that the coming fish is salmon.
       – The priors give us the knowledge of how likely we are to get salmon or sea bass before the fish actually appears.
  4. 2.1 Bayesian Decision Theory
     • Decision Rule Using Priors Only:
       – To make a decision about the fish that will appear using only the priors, P(ω1) and P(ω2), we use the following decision rule:
           Decide fish ∈ ω1 if P(ω1) > P(ω2), and fish ∈ ω2 if P(ω1) < P(ω2)
       – which minimizes the error:
           Probability of error = min[ P(ω1), P(ω2) ]
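
     As a quick illustration (a minimal sketch, not part of the slides), the priors-only rule and its error can be written as:

         # Decide the class with the larger prior; the error is the smaller prior.
         def decide_by_priors(p_w1, p_w2):
             """Return (decision, probability of error) using only the priors."""
             decision = "w1" if p_w1 > p_w2 else "w2"
             return decision, min(p_w1, p_w2)

         print(decide_by_priors(2/3, 1/3))   # ('w1', 0.333...)
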
  5. 2.1 Bayesian Decision Theory
     • That is:
       – If P(ω1) >> P(ω2), we will be right most of the time when we decide that the fish belongs to ω1.
       – If P(ω1) = P(ω2), we have only a fifty-fifty chance of being right.
       – Under these conditions, no other decision rule can yield a larger probability of being right.
  6. 2.1 Bayesian Decision Theory
     • Improving the decision using an observation:
     • If we know the class-conditional probability, P(x | ωj), of an observation x, we can improve our decision.
     • For example: x describes the observed lightness of the sea bass or salmon.
     [Figure: class-conditional densities P(x | ω1) and P(x | ω2) plotted against the feature x]
  7. 2.1 Bayesian Decision Theory
     • We can improve our decision by using this observed feature and the Bayes rule:
       – Posterior = (Likelihood x Prior) / Evidence
           P(ωj | x) = P(x | ωj) P(ωj) / P(x)
       – where, for C categories, the evidence is:
           P(x) = Σ_{j=1..C} P(x | ωj) P(ωj)
  8. 2.1 Bayesian Decision Theory
     • Bayesian decision is based on minimizing the probability of error, i.e., for a given feature value x:
         Decide x ∈ ω1 if P(ω1 | x) > P(ω2 | x), and x ∈ ω2 if P(ω1 | x) < P(ω2 | x)
     • The probability of error for a particular x is:
         P(error | x) = min[ P(ω1 | x), P(ω2 | x) ]
  9. 2.1 Bayesian Decision Theory: Numerical Example
     Suppose P(ω1) = 2/3 ≈ 0.67 and P(ω2) = 1/3 ≈ 0.33.
     Using the priors only: decide fish ∈ ω1.
     If x = 11.5, then P(x | ω1) = 0.15 and P(x | ω2) = 0.36 (read off the class-conditional density curves).
         P(x) = 0.15*0.67 + 0.36*0.33 = 0.22
         P(ω1 | x) = 0.15*0.67 / 0.22 = 0.46
         P(ω2 | x) = 0.36*0.33 / 0.22 = 0.54
     Using the posteriors: decide fish(x) ∈ ω2.
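
     The arithmetic of the example can be checked with a short sketch (the likelihood values 0.15 and 0.36 are read off the slide's density curves at x = 11.5; everything else follows from Bayes' rule):

         def posteriors(likelihoods, priors):
             """P(wj | x) via Bayes' rule: likelihood * prior / evidence."""
             evidence = sum(l * p for l, p in zip(likelihoods, priors))    # P(x)
             return [l * p / evidence for l, p in zip(likelihoods, priors)]

         post = posteriors(likelihoods=[0.15, 0.36], priors=[0.67, 0.33])
         print([round(p, 2) for p in post])   # [0.46, 0.54] -> decide w2
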
 10. 2.1 Bayesian Decision Theory
     • Computing the posteriors P(ωi | x) for all values of x gives decision regions (rules):
       – if x ∈ R1, decide ω1
       – if x ∈ R2, decide ω2
     [Figure: posterior curves over x, partitioning the axis into alternating regions R1 and R2]
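
     The region-finding step can be sketched by evaluating the (unnormalized) posteriors on a grid of x values; the Gaussian class-conditional densities, their parameters, and the priors below are assumptions for illustration, not the curves plotted in the slides:

         import math

         def gauss_pdf(x, mean, std):
             """Univariate Gaussian density N(mean, std^2)."""
             z = (x - mean) / std
             return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

         # Assumed class-conditional densities and priors: (mean, std, prior).
         classes = {"w1": (4.0, 2.0, 0.6), "w2": (9.0, 1.5, 0.4)}

         def decide(x):
             """Pick the class with the larger posterior (the evidence P(x) cancels)."""
             scores = {w: gauss_pdf(x, m, s) * p for w, (m, s, p) in classes.items()}
             return max(scores, key=scores.get)

         # Scanning x reveals the decision regions and their boundaries.
         grid = [0.5 * i for i in range(31)]                  # x from 0 to 15
         labels = [decide(x) for x in grid]
         edges = [x for x, a, b in zip(grid[1:], labels, labels[1:]) if a != b]
         print("decision boundaries near x =", edges)
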
 11. Assignment 2.1
     • Draw the probability densities and find the decision regions for the following classes:
         classes {ω1, ω2},
         P(x | ω1) ~ N(20, 4),
         P(x | ω2) ~ N(15, 2),
         P(ω1) = 1/3, and P(ω2) = 2/3.
       – Then classify a sample with feature value x = 17.
 12. 2.2 General Bayesian Decision Theory
     • Bayesian decision theory is generalized by allowing the following:
       – Having more than one feature.
       – Having more than two states of nature.
       – Allowing actions other than just deciding on the state of nature.
       – Introducing a loss function that is more general than the probability of error.
 13. 2.2 General Bayesian Decision Theory
     • Allowing actions other than classification primarily allows the possibility of rejection.
     • Rejection is refusing to make a decision in close or ambiguous cases.
     • The loss function states how costly each action taken is.
 14. 2.2 General Bayesian Decision Theory
     • Suppose we have c states of nature (categories) { ω1, ω2, …, ωc },
     • a feature vector: x = { x1, x2, …, xd },
     • the possible actions { α1, α2, …, αa },
     • and the loss, λ(αi | ωj), incurred for taking action αi when the state of nature is ωj.
 15. 2.2 General Bayesian Decision Theory
     • The conditional risk, R(αi | x), for selecting action αi is given by:
         R(αi | x) = Σ_{j=1..c} λ(αi | ωj) P(ωj | x)
     • The overall risk, R, is the sum of all conditional risks R(αi | x) for i = 1, …, a:
         R = Σ_{i=1..a} R(αi | x)
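
     A small sketch of this computation (the loss matrix and posteriors are made-up values for illustration): compute R(αi | x) for every action, then take the action with the smallest conditional risk.

         def conditional_risks(loss, posteriors):
             """loss[i][j] = lambda(a_i | w_j); posteriors[j] = P(w_j | x)."""
             return [sum(l * p for l, p in zip(row, posteriors)) for row in loss]

         loss = [[0.0, 2.0],   # action a1 (decide w1): loss 2 if the truth is w2
                 [1.0, 0.0]]   # action a2 (decide w2): loss 1 if the truth is w1
         risks = conditional_risks(loss, posteriors=[0.7, 0.3])
         best = min(range(len(risks)), key=risks.__getitem__)
         print("risks:", risks, "-> take action a%d" % (best + 1))   # a1
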
 16. 2.2 General Bayesian Decision Theory
     • The Bayesian decision rule becomes: select the action αi for which the conditional risk, R(αi | x), is minimum. That is:
         Take action αi (i.e., decide ωi) if R(αi | x) < R(αj | x) for all j ≠ i.
 17. 2.2 General Bayesian Decision Theory
     • Minimizing R(αi | x) over the actions αi, i = 1, …, a, for every x minimizes R.
     • The overall risk R is the "expected loss associated with a given decision rule."
     • The minimum overall risk obtained this way is called the Bayes risk, which defines the best performance that can be achieved.
 18. 2.2 General Bayesian Decision Theory
     • Two-category classification example:
       Suppose we have two categories {ω1, ω2} and two actions {α1, α2}, where
         α1: deciding ω1, and α2: deciding ω2,
       and for simplicity we write λij = λ(αi | ωj).
       The conditional risks for taking α1 and α2 are:
         R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
         R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)
 19. 2.2 General Bayesian Decision Theory
     • There are a variety of ways to express the minimum-risk rule, each with its own advantage:
     1- The fundamental rule is:
         decide ω1 (i.e., take α1) if R(α1 | x) < R(α2 | x),
         and ω2 (i.e., take α2) if R(α1 | x) > R(α2 | x).
 20. 2.2 General Bayesian Decision Theory
     2- The rule in terms of the posteriors is:
         decide ω1 if (λ21 - λ11) P(ω1 | x) > (λ12 - λ22) P(ω2 | x); decide ω2 otherwise.
     3- The rule in terms of the priors and conditional densities is:
         decide ω1 if (λ21 - λ11) p(x | ω1) P(ω1) > (λ12 - λ22) p(x | ω2) P(ω2); decide ω2 otherwise.
 21. 2.2 General Bayesian Decision Theory
     4- The rule in terms of the likelihood ratio:
         decide ω1 if p(x | ω1) / p(x | ω2) > [(λ12 - λ22) / (λ21 - λ11)] · [P(ω2) / P(ω1)]; decide ω2 otherwise.
     That is, the Bayes (optimal) decision can be interpreted as:
       "One can take an optimal decision if the likelihood ratio exceeds a threshold value that is independent of the observation x."
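
     A sketch of the likelihood-ratio rule; the Gaussian densities, the zero-one loss, and the priors are assumptions for illustration:

         import math

         def gauss_pdf(x, mean, std):
             z = (x - mean) / std
             return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

         def decide_by_ratio(x, loss, priors, pdf1, pdf2):
             """Decide w1 iff p(x|w1)/p(x|w2) exceeds the x-independent threshold."""
             (l11, l12), (l21, l22) = loss
             theta = (l12 - l22) / (l21 - l11) * priors[1] / priors[0]
             return "w1" if pdf1(x) / pdf2(x) > theta else "w2"

         pdf1 = lambda x: gauss_pdf(x, 4.0, 2.0)   # assumed p(x|w1)
         pdf2 = lambda x: gauss_pdf(x, 9.0, 1.5)   # assumed p(x|w2)
         print(decide_by_ratio(6.0, [[0, 1], [1, 0]], [0.6, 0.4], pdf1, pdf2))   # w1
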
 22. 2.2 General Bayesian Decision Theory
     • The decision regions depend on the values of the loss function.
     • Let θ = [(λ12 - λ22) / (λ21 - λ11)] · [P(ω2) / P(ω1)]; then decide ω1 if p(x | ω1) / p(x | ω2) > θ.
     • For different loss functions λ we have:
         if λa = [0 1; 1 0] then θa = P(ω2) / P(ω1),
         if λb = [0 2; 1 0] then θb = 2 P(ω2) / P(ω1).
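
     A quick numeric check of how the threshold moves with the loss matrix (equal priors are assumed here purely for illustration):

         def threshold(loss, p1, p2):
             """theta = (l12 - l22) / (l21 - l11) * P(w2) / P(w1)."""
             (l11, l12), (l21, l22) = loss
             return (l12 - l22) / (l21 - l11) * p2 / p1

         print(threshold([[0, 1], [1, 0]], 0.5, 0.5))   # theta_a = 1.0
         print(threshold([[0, 2], [1, 0]], 0.5, 0.5))   # theta_b = 2.0: doubling l12 doubles the threshold
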
 23. 2.2 General Bayesian Decision Theory
     [Figure]
 24. 2.3 Minimum-Error Rate Classification
     • Consider the zero-one (or symmetrical) loss function:
         λ(αi, ωj) = 0 if i = j, and 1 if i ≠ j,   for i, j = 1, …, c.
     • Therefore, the conditional risk is:
         R(αi | x) = Σ_{j=1..c} λ(αi | ωj) P(ωj | x) = Σ_{j≠i} P(ωj | x) = 1 - P(ωi | x).
     • In other words, for the symmetric loss function, the conditional risk is the probability of error.
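
     Under the zero-one loss the Bayes rule therefore reduces to picking the class with the largest posterior; a tiny sketch with made-up posteriors shows the identity R(αi | x) = 1 - P(ωi | x):

         posteriors = [0.2, 0.5, 0.3]               # made-up P(w_j | x), summing to 1
         risks = [1.0 - p for p in posteriors]      # conditional risks under 0-1 loss
         best = max(range(len(posteriors)), key=posteriors.__getitem__)
         print("risks:", risks, "-> decide w%d" % (best + 1))   # decide w2
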
 25. The Minimax Criterion
     • Sometimes we need to design our classifier to perform well over a range of prior probabilities, or where we do not know the prior probabilities.
     • A reasonable approach is to design our classifier so that the worst overall risk for any value of the priors is as small as possible.
     • Minimax criterion: "minimize the maximum possible overall risk."
 26. The Minimax Criterion
     • It is found that the overall risk is linear in P(ωj). Then, when the constant of proportionality (the slope) is zero, the risk is independent of the priors. This condition gives the minimax risk Rmm as:
         Rmm = λ22 + (λ12 - λ22) ∫_{R1} p(x | ω2) dx = λ11 + (λ21 - λ11) ∫_{R2} p(x | ω1) dx
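
     A numerical sketch of the minimax idea under an assumed zero-one loss and two assumed Gaussian classes: with a single boundary t, the overall risk becomes independent of the priors when the two error integrals are equal, so t can be found by bisection:

         import math

         def gauss_cdf(x, mean, std):
             """Phi((x - mean) / std) via the error function."""
             return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

         # Assumed densities p(x|w1) ~ N(0,1), p(x|w2) ~ N(3,1); decide w1 for x < t.
         err_w1 = lambda t: 1.0 - gauss_cdf(t, 0.0, 1.0)   # w1 samples sent to R2
         err_w2 = lambda t: gauss_cdf(t, 3.0, 1.0)         # w2 samples sent to R1

         lo, hi = 0.0, 3.0
         for _ in range(60):                # bisection on err_w1(t) = err_w2(t)
             mid = 0.5 * (lo + hi)
             lo, hi = (mid, hi) if err_w1(mid) > err_w2(mid) else (lo, mid)
         t = 0.5 * (lo + hi)
         print("minimax boundary ~", round(t, 3), " Rmm ~", round(err_w1(t), 3))   # 1.5, 0.067
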
 27. The Minimax Criterion
     [Figure]
 28. The Neyman-Pearson Criterion
     • The Neyman-Pearson criterion: "minimize the overall risk subject to a constraint," e.g.:
         ∫ R(αi | x) dx < constant
     • Generally, the Neyman-Pearson criterion is satisfied by adjusting the decision boundaries numerically. However, for Gaussian and some other distributions, its solution can be found analytically.
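
     A sketch of the Neyman-Pearson style of design: rather than weighting the two errors by the priors, bound one error probability and adjust the decision boundary numerically until the bound is met. The Gaussian densities and the 5% bound are assumptions:

         import math

         def gauss_cdf(x, mean, std):
             return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

         # Assumed densities p(x|w1) ~ N(0,1), p(x|w2) ~ N(3,1); decide w1 for x < t.
         alpha = 0.05                                        # bound on P(decide w1 | w2)
         false_alarm = lambda t: gauss_cdf(t, 3.0, 1.0)

         # Push t as far right as the constraint allows (maximizes correct w1 decisions).
         lo, hi = -5.0, 3.0
         for _ in range(60):
             mid = 0.5 * (lo + hi)
             lo, hi = (mid, hi) if false_alarm(mid) <= alpha else (lo, mid)
         t = 0.5 * (lo + hi)
         print("boundary ~", round(t, 3),
               " false alarm ~", round(false_alarm(t), 4),
               " P(correct | w1) ~", round(gauss_cdf(t, 0.0, 1.0), 4))
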
 29. Assignment 2.2
     • Computer Exercises:
       – Find the optimal decision for the following data:
           classes {ω1, ω2},
           p(x | ω1) ~ N(20, 4),
           p(x | ω2) ~ N(15, 2),
           P(ω1) = 2/3, and P(ω2) = 1/3,
       – with the given loss function λij,
       – Then classify the samples: x = 12, 17, 18, and 20.
