Gaussian Bayes Classifiers

Published in: Technology, Education

  1. Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
     Learning Gaussian Bayes Classifiers. Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University. www.cs.cmu.edu/~awm, awm@cs.cmu.edu, 412-268-7599. Sep 10th, 2001. Copyright © 2001, Andrew W. Moore.
     Slide 2: Maximum Likelihood learning of Gaussians for Classification
     • Why we should care
     • 3 seconds to teach you a new learning algorithm
     • What if there are 10,000 dimensions?
     • What if there are categorical inputs?
     • Examples “out the wazoo”
  2. Slide 3: Why we should care
     • One of the original “Data Mining” algorithms
     • Very simple and effective
     • Demonstrates the usefulness of our earlier groundwork
     Slide 4: Where we were at the end of the MLE lecture…
     Task | Categorical inputs only | Real-valued inputs only | Mixed Real / Cat okay
     Classifier (inputs → predict category) | Joint BC, Naïve BC | - | Dec Tree
     Density Estimator (inputs → probability) | Joint DE, Naïve DE | Gauss DE | -
     Regressor (inputs → predict real no.) | - | - | -
  3. Slide 5: This lecture…
     Task | Categorical inputs only | Real-valued inputs only | Mixed Real / Cat okay
     Classifier (inputs → predict category) | Joint BC, Naïve BC | Gauss BC | Dec Tree
     Density Estimator (inputs → probability) | Joint DE, Naïve DE | Gauss DE | -
     Regressor (inputs → predict real no.) | - | - | -
     Slide 6: Road Map
     [Diagram nodes: Probability; PDFs; Gaussians; MLE; MLE of Gaussians; Decision Trees; Density Estimation; Bayes Classifiers]
  4. Slide 7: Road Map
     [Diagram nodes: Probability; PDFs; Gaussians; MLE; MLE of Gaussians; Decision Trees; Density Estimation; Bayes Classifiers; Gaussian Bayes Classifiers]
     Slide 8: Gaussian Bayes Classifier Assumption
     • The i’th record in the database is created using the following algorithm:
       1. Generate the output (the “class”) by drawing $y_i \sim \text{Multinomial}(p_1, p_2, \ldots, p_{N_y})$.
       2. Generate the inputs from a Gaussian PDF that depends on the value of $y_i$: $\mathbf{x}_i \sim N(\mu_{y_i}, \Sigma_{y_i})$, where $\mu_c$ and $\Sigma_c$ are the mean and covariance for class $c$.
     Test your understanding. Given $N_y$ classes and $m$ input attributes, how many distinct scalar parameters need to be estimated?
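The "Test your understanding" question on Slide 8 can be checked with a little arithmetic. The sketch below is one way to count, assuming "distinct" means free parameters: the class probabilities sum to one (so one of them is determined by the rest), and each covariance matrix is symmetric (so only its upper triangle counts). The function name is hypothetical and not taken from the slides.

```python
def gaussian_bc_param_count(n_classes, m):
    """Count free scalar parameters of a Gaussian Bayes classifier.

    Assumes one full (symmetric) covariance matrix per class.
    """
    priors = n_classes - 1                  # p_1..p_Ny sum to 1
    means = n_classes * m                   # one m-vector per class
    covs = n_classes * m * (m + 1) // 2     # upper triangle of each m x m matrix
    return priors + means + covs


# Two classes, two inputs: 1 prior + 4 mean entries + 6 covariance entries.
example_count = gaussian_bc_param_count(2, 2)
```

Under these assumptions the total is $(N_y - 1) + N_y\,(m + m(m+1)/2)$, which grows quadratically in $m$.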
  5. Slide 9: MLE Gaussian Bayes Classifier
     Let $DB_i$ = subset of database $DB$ in which the output class is $y = i$.
     $$p_i^{\text{mle}} = \frac{|DB_i|}{|DB|}$$
     Slide 10: MLE Gaussian Bayes Classifier
     Let $DB_i$ = subset of database $DB$ in which the output class is $y = i$.
     $(\mu_i^{\text{mle}}, \Sigma_i^{\text{mle}})$ = MLE Gaussian for $DB_i$
  6. Slide 11: MLE Gaussian Bayes Classifier
     Let $DB_i$ = subset of database $DB$ in which the output class is $y = i$.
     $(\mu_i^{\text{mle}}, \Sigma_i^{\text{mle}})$ = MLE Gaussian for $DB_i$:
     $$\mu_i^{\text{mle}} = \frac{1}{|DB_i|} \sum_{\mathbf{x}_k \in DB_i} \mathbf{x}_k$$
     $$\Sigma_i^{\text{mle}} = \frac{1}{|DB_i|} \sum_{\mathbf{x}_k \in DB_i} (\mathbf{x}_k - \mu_i^{\text{mle}})(\mathbf{x}_k - \mu_i^{\text{mle}})^T$$
     Slide 12: Gaussian Bayes Classification
     $$P(y = i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid y = i)\, P(y = i)}{p(\mathbf{x})}$$
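The MLE formulas on Slides 9 to 11 translate almost line for line into code. The following is a minimal sketch using only the Python standard library; the function name, the dictionary layout, and the five toy records are assumptions made for illustration and do not come from the slides or the census data.

```python
from collections import defaultdict


def fit_gaussian_bc(X, y):
    """Return {class: (prior, mean, covariance)} from the MLE formulas."""
    groups = defaultdict(list)          # DB_i: records whose class is i
    for x, label in zip(X, y):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        n, m = len(rows), len(rows[0])
        prior = n / len(X)              # |DB_i| / |DB|
        mean = [sum(r[j] for r in rows) / n for j in range(m)]
        # MLE covariance divides by |DB_i| (not |DB_i| - 1).
        cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / n
                for b in range(m)] for a in range(m)]
        model[label] = (prior, mean, cov)
    return model


# Toy records (invented): two real-valued inputs, two classes.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [0.0, 0.0], [2.0, 2.0]]
y = ["rich", "rich", "rich", "poor", "poor"]
model = fit_gaussian_bc(X, y)
```

Note that a class with very few records can produce a singular covariance matrix, which is one face of the overfitting issue the slides raise later.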
  7. Slide 13: Gaussian Bayes Classification
     $$P(y = i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid y = i)\, P(y = i)}{p(\mathbf{x})}$$
     $$P(y = i \mid \mathbf{x}) = \frac{\dfrac{1}{(2\pi)^{m/2}\, |\Sigma_i|^{1/2}} \exp\left[-\tfrac{1}{2} (\mathbf{x} - \mu_i)^T \Sigma_i^{-1} (\mathbf{x} - \mu_i)\right] p_i}{p(\mathbf{x})}$$
     How do we deal with that? (The question refers to the denominator $p(\mathbf{x})$.)
     Slide 14: Here is a dataset
     age | employment | education | edunum | marital | … | job | relation | race | gender | hours_worked | country | wealth
     39 | State_gov | Bachelors | 13 | Never_married | … | Adm_clerical | Not_in_family | White | Male | 40 | United_States | poor
     51 | Self_emp_not_inc | Bachelors | 13 | Married | … | Exec_managerial | Husband | White | Male | 13 | United_States | poor
     39 | Private | HS_grad | 9 | Divorced | … | Handlers_cleaners | Not_in_family | White | Male | 40 | United_States | poor
     54 | Private | 11th | 7 | Married | … | Handlers_cleaners | Husband | Black | Male | 40 | United_States | poor
     28 | Private | Bachelors | 13 | Married | … | Prof_specialty | Wife | Black | Female | 40 | Cuba | poor
     38 | Private | Masters | 14 | Married | … | Exec_managerial | Wife | White | Female | 40 | United_States | poor
     50 | Private | 9th | 5 | Married_spouse_absent | … | Other_service | Not_in_family | Black | Female | 16 | Jamaica | poor
     52 | Self_emp_not_inc | HS_grad | 9 | Married | … | Exec_managerial | Husband | White | Male | 45 | United_States | rich
     31 | Private | Masters | 14 | Never_married | … | Prof_specialty | Not_in_family | White | Female | 50 | United_States | rich
     42 | Private | Bachelors | 13 | Married | … | Exec_managerial | Husband | White | Male | 40 | United_States | rich
     37 | Private | Some_college | 10 | Married | … | Exec_managerial | Husband | Black | Male | 80 | United_States | rich
     30 | State_gov | Bachelors | 13 | Married | … | Prof_specialty | Husband | Asian | Male | 40 | India | rich
     24 | Private | Bachelors | 13 | Never_married | … | Adm_clerical | Own_child | White | Female | 30 | United_States | poor
     33 | Private | Assoc_acdm | 12 | Never_married | … | Sales | Not_in_family | Black | Male | 50 | United_States | poor
     41 | Private | Assoc_voc | 11 | Married | … | Craft_repair | Husband | Asian | Male | 40 | *MissingValue* | rich
     34 | Private | 7th_8th | 4 | Married | … | Transport_moving | Husband | Amer_Indian | Male | 45 | Mexico | poor
     26 | Self_emp_not_inc | HS_grad | 9 | Never_married | … | Farming_fishing | Own_child | White | Male | 35 | United_States | poor
     33 | Private | HS_grad | 9 | Never_married | … | Machine_op_inspct | Unmarried | White | Male | 40 | United_States | poor
     38 | Private | 11th | 7 | Married | … | Sales | Husband | White | Male | 50 | United_States | poor
     44 | Self_emp_not_inc | Masters | 14 | Divorced | … | Exec_managerial | Unmarried | White | Female | 45 | United_States | rich
     41 | Private | Doctorate | 16 | Married | … | Prof_specialty | Husband | White | Male | 60 | United_States | rich
     : | : | : | : | : | : | : | : | : | : | : | : | :
     48,000 records, 16 attributes [Kohavi 1995]
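Slide 13 asks how to deal with the denominator $p(\mathbf{x})$. A standard answer, which this transcript does not spell out, is the law of total probability: $p(\mathbf{x}) = \sum_j p(\mathbf{x} \mid y = j)\, P(y = j)$, so it is just the sum of the numerators over all classes. The sketch below illustrates that under stated assumptions: it works in log space, evaluates the Gaussian through a hand-written Cholesky factorisation to avoid an explicit matrix inverse, and uses invented toy parameters. All function names are hypothetical.

```python
import math


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive-definite S."""
    m = len(S)
    L = [[0.0] * m for _ in range(m)]
    for r in range(m):
        for c in range(r + 1):
            acc = S[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(acc) if r == c else acc / L[c][c]
    return L


def log_gaussian(x, mu, S):
    """log N(x; mu, S) for an m-dimensional Gaussian."""
    m = len(x)
    L = cholesky(S)
    # Forward substitution solves L z = x - mu, so z.z = (x-mu)^T S^-1 (x-mu).
    z = []
    for r in range(m):
        z.append((x[r] - mu[r] - sum(L[r][k] * z[k] for k in range(r))) / L[r][r])
    log_det = 2.0 * sum(math.log(L[r][r]) for r in range(m))
    return -0.5 * (m * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def posterior(x, priors, means, covs):
    """P(y = i | x) for every class i; p(x) is the sum of the numerators."""
    logs = [math.log(p) + log_gaussian(x, mu, S)
            for p, mu, S in zip(priors, means, covs)]
    top = max(logs)                      # subtract the max for numerical stability
    nums = [math.exp(v - top) for v in logs]
    total = sum(nums)
    return [n / total for n in nums]


# Toy parameters (invented): two classes with identity covariance.
priors = [0.5, 0.5]
means = [[0.0, 0.0], [2.0, 2.0]]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
post = posterior([1.0, 1.0], priors, means, covs)   # point midway between the means
```

If only the most probable class is needed, the normalisation can be skipped entirely, since $p(\mathbf{x})$ is the same for every class.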
  8. Slide 15: Predicting wealth from age [figure]
     Slide 16: Predicting wealth from age [figure]
  9. Slide 17: Wealth from hours worked [figure]
     Slide 18: Wealth from years of education [figure]
  10. Slide 19: age, hours → wealth [figure]
      Slide 20: age, hours → wealth [figure]
  11. Slides 21 and 22: age, hours → wealth [figures]
      Having 2 inputs instead of one helps in two ways:
      1. Combining evidence from two 1-d Gaussians
      2. Off-diagonal covariance distinguishes class “shape”
  12. Slide 23: age, edunum → wealth [figure]
      Slide 24: age, edunum → wealth [figure]
  13. Slide 25: hours, edunum → wealth [figure]
      Slide 26: hours, edunum → wealth [figure]
  14. Slide 27: Accuracy [figure]
      Slide 28: An “MPG” example [figure]
  15. Slide 29: An “MPG” example [figure]
      Slide 30: An “MPG” example
      Things to note:
      • Class boundaries can be weird shapes (hyperconic sections)
      • Class regions can be non-simply-connected
      • But it’s impossible to model arbitrarily weirdly shaped regions
      • Test your understanding: With one input, must classes be simply connected?
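For the one-input question on Slide 30, a small numerical experiment can help. The sketch below assumes two classes with equal priors and the same mean but different standard deviations (1 and 3); these numbers are invented for illustration. With those settings the narrow class wins near the mean and the wide class wins on both tails, so the wide class's region consists of two separate intervals.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Density of a 1-d Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def predict(x):
    """Pick the class with the larger prior * likelihood (equal priors of 0.5)."""
    narrow = 0.5 * gauss_pdf(x, 0.0, 1.0)
    wide = 0.5 * gauss_pdf(x, 0.0, 3.0)
    return "narrow" if narrow > wide else "wide"


# Sample the left tail, the centre, and the right tail.
labels = [predict(x) for x in (-4.0, 0.0, 4.0)]
```

This is the 1-d counterpart of the conic-section boundaries mentioned above: with unequal variances the decision rule is quadratic in $x$, so it can change sign twice.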
  16. Slides 31 and 32: Overfitting dangers
      • Problem with “Joint” Bayes classifier: #parameters exponential with #dimensions. This means we just memorize the training data, and can overfit.
      • Problemette with Gaussian Bayes classifier: #parameters quadratic with #dimensions. With 10,000 dimensions and only 1,000 datapoints we could overfit.
      Question: Any suggested solutions?
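To make "exponential" versus "quadratic" concrete, the sketch below counts free parameters per class for a joint table over categorical inputs and for a full-covariance Gaussian. The function names are hypothetical, and the choice of 20 binary attributes is an assumption made only to have a number to compare.

```python
def joint_bc_params_per_class(arities):
    """Free parameters of one class's joint table over categorical inputs."""
    cells = 1
    for n in arities:
        cells *= n
    return cells - 1                 # cell probabilities sum to 1


def gaussian_bc_params_per_class(m):
    """Mean (m entries) plus symmetric covariance (m(m+1)/2 entries)."""
    return m + m * (m + 1) // 2


joint_20_binary = joint_bc_params_per_class([2] * 20)
gauss_20 = gaussian_bc_params_per_class(20)
gauss_10k = gaussian_bc_params_per_class(10_000)
data_scalars = 1_000 * 10_000        # 1,000 datapoints, 10,000 dimensions each
```

With 10,000 dimensions the Gaussian needs roughly 50 million parameters per class, about five times the number of scalars in a 1,000-record dataset, which is why restricting the covariance matrix is a natural next step.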
  17. Slides 33 and 34: General: $O(m^2)$ parameters
      $$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$
  18. Slides 35 and 36: Aligned: $O(m)$ parameters
      $$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \sigma_2^2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \sigma_3^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_{m-1}^2 & 0 \\ 0 & 0 & 0 & \cdots & 0 & \sigma_m^2 \end{pmatrix}$$
  19. Slides 37 and 38: Spherical: $O(1)$ cov parameters
      $$\Sigma = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \sigma^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 & 0 \\ 0 & 0 & 0 & \cdots & 0 & \sigma^2 \end{pmatrix}$$
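The three covariance shapes on Slides 33 to 38 can be sketched as restrictions applied to a full MLE covariance matrix. The code below assumes the usual maximum-likelihood results: the aligned (diagonal) estimate keeps the per-attribute variances and zeroes the off-diagonal entries, and the spherical estimate uses the average of those variances. The function names and the 2×2 example matrix are hypothetical.

```python
def restrict_covariance(cov, kind):
    """Return a general, aligned (diagonal), or spherical version of cov."""
    m = len(cov)
    if kind == "general":
        return [row[:] for row in cov]
    if kind == "aligned":
        return [[cov[a][b] if a == b else 0.0 for b in range(m)] for a in range(m)]
    if kind == "spherical":
        s2 = sum(cov[a][a] for a in range(m)) / m   # average variance
        return [[s2 if a == b else 0.0 for b in range(m)] for a in range(m)]
    raise ValueError(f"unknown covariance kind: {kind}")


def covariance_param_count(m, kind):
    """Free covariance parameters for each shape."""
    return {"general": m * (m + 1) // 2, "aligned": m, "spherical": 1}[kind]


full = [[4.0, 1.0], [1.0, 2.0]]                     # invented example
aligned = restrict_covariance(full, "aligned")
spherical = restrict_covariance(full, "spherical")
```

The trade-off is the usual one: fewer parameters means less risk of memorising the training data, at the cost of not being able to represent tilted or elongated class shapes.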
  20. Slides 39 and 40: BCs that have both real and categorical inputs?
      Task | Categorical inputs only | Real-valued inputs only | Mixed Real / Cat okay
      Classifier (inputs → predict category) | Joint BC, Naïve BC | Gauss BC | Dec Tree, BC Here???
      Density Estimator (inputs → probability) | Joint DE, Naïve DE | Gauss DE | -
      Regressor (inputs → predict real no.) | - | - | -
      Easy! Guess how?
  21. Slides 41 and 42: BCs that have both real and categorical inputs?
      Task | Categorical inputs only | Real-valued inputs only | Mixed Real / Cat okay
      Classifier (inputs → predict category) | Joint BC, Naïve BC | Gauss BC | Dec Tree, Gauss/Joint BC, Gauss Naïve BC
      Density Estimator (inputs → probability) | Joint DE, Naïve DE | Gauss DE | Gauss/Joint DE, Gauss Naïve DE
      Regressor (inputs → predict real no.) | - | - | -
  22. Slide 43: Mixed Categorical / Real Density Estimation
      • Write $\mathbf{x} = (\mathbf{u}, \mathbf{v}) = (u_1, u_2, \ldots, u_q, v_1, v_2, \ldots, v_{m-q})$, where $\mathbf{u}$ is real valued and $\mathbf{v}$ is categorical valued.
      $P(\mathbf{x} \mid M) = P(\mathbf{u}, \mathbf{v} \mid M)$ (where $M$ is any Density Estimation Model)
      Slide 44: Not sure which tasty DE to enjoy? Try our… Joint / Gauss DE Combo
      $$P(\mathbf{u}, \mathbf{v} \mid M) = P(\mathbf{u} \mid \mathbf{v}, M)\, P(\mathbf{v} \mid M)$$
      • $P(\mathbf{u} \mid \mathbf{v}, M)$: Gaussian with parameters depending on $\mathbf{v}$
      • $P(\mathbf{v} \mid M)$: big “m-q”-dimensional lookup table
  23. Slides 45 and 46: MLE learning of the Joint / Gauss DE Combo
      $$P(\mathbf{u}, \mathbf{v} \mid M) = P(\mathbf{u} \mid \mathbf{v}, M)\, P(\mathbf{v} \mid M)$$
      $\mathbf{u} \mid \mathbf{v}, M \sim N(\mu_{\mathbf{v}}, \Sigma_{\mathbf{v}})$, $P(\mathbf{v} \mid M) = q_{\mathbf{v}}$
      • $\mu_{\mathbf{v}}$ = mean of $\mathbf{u}$ among records matching $\mathbf{v}$:
        $$\mu_{\mathbf{v}} = \frac{1}{R_{\mathbf{v}}} \sum_{k \text{ s.t. } \mathbf{v}_k = \mathbf{v}} \mathbf{u}_k$$
      • $\Sigma_{\mathbf{v}}$ = covariance of $\mathbf{u}$ among records matching $\mathbf{v}$:
        $$\Sigma_{\mathbf{v}} = \frac{1}{R_{\mathbf{v}}} \sum_{k \text{ s.t. } \mathbf{v}_k = \mathbf{v}} (\mathbf{u}_k - \mu_{\mathbf{v}})(\mathbf{u}_k - \mu_{\mathbf{v}})^T$$
      • $q_{\mathbf{v}}$ = fraction of records that match $\mathbf{v}$:
        $$q_{\mathbf{v}} = \frac{R_{\mathbf{v}}}{R}$$
      where $R_{\mathbf{v}}$ = # records that match $\mathbf{v}$ and $R$ = total # records.
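The Joint / Gauss DE Combo estimates on Slides 45 and 46 amount to grouping records by their categorical part and fitting one Gaussian per group. The sketch below is a minimal standard-library version; the function name, the dictionary keys, and the four toy records (loosely styled after the gender and hours-worked example, but invented) are assumptions.

```python
from collections import defaultdict


def fit_joint_gauss_de(U, V):
    """Return {v: {"q_v", "mu_v", "Sigma_v"}} for each observed categorical tuple v."""
    groups = defaultdict(list)
    for u, v in zip(U, V):
        groups[tuple(v)].append(u)        # records matching v
    R = len(U)
    table = {}
    for v, rows in groups.items():
        Rv, q = len(rows), len(rows[0])
        mu = [sum(r[j] for r in rows) / Rv for j in range(q)]
        sigma = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / Rv
                  for b in range(q)] for a in range(q)]
        table[v] = {"q_v": Rv / R, "mu_v": mu, "Sigma_v": sigma}
    return table


# Toy records (invented): u = (hours worked,), v = (gender,).
U = [[40.0], [50.0], [30.0], [20.0]]
V = [["Male"], ["Male"], ["Female"], ["Female"]]
table = fit_joint_gauss_de(U, V)
```

Only combinations of categorical values that actually occur get an entry, so a query with an unseen $\mathbf{v}$ has estimated probability zero under this scheme.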
  24. Slide 47: Gender and Hours Worked* [figure]
      *As with all the results from the UCI “adult census” dataset, we can’t draw any real-world conclusions since it’s such a non-real-world sample.
      Slide 48: What we just did: Joint / Gauss DE Combo [figure]
  25. Slide 49: What we do next: Joint / Gauss BC Combo [figure]
      Slide 50: Joint / Gauss BC Combo
      $$P(Y = i \mid \mathbf{u}, \mathbf{v}) = \frac{p(\mathbf{u}, \mathbf{v} \mid M_i)\, P(Y = i)}{p(\mathbf{u}, \mathbf{v})} = \frac{p(\mathbf{u} \mid \mathbf{v}, M_i)\, p(\mathbf{v} \mid M_i)\, P(Y = i)}{p(\mathbf{u}, \mathbf{v})} = \frac{N(\mathbf{u}; \mu_{i,\mathbf{v}}, \Sigma_{i,\mathbf{v}})\, q_{i,\mathbf{v}}\, p_i}{p(\mathbf{u}, \mathbf{v})}$$
  26. Slide 51: Joint / Gauss BC Combo
      $$P(Y = i \mid \mathbf{u}, \mathbf{v}) = \frac{p(\mathbf{u}, \mathbf{v} \mid M_i)\, P(Y = i)}{p(\mathbf{u}, \mathbf{v})} = \frac{p(\mathbf{u} \mid \mathbf{v}, M_i)\, p(\mathbf{v} \mid M_i)\, P(Y = i)}{p(\mathbf{u}, \mathbf{v})} = \frac{N(\mathbf{u}; \mu_{i,\mathbf{v}}, \Sigma_{i,\mathbf{v}})\, q_{i,\mathbf{v}}\, p_i}{p(\mathbf{u}, \mathbf{v})}$$
      • $\mu_{i,\mathbf{v}}$ = mean of $\mathbf{u}$ among records matching $\mathbf{v}$ and in which $y = i$
      • $\Sigma_{i,\mathbf{v}}$ = covariance of $\mathbf{u}$ among records matching $\mathbf{v}$ and in which $y = i$
      • $q_{i,\mathbf{v}}$ = fraction of “y=i” records that match $\mathbf{v}$
      • $p_i$ = fraction of records that match “y=i”
      $N(\mathbf{u}; \mu_{i,\mathbf{v}}, \Sigma_{i,\mathbf{v}})$ is rather so-so notation for “Gaussian with mean $\mu_{i,\mathbf{v}}$ and covariance $\Sigma_{i,\mathbf{v}}$ evaluated at $\mathbf{u}$”.
      Slide 52: Gender, Hours → Wealth [figure]
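The Joint / Gauss BC Combo posterior on Slide 51 can be sketched as a product of three factors per class, normalised over classes. To keep the example short it assumes a single real-valued input (so the covariance reduces to a variance) and a single categorical input; the model layout, function names, and all numbers are invented for illustration and are not fitted to the census data.

```python
import math


def normal_pdf(u, mu, var):
    """1-d Gaussian density with mean mu and variance var, evaluated at u."""
    return math.exp(-0.5 * (u - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def joint_gauss_bc_posterior(u, v, model):
    """P(Y = i | u, v) proportional to N(u; mu_iv, var_iv) * q_iv * p_i."""
    scores = {}
    for label, params in model.items():
        # A value of v never seen with this class gets q_iv = 0.
        q_iv, mu_iv, var_iv = params["by_v"].get(v, (0.0, 0.0, 1.0))
        scores[label] = normal_pdf(u, mu_iv, var_iv) * q_iv * params["p"]
    total = sum(scores.values())          # plays the role of p(u, v)
    if total == 0.0:
        raise ValueError("v was never observed with any class")
    return {label: s / total for label, s in scores.items()}


# Hypothetical parameters: by_v maps v to (q_iv, mu_iv, var_iv).
model = {
    "rich": {"p": 0.25, "by_v": {"Male": (0.8, 45.0, 100.0), "Female": (0.2, 40.0, 100.0)}},
    "poor": {"p": 0.75, "by_v": {"Male": (0.6, 40.0, 144.0), "Female": (0.4, 35.0, 144.0)}},
}
post = joint_gauss_bc_posterior(45.0, "Male", model)
```

The denominator $p(\mathbf{u}, \mathbf{v})$ never has to be modelled separately: summing the per-class numerators gives it for free.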
  27. Slide 53: Gender, Hours → Wealth [figure]
      Slide 54: Joint / Gauss DE Combo and Joint / Gauss BC Combo: The downside
      • (Yawn…we’ve done this before…) More than a few categorical attributes blah blah blah massive table blah blah lots of parameters blah blah just memorize training data blah blah blah do worse on future data blah blah need to be more conservative blah
  28. Slides 55 and 56: Naïve/Gauss combo for Density Estimation
      $$p(\mathbf{u}, \mathbf{v} \mid M) = \underbrace{\left[\prod_{j=1}^{q} p(u_j \mid M)\right]}_{\text{Real}} \underbrace{\left[\prod_{j=1}^{m-q} P(v_j \mid M)\right]}_{\text{Categorical}}$$
      $u_j \mid M \sim N(\mu_j, \sigma_j^2)$
      $v_j \mid M \sim \text{Multinomial}[q_{j1}, q_{j2}, \ldots, q_{jN_j}]$
      How many parameters?
      $$\mu_j = \frac{1}{R} \sum_k u_{kj}$$
      $$\sigma_j^2 = \frac{1}{R} \sum_k (u_{kj} - \mu_j)^2$$
      $$q_{jh} = \frac{\text{\# of records in which } v_j = h}{R}$$
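The Naïve/Gauss estimates on Slide 56 treat every attribute independently, which makes them easy to sketch. The code below also offers one answer to "How many parameters?", assuming the count means free parameters: a mean and a variance per real attribute, plus $N_j - 1$ probabilities per categorical attribute. Function names and the toy records are hypothetical.

```python
from collections import Counter


def fit_naive_gauss_de(U, V):
    """Return per-attribute means, variances, and categorical frequency tables."""
    R = len(U)
    n_real, n_cat = len(U[0]), len(V[0])
    mu = [sum(row[j] for row in U) / R for j in range(n_real)]
    var = [sum((row[j] - mu[j]) ** 2 for row in U) / R for j in range(n_real)]
    # q_tables[j][h] = fraction of records in which v_j = h
    q_tables = [{h: n / R for h, n in Counter(row[j] for row in V).items()}
                for j in range(n_cat)]
    return mu, var, q_tables


def naive_gauss_param_count(n_real, arities):
    """2 per real attribute, plus (N_j - 1) per categorical attribute."""
    return 2 * n_real + sum(n - 1 for n in arities)


# Toy records (invented): one real attribute, one categorical attribute.
U = [[1.0], [3.0], [5.0], [7.0]]
V = [["a"], ["a"], ["b"], ["a"]]
mu, var, q_tables = fit_naive_gauss_de(U, V)
```

The count is linear in the number of attributes, in contrast to the lookup table of the Joint / Gauss combo, which grows with the product of the categorical arities.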
  29. Slide 57: Naïve/Gauss DE Example [figure]
      Slide 58: Naïve/Gauss DE Example [figure]
  30. Slide 59: Naïve / Gauss BC
      $$P(Y = i \mid \mathbf{u}, \mathbf{v}) = \frac{p(\mathbf{u}, \mathbf{v} \mid Y = i)\, P(Y = i)}{p(\mathbf{u}, \mathbf{v})}$$
      $$= \frac{1}{p(\mathbf{u}, \mathbf{v})} \prod_{j=1}^{q} p(u_j \mid \mu_{ij}, \sigma_{ij}^2) \prod_{j=1}^{m-q} P(v_j \mid q_{ij})\, P(Y = i)$$
      $$= \frac{1}{p(\mathbf{u}, \mathbf{v})} \prod_{j=1}^{q} N(u_j; \mu_{ij}, \sigma_{ij}^2) \prod_{j=1}^{m-q} q_{ij}[v_j]\; p_i$$
      • $\mu_{ij}$ = mean of $u_j$ among records in which $y = i$
      • $\sigma_{ij}^2$ = variance of $u_j$ among records in which $y = i$
      • $q_{ij}[h]$ = fraction of “y=i” records in which $v_j = h$
      • $p_i$ = fraction of records that match “y=i”
      Slide 60: Gauss / Naïve BC Example [figure]
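The Naïve / Gauss BC formula on Slide 59 multiplies one 1-d Gaussian per real attribute and one table lookup per categorical attribute, then normalises across classes. The sketch below follows that structure; the model layout, function names, and parameter values are invented for illustration.

```python
import math


def normal_pdf(u, mu, var):
    """1-d Gaussian density with mean mu and variance var, evaluated at u."""
    return math.exp(-0.5 * (u - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def naive_gauss_bc_posterior(u, v, model):
    """P(Y = i | u, v) from per-attribute Gaussians and frequency tables."""
    scores = {}
    for label, params in model.items():
        score = params["p"]                                   # p_i
        for uj, mu, var in zip(u, params["mu"], params["var"]):
            score *= normal_pdf(uj, mu, var)                  # N(u_j; mu_ij, var_ij)
        for vj, table in zip(v, params["q"]):
            score *= table.get(vj, 0.0)                       # q_ij[v_j]
        scores[label] = score
    total = sum(scores.values())                              # p(u, v)
    if total == 0.0:
        raise ValueError("no class assigns positive probability to this record")
    return {label: s / total for label, s in scores.items()}


# Hypothetical parameters: one real attribute, one categorical attribute.
model = {
    "rich": {"p": 0.5, "mu": [0.0], "var": [1.0], "q": [{"Male": 0.5, "Female": 0.5}]},
    "poor": {"p": 0.5, "mu": [2.0], "var": [1.0], "q": [{"Male": 0.25, "Female": 0.75}]},
}
post = naive_gauss_bc_posterior([1.0], ["Male"], model)
```

In this toy setup the real-valued input sits midway between the two class means, so the Gaussian factors cancel and the categorical table alone decides the posterior. With many attributes, summing logarithms instead of multiplying probabilities would be the safer choice numerically.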
  31. Slide 61: Gauss / Naïve BC Example [figure]
      Slide 62: Learn Wealth from 15 attributes [figure]
  32. Slide 63: Learn Wealth from 15 attributes [figure]
      Same data, except all real values discretized to 3 levels.
      Slide 64: Learn Race from 15 attributes [figure]
  33. Slide 65: What you should know
      • A lot of this should have just been a corollary of what you already knew
      • Turning Gaussian DEs into Gaussian BCs
      • Mixing Categorical and Real-Valued
      Slide 66: Questions to Ponder
      • Suppose you wanted to create an example dataset where a BC involving Gaussians crushed decision trees like a bug. What would you do?
      • Could you combine Decision Trees and Bayes Classifiers? How? (Maybe there is more than one possible way.)
