Naive Bayes with Conditionally Dependent Data


Examination of Naive Bayes with conditionally dependent data sets.

Published in: Technology, Education

  1. 1. Why does Naïve Bayesian Classification work so well amidst known conditional dependencies in the data structure? <ul><li>Part 1: Written critique of Zhang 2004, “The Optimality of Naïve Bayes” </li></ul><ul><li>Part 2: Experiments with Naïve Bayes in the presence of different forms of synthetic conditional dependency, both alone and mixed with benchmark data sets, to demonstrate principles outlined in Zhang 2004. </li></ul><ul><li>Part 3: Summary presentation of the above results, along with training in the use of an “R” Naïve Bayes package </li></ul>
  2. 28. Naïve Bayesian Classification: a form of machine learning that avoids complicated conditional dependency models and the requirement to define many of the conditional dependencies in your data. Why does it work so well amidst conditional dependency? Tim Hare
  3. 29. Naïve Bayes (naïvely, hence the name) assumes no conditional dependence, but this simplification comes at a potential cost of misclassification <ul><li>Joint probability = likelihood * prior </li></ul>
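The bullet above can be written out in standard notation. This is a minimal sketch; the symbols c (class), x_i (attribute value), and n (number of attributes) are assumptions introduced here and do not appear in the slides.

```latex
% Joint probability = likelihood * prior, with no independence assumption
\[ P(c, x_1, \dots, x_n) = P(x_1, \dots, x_n \mid c)\, P(c) \]

% Naive Bayes replaces the likelihood with a product of per-attribute terms
\[ P(x_1, \dots, x_n \mid c) \approx \prod_{i=1}^{n} P(x_i \mid c) \]

% Resulting classification rule
\[ \hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c) \]
```

The approximation in the second line is exact only when the attributes are conditionally independent given the class; otherwise it is the source of the potential misclassification cost.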
  4. 30. NB performance is at odds with past theory: there is evidence in the primary literature that Naïve Bayes works better than would be anticipated given known conditional dependence in the data <ul><li>Zhang 2004: “The Optimality of Naïve Bayes” </li></ul><ul><ul><li>Closed-form analytical investigation (argument by proof) in support of NB being able to classify reliably despite conditional dependence IF the dependence is of the same form across all classes </li></ul></ul><ul><ul><li>Contention: NB works well if the conditional dependence is of the same type in all classes within an attribute, or is not of the same type but the misclassification “cancels out” across attributes </li></ul></ul>
  5. 31. Zhang 2004: Factoring a general form of Bayes into two parts: [NB] * [“something else”] <ul><li>The more general Bayesian framework can be factored into [NB] x [“something else”] </li></ul><ul><li>[“something else”] → 1 IF conditional dependence is distributed evenly in all classes, and in turn NB = the general Bayesian model </li></ul><ul><li>This is one way in which NB can perform like full Bayes (FB) </li></ul>Take-home message: the factorization indicates that FB = NB under certain data structures, and not under others.
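One way to see the [NB] x [“something else”] split is through the chain rule. The sketch below is a generic two-class derivation, assuming all probabilities involved are nonzero; it is not Zhang's exact notation (his paper states the result in terms of a dependence graph), and the symbols c_1, c_2, and x_i are introduced here for illustration.

```latex
% Chain rule: P(x | c) = prod_i P(x_i | x_1, ..., x_{i-1}, c).
% Multiply and divide each factor by P(x_i | c) to split the full Bayes ratio:
\[
\underbrace{\frac{P(c_1)\, P(x_1, \dots, x_n \mid c_1)}
                 {P(c_2)\, P(x_1, \dots, x_n \mid c_2)}}_{\text{full Bayes (FB)}}
=
\underbrace{\frac{P(c_1) \prod_{i=1}^{n} P(x_i \mid c_1)}
                 {P(c_2) \prod_{i=1}^{n} P(x_i \mid c_2)}}_{\text{[NB]}}
\times
\underbrace{\prod_{i=1}^{n}
  \frac{P(x_i \mid x_1, \dots, x_{i-1}, c_1) \,/\, P(x_i \mid c_1)}
       {P(x_i \mid x_1, \dots, x_{i-1}, c_2) \,/\, P(x_i \mid c_2)}}_{\text{[``something else'']}}
\]
```

Each factor in the last product compares how strongly attribute x_i depends on the other attributes in class c_1 versus class c_2. If that dependence has the same strength in both classes, the factor is 1; if every factor is 1, or the factors cancel in the product, the NB ratio equals the FB ratio.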
  6. 32. Full Bayes (FB) and Naïve Bayes (NB) classification carried out by hand on synthetic data for one data vector, <1,0>. When the conditional dependence is of different types in the two classes (C1: if A then A; C2: if A then B; upper-left data grid, which you may recognize as “XOR”), NB fails to classify correctly: the information is “lost” because equal probabilities enter each class estimate and cancel. When the conditional dependence is of the same type in both classes (C1 = C2: if A then B; lower-left data grid), NB may still classify the data correctly. FB classifies correctly in BOTH instances. In the second case the NB posterior probability may be biased, but in many cases that still nets out to the correct classification, for a variety of reasons (the analysis is too complex to present here). Panel labels: “Loss (ratio is just 1) but no Bias” (mixed dependence); “Bias but no Loss” (even dependence).
  7. 33. Naïve Bayes in R, run on the same synthetic conditionally dependent data we analyzed in Excel for vector <1,0>, produces the same misclassification for the MIXED conditional dependence, and the correct Democratic classification in the case of “even” conditional dependence.
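The hand calculation can be reproduced in a few lines of base R. This is a minimal sketch under stated assumptions: the counts below are hypothetical stand-ins chosen to show the two effects, not the data grids from the slide, and `make_class`, `full_bayes`, and `naive_bayes` are illustrative helper names rather than functions from any package.

```r
# Hypothetical synthetic data: joint counts of two binary attributes (x1, x2)
# within one class. The counts are illustrative, not the slide's data grids.
make_class <- function(n11, n10, n01, n00) {
  data.frame(
    x1 = rep(c(1, 1, 0, 0), times = c(n11, n10, n01, n00)),
    x2 = rep(c(1, 0, 1, 0), times = c(n11, n10, n01, n00))
  )
}

# Unnormalized class scores P(c) * P(x | c) for one vector v = c(x1, x2).
# Full Bayes uses the joint frequency of (x1, x2) within each class.
full_bayes <- function(classes, v) {
  n_total <- sum(sapply(classes, nrow))
  sapply(classes, function(d) {
    (nrow(d) / n_total) * mean(d$x1 == v[1] & d$x2 == v[2])
  })
}

# Naive Bayes multiplies the per-attribute frequencies instead.
naive_bayes <- function(classes, v) {
  n_total <- sum(sapply(classes, nrow))
  sapply(classes, function(d) {
    (nrow(d) / n_total) * mean(d$x1 == v[1]) * mean(d$x2 == v[2])
  })
}

v <- c(1, 0)

# "Mixed" dependence (XOR): x2 equals x1 in C1, x2 is the opposite of x1 in C2.
mixed <- list(C1 = make_class(2, 0, 0, 2), C2 = make_class(0, 2, 2, 0))

# "Even" dependence: x1 and x2 are positively associated in both classes,
# but the classes differ in how often each attribute is 1.
even <- list(C1 = make_class(5, 2, 1, 2), C2 = make_class(2, 1, 2, 5))

fb_mixed <- full_bayes(mixed, v)
nb_mixed <- naive_bayes(mixed, v)
fb_even  <- full_bayes(even, v)
nb_even  <- naive_bayes(even, v)

# One row per method and scenario, one column per class.
print(rbind(fb_mixed, nb_mixed, fb_even, nb_even))
```

With these counts, the NB scores for the mixed case are tied (0.125 for each class), so NB cannot separate the classes, while FB assigns <1,0> to C2. In the even case both methods pick C1, but the normalized posteriors differ (2/3 for FB versus about 0.61 for NB), which matches the "bias but no loss" pattern.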
  8. 34. Real data: House of Representatives 1984 voting record on 17 congressional bills (columns) <ul><li>Two classes: C = (Democrat, Republican) = column 1 </li></ul><ul><li>Binary attribute values are the “Yes”/“No” votes on each of the 17 bills </li></ul><ul><li>Each row is the voting record of one Congress-person on all 17 bills </li></ul>
  9. 35. Use “R” for NB classification on HV84 +/- augmentation with conditional dependence via synthetic data <ul><li>Control run: Use Naïve Bayes to classify the unmodified voting records as having been cast by either a Democrat or a Republican (i.e., class 1 vs. class 2) </li></ul><ul><li>Experiment 1: Add “mixed” conditional dependence synthetic data (“if A then A” in one class, “if A then B” in the other class) to the HV84 data set, and repeat the NB classification analysis </li></ul><ul><li>Experiment 2: Add “consistent” conditional dependence synthetic data, evenly distributed across classes (“if A then B” in both classes), to the HV84 data set, and repeat the NB classification analysis. </li></ul><ul><li>Our hand analysis (done above in Excel), as well as Zhang 2004, suggests we may not see much of a difference in classification. </li></ul>
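  10. 36. Control analysis for synthetic augmentation experiments #1 and #2 (to follow): NB analysis of the HV84 real data, unmodified by synthetic data <ul><li># Install package 'e1071' once, via the install-packages GUI option or install.packages("e1071") </li></ul><ul><li>library(e1071) </li></ul><ul><li># Read the voting records; stringsAsFactors = TRUE keeps Class and the votes as factors, which naiveBayes() expects (R >= 4.0 reads strings as character by default) </li></ul><ul><li>HV84_data <- read.table("C:/HV84.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE) </li></ul><ul><li># Print the data to check the import </li></ul><ul><li>HV84_data </li></ul><ul><li># Fit Naive Bayes with Class as the response and every other column as a predictor </li></ul><ul><li>HV84_model <- naiveBayes(Class ~ ., data = HV84_data) </li></ul><ul><li># Optional: posterior probabilities for the first five rows only </li></ul><ul><li>#HV84_pred_raw <- predict(HV84_model, HV84_data[1:5,-1], type = "raw") </li></ul><ul><li># Posterior probabilities for every row (column 1, the class label, is dropped) </li></ul><ul><li>HV84_pred_raw <- predict(HV84_model, HV84_data[,-1], type = "raw") </li></ul><ul><li># Predicted class labels </li></ul><ul><li>HV84_pred_class <- predict(HV84_model, HV84_data[,-1]) </li></ul><ul><li># Confusion matrix: predicted vs. actual class </li></ul><ul><li>table(HV84_pred_class, HV84_data$Class) </li></ul><ul><li>write.csv(HV84_pred_raw, file = "c:/HV84_pred_raw.csv") </li></ul>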
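  11. 37. Augmentation with synthetic data -- experiment 1: NB analysis of HV84 augmented with conditionally dependent synthetic data, where the conditional dependence is of different types (“mixed”) in the two classes <ul><li># Install package 'e1071' once, via the install-packages GUI option or install.packages("e1071") </li></ul><ul><li>library(e1071) </li></ul><ul><li># Read the augmented data; stringsAsFactors = TRUE keeps Class and the votes as factors, which naiveBayes() expects (R >= 4.0 reads strings as character by default) </li></ul><ul><li>HV84_MIXEDCD_data <- read.table("C:/HV84_MIXEDCD.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE) </li></ul><ul><li># Print the data to check the import </li></ul><ul><li>HV84_MIXEDCD_data </li></ul><ul><li># Fit Naive Bayes with Class as the response and every other column as a predictor </li></ul><ul><li>HV84_MIXEDCD_model <- naiveBayes(Class ~ ., data = HV84_MIXEDCD_data) </li></ul><ul><li># Optional: posterior probabilities for the first five rows only </li></ul><ul><li>#HV84_MIXEDCD_pred_raw <- predict(HV84_MIXEDCD_model, HV84_MIXEDCD_data[1:5,-1], type = "raw") </li></ul><ul><li># Posterior probabilities for every row (column 1, the class label, is dropped) </li></ul><ul><li>HV84_MIXEDCD_pred_raw <- predict(HV84_MIXEDCD_model, HV84_MIXEDCD_data[,-1], type = "raw") </li></ul><ul><li># Predicted class labels </li></ul><ul><li>HV84_MIXEDCD_pred_class <- predict(HV84_MIXEDCD_model, HV84_MIXEDCD_data[,-1]) </li></ul><ul><li># Confusion matrix: predicted vs. actual class </li></ul><ul><li>table(HV84_MIXEDCD_pred_class, HV84_MIXEDCD_data$Class) </li></ul><ul><li>write.csv(HV84_MIXEDCD_pred_raw, file = "c:/HV84_MIXEDCD_pred_raw.csv") </li></ul>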
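  12. 38. Augmentation with synthetic data -- experiment 2: NB analysis of HV84 augmented with conditionally dependent synthetic data, where the conditional dependence is of the same type (“even”) in the two classes <ul><li># Install package 'e1071' once, via the install-packages GUI option or install.packages("e1071") </li></ul><ul><li>library(e1071) </li></ul><ul><li># Read the augmented data; stringsAsFactors = TRUE keeps Class and the votes as factors, which naiveBayes() expects (R >= 4.0 reads strings as character by default) </li></ul><ul><li>HV84_EVENCD_data <- read.table("C:/HV84_EVENCD.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE) </li></ul><ul><li># Print the data to check the import </li></ul><ul><li>HV84_EVENCD_data </li></ul><ul><li># Fit Naive Bayes with Class as the response and every other column as a predictor </li></ul><ul><li>HV84_EVENCD_model <- naiveBayes(Class ~ ., data = HV84_EVENCD_data) </li></ul><ul><li># Optional: posterior probabilities for the first five rows only </li></ul><ul><li>#HV84_EVENCD_pred_raw <- predict(HV84_EVENCD_model, HV84_EVENCD_data[1:5,-1], type = "raw") </li></ul><ul><li># Posterior probabilities for every row (column 1, the class label, is dropped) </li></ul><ul><li>HV84_EVENCD_pred_raw <- predict(HV84_EVENCD_model, HV84_EVENCD_data[,-1], type = "raw") </li></ul><ul><li># Predicted class labels </li></ul><ul><li>HV84_EVENCD_pred_class <- predict(HV84_EVENCD_model, HV84_EVENCD_data[,-1]) </li></ul><ul><li># Confusion matrix: predicted vs. actual class </li></ul><ul><li>table(HV84_EVENCD_pred_class, HV84_EVENCD_data$Class) </li></ul><ul><li>write.csv(HV84_EVENCD_pred_raw, file = "c:/HV84_EVENCD_pred_raw.csv") </li></ul>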
  13. 39. Matrices of classification outcomes for control (top matrix), “mixed” (middle matrix) and “even” (bottom matrix): no adverse impact on classification. The same assignments were made in each experiment, indicating that augmenting real data with either type of conditional dependence does not influence classification, at least with this HV84 data set.
  14. 40. Raw probabilities, however, show that even though class assignments did not change across CONTROL, EXPT #1, and EXPT #2, differences (in this case slight) are imparted to the probability estimates, as expected. It is important to note that we added only 2 attributes (columns) to 17, so the percentage of “contamination” by synthetic data is small. Additional exploration could be done with increasing percentages of conditional dependence added to the original HV84 data set.
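The comparison described above (same class assignments, slightly shifted probabilities) could be checked with a small helper like the one below. This is a sketch only: `compare_posteriors` is a hypothetical name, and the two matrices are made-up stand-ins for the kind of output `predict(..., type = "raw")` returns, not results from the HV84 runs.

```r
# Compare two posterior-probability matrices (rows = records, columns = classes).
# Reports whether the most probable class agrees on every row, and the largest
# absolute shift in any probability estimate.
compare_posteriors <- function(p_a, p_b) {
  stopifnot(identical(dim(p_a), dim(p_b)))
  labels_a <- colnames(p_a)[max.col(p_a, ties.method = "first")]
  labels_b <- colnames(p_b)[max.col(p_b, ties.method = "first")]
  list(
    same_assignment = all(labels_a == labels_b),
    max_abs_diff    = max(abs(p_a - p_b))
  )
}

# Hypothetical posteriors for three records (illustrative values only).
class_names <- c("democrat", "republican")
control <- matrix(c(0.90, 0.10,
                    0.20, 0.80,
                    0.65, 0.35),
                  ncol = 2, byrow = TRUE, dimnames = list(NULL, class_names))
augmented <- matrix(c(0.88, 0.12,
                      0.23, 0.77,
                      0.61, 0.39),
                    ncol = 2, byrow = TRUE, dimnames = list(NULL, class_names))

result <- compare_posteriors(control, augmented)
print(result)
```

Applied to the saved raw-probability outputs of the control and augmented runs, the same two numbers would summarize how far the synthetic columns moved the estimates without changing the assignments.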
  15. 41. Knowledge check: FB or NB? <ul><ul><li>You have 3 million potential pro-drug compounds to evaluate given their chemical features. </li></ul></ul><ul><ul><li>Each compound is described by a feature vector of 20 chemical attributes </li></ul></ul><ul><ul><li>3,000 (0.1%, random sample) of these have been run through the assay, so you can classify these as “active” or “inactive” </li></ul></ul><ul><ul><li>What would be the pros and cons of building a Bayesian model using FB vs. NB to predict which of the un-assayed compounds might be attractive to evaluate further? </li></ul></ul><ul><ul><li>1) Hard to know conditional dependencies and can tolerate some inaccuracy? → NB </li></ul></ul><ul><ul><li>2) Use the 3,000 we know to assess conditional dependence → FB </li></ul></ul><ul><ul><li>3) Need very accurate probability estimates in classification → FB </li></ul></ul><ul><ul><li>4) We can afford false positives or false negatives → NB </li></ul></ul>
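One further consideration for this knowledge check is how many probabilities each model has to estimate from the 3,000 assayed compounds. The sketch below assumes, purely for illustration, that all 20 attributes are binary and that FB means an unrestricted joint table per class; the slides do not state either.

```r
n_attributes <- 20     # from the slide
n_labeled    <- 3000   # assayed compounds, from the slide
n_classes    <- 2      # "active" vs. "inactive"

# ASSUMPTION: every attribute is binary (not stated in the slides).
# Naive Bayes: one P(x_i = 1 | class) per attribute.
nb_params_per_class <- n_attributes

# Unrestricted full Bayes: one probability per joint attribute pattern,
# minus one because the probabilities sum to 1.
fb_params_per_class <- 2^n_attributes - 1

# Labeled examples available per parameter, across both classes.
labeled_per_nb_param <- n_labeled / (n_classes * nb_params_per_class)
labeled_per_fb_param <- n_labeled / (n_classes * fb_params_per_class)

cat("NB parameters per class:", nb_params_per_class, "\n")
cat("FB parameters per class:", fb_params_per_class, "\n")
cat("Labeled examples per NB parameter:", labeled_per_nb_param, "\n")
cat("Labeled examples per FB parameter:", labeled_per_fb_param, "\n")
```

Under these assumptions NB has 75 labeled examples per parameter, while an unrestricted FB table has far fewer than one, so in practice an FB-style model would need to restrict which dependencies it represents.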
  16. 42. References <ul><li>Zhang, 2004, “The Optimality of Naïve Bayes”. In: Proc. 17th International FLAIRS Conference, Florida, USA. </li></ul><ul><li>Zhang & Ling, 2001, “Learnability of Augmented Naïve Bayes in Nominal Domains”. In: Proceedings of the Eighteenth International Conference on Machine Learning, 617-623. </li></ul><ul><li>Friedman & Fayyad, 1997, “On Bias, Variance, 0/1-Loss, and the Curse of Dimensionality”. Data Mining and Knowledge Discovery 1, 55-77. </li></ul><ul><li>Domingos & Pazzani, 1997, “Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier”. Machine Learning 41(1):5-15. </li></ul>
  17. 43. Q & A