This document summarizes key points from a machine learning lecture: assignment announcements, Naive Bayes classifiers and how they are trained, handling numeric and text data with Naive Bayes, and dealing with many-to-many relationships between data points and labels. It recommends using multiple binary classifiers rather than a single multi-class classifier to assign skills to math problems, for better accuracy with fewer opportunities for error.
14. How do we train a model? We need to compute how much evidence each value of every feature gives for each possible prediction (that is, how typical that value is for instances of that class). What is P(Outlook = rainy | Class = yes)? Store counts on (class value, feature value) pairs: how many times is Outlook = rainy when Class = yes? The (unnormalized) likelihood that Play = yes given Outlook = rainy is Count(yes & rainy)/Count(yes) * Count(yes)/Count(yes or no), i.e., P(Outlook = rainy | Play = yes) * P(Play = yes).
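A minimal sketch of this counting step in plain Python (the handful of (outlook, play) pairs below are hypothetical stand-ins for the training data on the slide):

    # Hypothetical (outlook, play) training pairs.
    data = [
        ("rainy", "yes"), ("rainy", "no"), ("sunny", "yes"),
        ("overcast", "yes"), ("rainy", "yes"), ("sunny", "no"),
    ]

    # Store counts on (class value, feature value) pairs.
    count_yes = sum(1 for _, play in data if play == "yes")
    count_yes_rainy = sum(1 for outlook, play in data
                          if play == "yes" and outlook == "rainy")

    p_rainy_given_yes = count_yes_rainy / count_yes   # P(Outlook = rainy | Play = yes)
    p_yes = count_yes / len(data)                     # P(Play = yes)

    # Unnormalized score for Play = yes given Outlook = rainy.
    print(p_rainy_given_yes * p_yes)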
15. How do we train a model? Now try to compute the likelihood that Play = yes for Outlook = sunny, Temperature = cool, Humidity = high, Windy = TRUE.
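A minimal sketch of that computation, assuming the counts of the standard 14-instance weather dataset (9 Play = yes, 5 Play = no) that the slide appears to use:

    # Counts of each feature value of the query instance within each class
    # (assumed from the standard weather data).
    counts_yes = {"outlook=sunny": 2, "temperature=cool": 3, "humidity=high": 3, "windy=true": 3}
    counts_no  = {"outlook=sunny": 3, "temperature=cool": 1, "humidity=high": 4, "windy=true": 3}
    n_yes, n_no, n_total = 9, 5, 14

    def score(counts, n_class, n_total):
        # Product of P(feature value | class) over all features, times P(class).
        s = n_class / n_total
        for c in counts.values():
            s *= c / n_class
        return s

    print("yes:", score(counts_yes, n_yes, n_total))   # ~0.0053
    print("no :", score(counts_no, n_no, n_total))     # ~0.0206 -> Play = no wins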
18. How do we train a model? What is the conditional probability P(Humidity = Low | Play = yes)? (If Humidity = Low never occurs with Play = yes in the training data, the count-based estimate is zero; see the sketch below.)
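A minimal sketch of why that zero matters (plain Python; the counts are hypothetical):

    # One zero-count factor wipes out the entire product for the class,
    # no matter how strong the other features' evidence is.
    count_yes = 9
    count_low_and_yes = 0                            # Humidity = Low never seen with Play = yes
    p_low_given_yes = count_low_and_yes / count_yes  # = 0.0
    other_factors = (2 / 9) * (3 / 9) * (3 / 9)      # hypothetical remaining factors
    print(p_low_given_yes * other_factors * (9 / 14))  # 0.0 -- Play = yes is ruled out entirely

This is the problem that smoothing (slide 26) addresses.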
26. Naïve Bayes with smoothing
@relation is-yummy
@attribute ice-cream {chocolate, vanilla, coffee, rocky-road, strawberry}
@attribute cake {chocolate, vanilla}
@attribute yummy {yum, good, ok}
@data
chocolate,chocolate,yum
vanilla,chocolate,good
coffee,chocolate,yum
coffee,vanilla,ok
rocky-road,chocolate,yum
strawberry,vanilla,yum
What is the likelihood that the answer is yum? P(vanilla | yum) = .11, P(vanilla cake | yum) = .33, .11 * .33 * .66 = .03
What is the likelihood that the answer is good? P(vanilla | good) = .33, P(vanilla cake | good) = .33, .33 * .33 * .17 = .02
What is the likelihood that the answer is ok? P(vanilla | ok) = .17, P(vanilla cake | ok) = .66, .17 * .66 * .17 = .02
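A minimal sketch of the smoothed computation (plain Python; the query instance, vanilla ice cream with vanilla cake, is inferred from the numbers on the slide, the class prior is left unsmoothed to match them, and small differences from the slide are rounding):

    # Add-one (Laplace) smoothed Naive Bayes on the is-yummy data above.
    data = [  # (ice-cream, cake, yummy)
        ("chocolate",  "chocolate", "yum"),
        ("vanilla",    "chocolate", "good"),
        ("coffee",     "chocolate", "yum"),
        ("coffee",     "vanilla",   "ok"),
        ("rocky-road", "chocolate", "yum"),
        ("strawberry", "vanilla",   "yum"),
    ]
    ICE_VALUES, CAKE_VALUES = 5, 2   # number of possible values per attribute

    def smoothed(count, class_count, n_values):
        # Add 1 to the count and the number of possible values to the denominator.
        return (count + 1) / (class_count + n_values)

    for label in ("yum", "good", "ok"):
        rows = [r for r in data if r[2] == label]
        n = len(rows)
        p_ice  = smoothed(sum(1 for r in rows if r[0] == "vanilla"), n, ICE_VALUES)
        p_cake = smoothed(sum(1 for r in rows if r[1] == "vanilla"), n, CAKE_VALUES)
        prior  = n / len(data)       # class prior, unsmoothed as on the slide
        print(label, round(p_ice, 2), round(p_cake, 2), round(p_ice * p_cake * prior, 2))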
33. Scenario (diagram: a set of math story problems on one side, Math Skills 1-14 on the other)
34. Scenario: each problem may be associated with more than one skill (same problems-to-skills diagram)
35. Scenario: each skill may be associated with more than one problem (same problems-to-skills diagram)
41. A lower-resolution prediction can give more information than a higher-resolution one if its accuracy is sufficiently higher. Remember this discussion from lecture 2?
43. Approach 1: each skill corresponds to a separate binary predictor. Each of 91 binary predictors is applied to each text, so 91 separate predictions are made for each text. (same problems-to-skills diagram)
44. Approach 2: each skill corresponds to a separate class value. A single multi-class predictor is applied to each text, so only 1 prediction is made for each text. (same problems-to-skills diagram)
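A minimal sketch contrasting the two approaches, assuming scikit-learn and a bag-of-words representation; the example texts and skill names are hypothetical stand-ins (the real task has 91 skills):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    # Hypothetical problem texts and their skill labels.
    texts = ["add the two fractions", "find the area of the rectangle",
             "add the areas of the two rectangles"]
    skills = [["fractions"], ["area"], ["addition", "area"]]

    X = CountVectorizer().fit_transform(texts)

    # Approach 1: one binary predictor per skill; each text can receive several skills.
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(skills)                    # binary indicator matrix, one column per skill
    binary_model = OneVsRestClassifier(MultinomialNB()).fit(X, Y)
    print(mlb.inverse_transform(binary_model.predict(X)))   # possibly several skills per text

    # Approach 2: one multi-class predictor; each text receives exactly one skill,
    # so multi-skill problems cannot be labeled fully.
    y_single = [s[0] for s in skills]                # forced to keep only one skill per problem
    multiclass_model = MultinomialNB().fit(X, y_single)
    print(multiclass_model.predict(X))               # exactly one skill per text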