This document summarizes key points from a machine learning lecture: assignment announcements, Naive Bayes classifiers and how they are trained, handling numeric and text data with Naive Bayes, and dealing with many-to-many relationships between data points and labels. It recommends using multiple binary classifiers rather than a single multi-class classifier to assign skills to math problems, for better accuracy with fewer opportunities for error.
14. How do we train a model? We need to compute how much evidence each value of every feature gives for each possible prediction (that is, how typical that value is for instances of that class). What is P(Outlook = rainy | Class = yes)? Store counts on (class value, feature value) pairs: how many times is Outlook = rainy when Class = yes? The (unnormalized) likelihood that Play = yes given Outlook = rainy is Count(yes & rainy)/Count(yes) * Count(yes)/Count(yes or no), i.e., P(Outlook = rainy | Play = yes) * P(Play = yes).
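A minimal sketch of this counting step in plain Python (the handful of (outlook, play) pairs below are hypothetical stand-ins for the training data on the slide):

    # Hypothetical (outlook, play) training pairs.
    data = [
        ("rainy", "yes"), ("rainy", "no"), ("sunny", "yes"),
        ("overcast", "yes"), ("rainy", "yes"), ("sunny", "no"),
    ]

    # Store counts on (class value, feature value) pairs.
    count_yes = sum(1 for _, play in data if play == "yes")
    count_yes_rainy = sum(1 for outlook, play in data
                          if play == "yes" and outlook == "rainy")

    p_rainy_given_yes = count_yes_rainy / count_yes   # P(Outlook = rainy | Play = yes)
    p_yes = count_yes / len(data)                     # P(Play = yes)

    # Unnormalized score for Play = yes given Outlook = rainy.
    print(p_rainy_given_yes * p_yes)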
15. How do we train a model? Now try to compute the likelihood that Play = yes for Outlook = sunny, Temperature = cool, Humidity = high, Windy = TRUE.
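A minimal sketch of that computation, assuming the counts of the standard 14-instance weather dataset (9 Play = yes, 5 Play = no) that the slide appears to use:

    # Counts of each feature value of the query instance within each class
    # (assumed from the standard weather data).
    counts_yes = {"outlook=sunny": 2, "temperature=cool": 3, "humidity=high": 3, "windy=true": 3}
    counts_no  = {"outlook=sunny": 3, "temperature=cool": 1, "humidity=high": 4, "windy=true": 3}
    n_yes, n_no, n_total = 9, 5, 14

    def score(counts, n_class, n_total):
        # Product of P(feature value | class) over all features, times P(class).
        s = n_class / n_total
        for c in counts.values():
            s *= c / n_class
        return s

    print("yes:", score(counts_yes, n_yes, n_total))   # ~0.0053
    print("no :", score(counts_no, n_no, n_total))     # ~0.0206 -> Play = no wins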
18. How do we train a model? What is the conditional probability P(Humidity = Low | Play = yes)? (If Humidity = Low never occurs with Play = yes in the training data, the count-based estimate is zero; see the sketch below.)
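A minimal sketch of why that zero matters (plain Python; the counts are hypothetical):

    # One zero-count factor wipes out the entire product for the class,
    # no matter how strong the other features' evidence is.
    count_yes = 9
    count_low_and_yes = 0                            # Humidity = Low never seen with Play = yes
    p_low_given_yes = count_low_and_yes / count_yes  # = 0.0
    other_factors = (2 / 9) * (3 / 9) * (3 / 9)      # hypothetical remaining factors
    print(p_low_given_yes * other_factors * (9 / 14))  # 0.0 -- Play = yes is ruled out entirely

This is the problem that smoothing (slide 26) addresses.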
26. Naïve Bayes with smoothing
@relation is-yummy
@attribute ice-cream {chocolate, vanilla, coffee, rocky-road, strawberry}
@attribute cake {chocolate, vanilla}
@attribute yummy {yum, good, ok}
@data
chocolate,chocolate,yum
vanilla,chocolate,good
coffee,chocolate,yum
coffee,vanilla,ok
rocky-road,chocolate,yum
strawberry,vanilla,yum
What is the likelihood that the answer is yum? P(vanilla | yum) = .11, P(vanilla cake | yum) = .33, .11 * .33 * .66 = .03
What is the likelihood that the answer is good? P(vanilla | good) = .33, P(vanilla cake | good) = .33, .33 * .33 * .17 = .02
What is the likelihood that the answer is ok? P(vanilla | ok) = .17, P(vanilla cake | ok) = .66, .17 * .66 * .17 = .02
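A minimal sketch of the smoothed computation (plain Python; the query instance, vanilla ice cream with vanilla cake, is inferred from the numbers on the slide, the class prior is left unsmoothed to match them, and small differences from the slide are rounding):

    # Add-one (Laplace) smoothed Naive Bayes on the is-yummy data above.
    data = [  # (ice-cream, cake, yummy)
        ("chocolate",  "chocolate", "yum"),
        ("vanilla",    "chocolate", "good"),
        ("coffee",     "chocolate", "yum"),
        ("coffee",     "vanilla",   "ok"),
        ("rocky-road", "chocolate", "yum"),
        ("strawberry", "vanilla",   "yum"),
    ]
    ICE_VALUES, CAKE_VALUES = 5, 2   # number of possible values per attribute

    def smoothed(count, class_count, n_values):
        # Add 1 to the count and the number of possible values to the denominator.
        return (count + 1) / (class_count + n_values)

    for label in ("yum", "good", "ok"):
        rows = [r for r in data if r[2] == label]
        n = len(rows)
        p_ice  = smoothed(sum(1 for r in rows if r[0] == "vanilla"), n, ICE_VALUES)
        p_cake = smoothed(sum(1 for r in rows if r[1] == "vanilla"), n, CAKE_VALUES)
        prior  = n / len(data)       # class prior, unsmoothed as on the slide
        print(label, round(p_ice, 2), round(p_cake, 2), round(p_ice * p_cake * prior, 2))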
33. Scenario (diagram: a set of math story problems on one side, Math Skills 1-14 on the other)
34. Scenario: each problem may be associated with more than one skill (same problems-to-skills diagram)
35. Scenario: each skill may be associated with more than one problem (same problems-to-skills diagram)
41. A lower-resolution prediction can give more information than a higher-resolution one if its accuracy is sufficiently higher. Remember this discussion from lecture 2?
43. Approach 1: each skill corresponds to a separate binary predictor. Each of 91 binary predictors is applied to each text, so 91 separate predictions are made for each text. (same problems-to-skills diagram)
44. Approach 2: each skill corresponds to a separate class value. A single multi-class predictor is applied to each text, so only 1 prediction is made for each text. (same problems-to-skills diagram)
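A minimal sketch contrasting the two approaches, assuming scikit-learn and a bag-of-words representation; the example texts and skill names are hypothetical stand-ins (the real task has 91 skills):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    # Hypothetical problem texts and their skill labels.
    texts = ["add the two fractions", "find the area of the rectangle",
             "add the areas of the two rectangles"]
    skills = [["fractions"], ["area"], ["addition", "area"]]

    X = CountVectorizer().fit_transform(texts)

    # Approach 1: one binary predictor per skill; each text can receive several skills.
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(skills)                    # binary indicator matrix, one column per skill
    binary_model = OneVsRestClassifier(MultinomialNB()).fit(X, Y)
    print(mlb.inverse_transform(binary_model.predict(X)))   # possibly several skills per text

    # Approach 2: one multi-class predictor; each text receives exactly one skill,
    # so multi-skill problems cannot be labeled fully.
    y_single = [s[0] for s in skills]                # forced to keep only one skill per problem
    multiclass_model = MultinomialNB().fit(X, y_single)
    print(multiclass_model.predict(X))               # exactly one skill per text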