Machine Learning

Overview of common classifiers in machine learning.

Machine Learning

  1. Machine Learning
     dwilliams@truecar.com
     Winter 2012
  2. Machine Learning
     Classification: Predicting discrete values
     Regression: Predicting continuous values
     Clustering: Detecting similar groups
     Optimization: Finding input that maximizes output
  3. Machine Learning
     Classification and Regression
     Imagine an omniscient oracle who answers questions:
         oracle( question ) = answer
     Goal: From previous questions and answers, create a function that
     approximates the oracle:
         f( question ) -> oracle( question ) as examples -> ∞
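A minimal sketch of the oracle idea, not from the slides: treat a known function as the "oracle", learn f from (question, answer) examples, and check that the approximation improves as the number of examples grows. scikit-learn and the nearest-neighbor model are assumptions chosen purely for illustration.

    # Approximating an "oracle" from question/answer examples (illustrative only).
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def oracle(question):
        return np.sin(question)  # stand-in for the omniscient oracle

    rng = np.random.default_rng(0)
    test_q = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)

    for n_examples in (10, 100, 1000):
        questions = rng.uniform(0, 2 * np.pi, size=(n_examples, 1))
        answers = oracle(questions).ravel()
        f = KNeighborsRegressor(n_neighbors=3).fit(questions, answers)
        error = np.mean(np.abs(f.predict(test_q) - oracle(test_q).ravel()))
        print(n_examples, "examples -> mean error", round(float(error), 4))

The mean error shrinks as examples are added, which is the sense in which f( question ) approaches oracle( question ).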
  4. Classification
     Classification: Predicting discrete values
     Example. Given:

     Shape     Color   Width(in)  Weight(g)  Calories  Taste   Type
     Round     Red     4.2        205        73        Sweet   Apple
     Round     Green   3.7        145        52        Sour    Apple
     Round     Orange  3.2        131        62        Sweet   Orange
     Round     Orange  5.7        181        75        Bitter  Grapefruit
     Cylinder  Yellow  1.5        140        123       Sweet   Banana
     Oval      Yellow  2.2        58         17        Sour    Lemon
     Round     Purple  0.7        2.4        2         Sweet   Grape
     Round     Green   2.0        65         45        Tart    Kiwi
     Round     Green   8.0        4518       1366      Sweet   Watermelon

     Predict Type:

     Shape     Color   Width(in)  Weight(g)  Calories  Taste   Type
     Round     Red     5.2        193        78        Bitter  ?
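A minimal sketch of predicting the unknown Type from the table above, assuming scikit-learn's DecisionTreeClassifier and pandas; neither library is named on the slides, and one-hot encoding of the discrete columns is an implementation choice.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    columns = ["Shape", "Color", "Width", "Weight", "Calories", "Taste", "Type"]
    train = pd.DataFrame([
        ["Round",    "Red",    4.2,  205,   73, "Sweet",  "Apple"],
        ["Round",    "Green",  3.7,  145,   52, "Sour",   "Apple"],
        ["Round",    "Orange", 3.2,  131,   62, "Sweet",  "Orange"],
        ["Round",    "Orange", 5.7,  181,   75, "Bitter", "Grapefruit"],
        ["Cylinder", "Yellow", 1.5,  140,  123, "Sweet",  "Banana"],
        ["Oval",     "Yellow", 2.2,   58,   17, "Sour",   "Lemon"],
        ["Round",    "Purple", 0.7,  2.4,    2, "Sweet",  "Grape"],
        ["Round",    "Green",  2.0,   65,   45, "Tart",   "Kiwi"],
        ["Round",    "Green",  8.0, 4518, 1366, "Sweet",  "Watermelon"],
    ], columns=columns)
    query = pd.DataFrame([["Round", "Red", 5.2, 193, 78, "Bitter"]],
                         columns=columns[:-1])

    # One-hot encode the discrete columns so the tree sees numeric features only.
    features = pd.get_dummies(pd.concat([train.drop(columns="Type"), query]))
    X_train, X_query = features.iloc[:-1], features.iloc[-1:]

    tree = DecisionTreeClassifier().fit(X_train, train["Type"])
    print(tree.predict(X_query))  # the tree's guess at the unknown Type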
  5. Classification
     Classification: Predicting discrete values
     Decision Trees
         Inputs: Discrete and Continuous
         Labels: n
         Rule: Which leaf of a binary tree do I end up in?
     Support Vector Machines
         Inputs: Continuous
         Labels: 2
         Rule: Which side of a hyperplane am I on?
     Nearest Neighbors
         Inputs: Continuous
         Labels: n
         Rule: Who am I closest to?
     Naïve Bayes
         Inputs: Discrete and Continuous
         Labels: n
         Rule: What am I most likely to be?
     Neural Networks
         Inputs: Continuous
         Labels: n
         Rule: Which node do I map to after moving through a weighted network?
  6. Classification
     Classification: Predicting discrete values
     Decision Trees
         Inputs: Discrete and Continuous
         Labels: n
         Rule: Which leaf of a binary tree do I end up in?
     Nearest Neighbors
         Inputs: Continuous
         Labels: n
         Rule: Who am I closest to?
     Naïve Bayes
         Inputs: Discrete and Continuous
         Labels: n
         Rule: What am I most likely to be?
     Neural Networks
         Inputs: Continuous
         Labels: n
         Rule: Which node do I map to after moving through a weighted network?
  7. Classification
     Classification: Predicting discrete values
     Support Vector Machines
         Inputs: Continuous
         Labels: 2
         Rule: Which side of a hyperplane am I on?
  8. Classification
     Classification: Predicting discrete values
     Support Vector Machines
         Inputs: Continuous
         Labels: 2
         Rule: Which side of a hyperplane am I on?
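A small sketch of the "which side of a hyperplane am I on?" rule, assuming scikit-learn's SVC with a linear kernel; the toy data and library choice are not from the slides. The sign of the decision function says which side of the separating hyperplane a point lies on.

    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable toy classes.
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
                  [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])  # class 1
    y = np.array([0, 0, 0, 1, 1, 1])

    svm = SVC(kernel="linear").fit(X, y)

    point = np.array([[2.0, 2.0]])
    score = svm.decision_function(point)  # signed score: side of the hyperplane
    print(np.sign(score), svm.predict(point))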
  9. Classification
     Classification: Predicting discrete values
     Support Vector Machines
         Inputs: Continuous
         Labels: 2
         Rule: Which side of a hyperplane am I on?
     Kernel Trick: You can compute distances in higher-dimensional spaces, even
     infinite-dimensional ones, without actually mapping the points there
     (Mercer's condition).
  10. Classification
      Classification: Predicting discrete values
      Support Vector Machines
          Inputs: Continuous
          Labels: 2
          Rule: Which side of a hyperplane am I on?
      Common Kernels:
  11. Classification
      Classification: Predicting discrete values
      Support Vector Machines
          Inputs: Continuous
          Labels: 2
          Rule: Which side of a hyperplane am I on?
      Kernel: Homogeneous Polynomial, k(x1, x2) = (x1 ∙ x2)**2
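A quick numerical check (not on the slide) that this homogeneous polynomial kernel equals an ordinary dot product taken after an explicit map into a higher-dimensional space, which is exactly what the kernel trick exploits; the 2-D inputs and the explicit 3-D feature map are illustrative choices.

    import numpy as np

    def kernel(x1, x2):
        return np.dot(x1, x2) ** 2  # computed entirely in the original 2-D space

    def feature_map(x):
        a, b = x
        return np.array([a * a, np.sqrt(2) * a * b, b * b])  # explicit 3-D map

    x1 = np.array([1.0, 2.0])
    x2 = np.array([3.0, -1.0])

    print(kernel(x1, x2))                            # (1*3 + 2*(-1))**2 = 1.0
    print(np.dot(feature_map(x1), feature_map(x2)))  # same value via the 3-D map

An SVM using this kernel never has to form the higher-dimensional vectors, yet it behaves as if the data lived there.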
  12. Classification
      Classification: Predicting discrete values
      Naïve Bayes
          Inputs: Discrete and Continuous
          Labels: n
          Rule: What am I most likely to be?
      Naïve Assumption: the features are conditionally independent given the class,
      i.e. P(x1, ..., xn | class) = P(x1 | class) ∙ P(x2 | class) ∙ ... ∙ P(xn | class)
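A hand-rolled sketch of the naive assumption using two discrete columns of the fruit table from slide 4 (Color and Taste): the joint likelihood P(color, taste | type) is replaced by P(color | type) * P(taste | type). The add-one smoothing is an illustrative choice, not something the slides specify.

    from collections import Counter

    # (Color, Taste, Type) rows from the fruit table.
    rows = [("Red", "Sweet", "Apple"), ("Green", "Sour", "Apple"),
            ("Orange", "Sweet", "Orange"), ("Orange", "Bitter", "Grapefruit"),
            ("Yellow", "Sweet", "Banana"), ("Yellow", "Sour", "Lemon"),
            ("Purple", "Sweet", "Grape"), ("Green", "Tart", "Kiwi"),
            ("Green", "Sweet", "Watermelon")]

    type_counts = Counter(t for _, _, t in rows)

    def likelihood(value, index, fruit_type, alpha=1.0):
        # P(feature value | type) with add-one smoothing.
        values = {r[index] for r in rows}
        matches = sum(1 for r in rows if r[2] == fruit_type and r[index] == value)
        return (matches + alpha) / (type_counts[fruit_type] + alpha * len(values))

    def predict(color, taste):
        scores = {}
        for fruit_type, count in type_counts.items():
            prior = count / len(rows)
            # Naive assumption: multiply the per-feature likelihoods.
            scores[fruit_type] = (prior * likelihood(color, 0, fruit_type)
                                        * likelihood(taste, 1, fruit_type))
        return max(scores, key=scores.get)

    print(predict("Green", "Sour"))  # "What am I most likely to be?" -> Apple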
  13. Classification
      Classification: Predicting discrete values
      How the classifiers see the same data:
      Decision Trees | Support Vector Machines | Naïve Bayes
  14. Classifier Ensembles
      An ensemble is a collection of classifiers, each trained on a different subset of the
      training data. At prediction time, the classifiers vote on the correct label. The result
      is a probability for each label, based on the proportion of classifiers that voted for it,
      and one often takes the label with the highest probability.
      Voting strategies:
      Bagging
          Multiple classifiers vote on the correct label; all votes are counted equally.
      Boosting
          Multiple classifiers vote, but the votes are weighted by each classifier's error rate
          on a reserved test set.
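A brief sketch of the two voting strategies using scikit-learn's standard implementations (bagged decision trees and AdaBoost); the synthetic dataset and the particular estimators are assumptions made for illustration, not part of the slides.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Bagging: each tree trains on a bootstrap sample; every vote counts equally.
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                random_state=0)
    # Boosting: later learners focus on earlier mistakes; votes are weighted.
    boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

    for name, model in (("bagging", bagging), ("boosting", boosting)):
        model.fit(X_train, y_train)
        print(name, "accuracy:", model.score(X_test, y_test))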
  15. Random Forests
      A Random Forest is an ensemble of decision trees. Random Forests have been shown to be
      statistically consistent: as the amount of training data grows, their predictions
      converge toward the oracle's.

      Gerard Biau
      Ecole Normale Superieure
      Analysis of a Random Forests Model
      Journal of Machine Learning Research (2012)

      "Despite growing interest and practical use, there has been little exploration of the
      statistical properties of random forests, and little is known about the mathematical
      forces driving the algorithm. In this paper, we [...] show that the procedure is
      consistent and adapts to sparsity, in the sense that [the] rate of convergence depends
      only on the number of strong features and not on how many noise variables are present."
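A minimal sketch of a Random Forest in use, assuming scikit-learn and its bundled iris dataset (neither appears on the slide): cross-validated accuracy tends to stabilize as trees are added, in the spirit of the consistency result quoted above.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    for n_trees in (1, 10, 100):
        forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        print(n_trees, "trees:", cross_val_score(forest, X, y, cv=5).mean())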
  16. Naïve Bayes
      Naïve Bayes classifiers appear to give good results in practice even though they are
      based on a potentially unrealistic assumption. Recently, researchers have been
      attempting to explain the conditions that lead to successful Naïve Bayes classifiers.

      Harry Zhang
      University of New Brunswick
      The Optimality of Naïve Bayes
      American Association for Artificial Intelligence (2004)

      "In a given dataset, two attributes may depend on each other, but the dependence may
      distribute evenly in each class. Clearly, in this case, the conditional independence
      assumption is violated, but Naïve Bayes is still the optimal classifier. Furthermore
      [...] if we look at two attributes, there may exist strong dependence between them that
      affects the classification. When the dependencies among all attributes work together,
      however, they may cancel each other out and no longer affect the classification."
  17. Choosing a Classifier
      Random Forests
          Pros
              Highly convergent
              Ignores noise
          Cons
              Whole forest must be retrained (batch)
              Time to classify depends on the number of trees
      Naïve Bayes
          Pros
              Fast to train
              Fast to classify
              Incremental updates (stream)
              Evaluates in O(1) time
              Seems to work in practice
          Cons
              Does not capture feature covariance
              Continuous inputs need distribution estimation
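A short sketch of the "incremental updates (stream)" advantage, assuming scikit-learn's GaussianNB; partial_fit folds each new batch into the existing model without retraining from scratch, unlike a batch-trained forest. The streaming data here is synthetic and purely illustrative.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    nb = GaussianNB()
    classes = np.array([0, 1])
    rng = np.random.default_rng(0)

    for _ in range(5):  # five batches arriving over time
        X_batch = rng.normal(size=(20, 3))
        y_batch = (X_batch[:, 0] > 0).astype(int)
        nb.partial_fit(X_batch, y_batch, classes=classes)  # incremental update

    print(nb.predict(np.array([[1.0, 0.0, 0.0]])))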
  18. Links
      Naïve Bayes in 50 lines of Python
      http://ebiquity.umbc.edu/blogger/2010/12/07/naive-bayes-classifier-in-50-lines/
      NIST Special Database 19 - Handwriting Samples
      http://gorillamatrix.com/files/nist-sd19.rar
      Analysis of a Random Forests Model
      http://jmlr.csail.mit.edu/papers/volume13/biau12a/biau12a.pdf
      The Optimality of Naïve Bayes
      http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/Optimality_of_Naive_Bayes.pdf
      Apache Mahout
      http://mahout.apache.org/
      Programming Collective Intelligence
      http://shop.oreilly.com/product/9780596529321.do
  19. Thank you
  20. "Vision without action is a daydream. Action without vision is a nightmare."
      - Japanese Proverb
