Machine Learning

Simple presentation explaining what machine learning is.

  1. Machine Learning: LW/OB presentation
  2. Machine learning (ML) is a field concerned with studying and developing algorithms that perform better at a task as they gain experience (but mostly I wanted to use this cool picture)
  3. WARNING: This presentation is seriously lacking slides, preparation and cool running examples. That being said, I know what I'm talking about ;)
  4. What ML is really about…
  5. What ML is really about… • ML is about data, and modeling its distribution
  6. What ML is really about… • ML is about data, and modeling its distribution • ML is about a tradeoff between model accuracy and predictive power
  7. What ML is really about… • ML is about data, and modeling its distribution • ML is about a tradeoff between model accuracy and predictive power • ML is about finding simple yet expressive classes of distributions
  8. What ML is really about… • ML is about data, and modeling its distribution • ML is about a tradeoff between model accuracy and predictive power • ML is about finding simple yet expressive classes of distributions • ML is about using approximate numerical methods to perform Bayesian update on the training data
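
To make the last bullet concrete, here is a minimal sketch of a Bayesian update on training data, using the simplest possible model: a coin of unknown bias with a conjugate Beta prior. The prior and the flip sequence are invented for illustration, and the exact conjugate update stands in for the approximate numerical methods the slide refers to.

```python
# Minimal sketch: Bayesian update of a coin's bias from "training data".
# Prior and data are made-up illustrations, not from the presentation.
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical flips (1 = heads)

alpha, beta = 1.0, 1.0                       # Beta(1, 1) = uniform prior over the bias
alpha += data.sum()                          # posterior update: add observed heads
beta += len(data) - data.sum()               # ...and observed tails

posterior_mean = alpha / (alpha + beta)
print(f"posterior mean of the coin bias: {posterior_mean:.3f}")
```
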
  9. ML = intersection of
  10. Data sizes vary…
  11. Data sizes vary… From a couple of kilobytes
  12. Data sizes vary… From a couple of kilobytes to petabytes
  13. Types of problems solved • Supervised • Unsupervised • Reinforcement learning • (transduction)
  14. Types of problems solved • Supervised – Classification – Regression • Unsupervised • Reinforcement learning • (transduction)
  15. Types of problems solved • Supervised • Unsupervised – Clustering – Discovering causal links • Reinforcement learning • (transduction)
  16. Types of problems solved • Supervised • Unsupervised • Reinforcement learning – Learn to perform a task, only from the final result • (transduction) – Not discussed; improves supervised learning with unsupervised samples
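
A toy contrast between these problem types, sketched with scikit-learn; the library choice and the synthetic data are assumptions, not something from the slides.

```python
# Toy illustration of supervised (classification, regression) vs unsupervised (clustering).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
X = rng.random((100, 2))

# Supervised classification: labels are given, learn to predict them.
y_class = (X[:, 0] + X[:, 1] > 1).astype(int)
clf = LogisticRegression().fit(X, y_class)

# Supervised regression: a real-valued target is given.
y_reg = 3 * X[:, 0] - 2 * X[:, 1]
reg = LinearRegression().fit(X, y_reg)

# Unsupervised clustering: no labels, only structure in X.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

print(clf.predict(X[:3]), reg.predict(X[:3]), clusters[:3])
```
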
  17. Typical applications • Image, speech, pattern recognition • Collaborative filtering • Time series forecasting • Game playing • Denoising • Any task where experience is valuable
  18. Common ML techniques
  19. Common ML techniques • Linear regression
  20. Common ML techniques • Linear regression • Factor models
  21. Common ML techniques • Linear regression • Factor models • Decision trees
  22. Common ML techniques • Linear regression • Factor models • Decision trees • Neural networks
  23. Common ML techniques • Linear regression • Factor models • Decision trees • Neural networks: perceptron, multilayer perceptron with backpropagation, Hebbian autoassociative memory, Boltzmann machine, spiking neurons…
  24. Common ML techniques • Linear regression • Factor models • Decision trees • Neural networks • SVMs
  25. Common ML techniques • Linear regression • Factor models • Decision trees • Neural networks • SVMs • Bayesian networks, white-box models…
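
As a concrete instance of the first item on the list, here is a minimal linear regression solved by ordinary least squares with NumPy; the synthetic data and coefficients are made up for illustration.

```python
# Linear regression by least squares: recover known coefficients from noisy data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

X1 = np.hstack([X, np.ones((200, 1))])            # add an intercept column
w_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)    # least-squares fit
print(w_hat)                                      # close to [1.5, -2.0, 0.5, 0.0]
```
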
  26. Meta-Methods
  27. Meta-Methods – Ensemble forecasting
  28. Meta-Methods – Ensemble forecasting – Bootstrapping, bagging, model averaging
  29. Meta-Methods – Ensemble forecasting – Bootstrapping, bagging, model averaging – Boosting
  30. Meta-Methods – Ensemble forecasting – Bootstrapping, bagging, model averaging – Boosting – Inductive bias through
  31. Meta-Methods – Ensemble forecasting – Bootstrapping, bagging, model averaging – Boosting – Inductive bias through • Out-of-sample testing
  32. Meta-Methods – Ensemble forecasting – Bootstrapping, bagging, model averaging – Boosting – Inductive bias through • Out-of-sample testing • Minimum description length
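
The out-of-sample testing bullet can be shown in a few lines: fit models of increasing complexity on a training split and score them only on held-out data. The dataset, the polynomial model family and the degrees tried are assumptions for illustration.

```python
# Out-of-sample testing: compare model complexities on data the fit never saw.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)

idx = rng.permutation(x.size)
train, test = idx[:40], idx[40:]                      # hold out a third of the data

for degree in (1, 3, 9):
    coeffs = np.polyfit(x[train], y[train], degree)   # fit on the training split only
    mse = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    print(f"degree {degree}: held-out MSE = {mse:.3f}")
```
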
  33. Neural networks demystified
  34. Neural networks demystified • Perceptron (1957)
  35. Neural networks demystified • Perceptron (1957) THIS IS…
  36. Neural networks demystified • Perceptron (1957) THIS IS… LINEAR ALGEBRA!
  37. Neural networks demystified • Perceptron • Linear separability
  38. Neural networks demystified • Perceptron • Linear separability – with 8 binary inputs, only about 1/2^212 of the possible classifications are linearly separable
  39. Neural networks demystified • Perceptron (1957) • Linear separability • Multilayered perceptron + backpropagation
  40. Neural networks demystified • Perceptron (1957) • Linear separability • Multilayered perceptron + backpropagation (1969 ~ 1986)
  41. Neural networks demystified • Perceptron (1957) • Linear separability • Multilayered perceptron + backpropagation • Smooth interpolation
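
A sketch of the 1957-style perceptron, to back up the "this is linear algebra" point: prediction is a dot product plus a threshold, and the learning rule nudges the weights toward misclassified examples. The AND and XOR targets illustrate the linear-separability slide: AND is learnable, XOR is not.

```python
# Classic perceptron: sign(w.x + b), with the error-driven update rule.
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                 # labels in {-1, +1}
            if yi * (xi @ w + b) <= 0:           # misclassified (or on the boundary)
                w += lr * yi * xi                # push the weights toward the example
                b += lr * yi
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])                # linearly separable
y_xor = np.array([-1, 1, 1, -1])                 # not linearly separable

for name, y in (("AND", y_and), ("XOR", y_xor)):
    w, b = train_perceptron(X, y)
    preds = np.sign(X @ w + b)
    print(name, "accuracy:", np.mean(preds == y))
```
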
  42. Many more types…
  43. SVM in a nutshell
  44. SVM in a nutshell • Maximize margin
  45. SVM in a nutshell • Maximize margin • Embed in a high-dimensional space
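
Both SVM ideas in a few lines, assuming scikit-learn (not named in the slides): a linear SVC maximizes the margin and keeps only the support vectors, and switching to an RBF kernel implicitly embeds the data in a high-dimensional space.

```python
# SVM sketch: margin maximization with a linear kernel, implicit embedding with RBF.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=-2, size=(50, 2)), rng.normal(loc=+2, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vectors used:", len(linear_svm.support_))   # only points on/near the margin

rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)        # kernel trick = implicit high-dim embedding
```
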
  46. Ensemble learning • Combine predictions through voting (with classifiers) or regression to improve prediction
  47. Ensemble learning • Combine predictions through voting (with classifiers) or regression to improve prediction • Train on random (with replacement) subsets of the data (bootstrapping)
  48. Ensemble learning • Combine predictions through voting (with classifiers) or regression to improve prediction • Train on random (with replacement) subsets of the data (bootstrapping) • Or weight the data according to the quality of prediction, and train new weak classifiers accordingly (boosting)
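
A hand-rolled sketch of the bootstrapping idea above: train several small trees on resamples of the data drawn with replacement, then combine them by majority vote. The synthetic dataset and the use of scikit-learn decision trees as weak learners are assumptions.

```python
# Bootstrapped ensemble (bagging) by hand: resample with replacement, train, vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)           # a nonlinear target

votes = np.zeros(len(X))
n_models = 25
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))    # bootstrap sample, with replacement
    tree = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
    votes += tree.predict(X)

ensemble_pred = (votes / n_models > 0.5).astype(int)   # majority vote
print("ensemble training accuracy:", np.mean(ensemble_pred == y))
```
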
  49. Numerical tricks
  50. Numerical tricks • Optimization of fit with standard operational search techniques
  51. Numerical tricks • Optimization of fit with standard operational search techniques • EM algorithm
  52. Numerical tricks • Optimization of fit with standard operational search techniques • EM algorithm • MCMC methods (Gibbs sampling, Metropolis algorithm…)
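
One of the MCMC methods above, random-walk Metropolis, fits in a dozen lines. The target density (a standard normal) and the proposal width are arbitrary choices for illustration.

```python
# Random-walk Metropolis sampler for an unnormalized 1-D target density.
import numpy as np

def unnormalized_log_target(x):
    return -0.5 * x ** 2            # log of an (unnormalized) standard normal

rng = np.random.default_rng(4)
samples, x = [], 0.0
for _ in range(10_000):
    proposal = x + rng.normal(scale=1.0)                  # symmetric random-walk proposal
    log_accept = unnormalized_log_target(proposal) - unnormalized_log_target(x)
    if np.log(rng.random()) < log_accept:                 # Metropolis acceptance rule
        x = proposal
    samples.append(x)

print("sample mean/std:", np.mean(samples[1000:]), np.std(samples[1000:]))
```
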
  53. A fundamental Bayesian model, the Hidden Markov Model
  54. A fundamental Bayesian model, the Hidden Markov Model • Hidden states produce observed states
  55. A fundamental Bayesian model, the Hidden Markov Model • Hidden states produce observed states • Billions of applications – Finance – Speech recognition – Swype – Kinect – Open heart surgery – Airplane navigation
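
A tiny worked example of the model above, using the classic forward algorithm (not named on the slide): hidden states evolve by a transition matrix, each hidden state emits the observed symbol with some probability, and the recursion sums over the hidden paths. All the probabilities here are made up.

```python
# Forward algorithm for a two-state HMM: P(observed sequence).
import numpy as np

start = np.array([0.6, 0.4])                  # P(first hidden state)
trans = np.array([[0.7, 0.3],                 # P(next hidden state | current)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],                  # P(observation | hidden state)
                 [0.2, 0.8]])

obs = [0, 1, 1, 0]                            # an observed sequence

alpha = start * emit[:, obs[0]]               # forward probabilities at t = 0
for o in obs[1:]:
    alpha = (alpha @ trans) * emit[:, o]      # propagate, then weight by the emission

print("P(observed sequence) =", alpha.sum())
```
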
  56. Questions I was asked • How does Boosting work? • What is the No Free Lunch Theorem? • Writing style recognition • Signature recognition • Rule extraction • Moving odds in response to informed gamblers • BellKor's Pragmatic Chaos and the Netflix prize
  57. Writing style recognition
  58. Writing style recognition • Naïve Bayes (similar to spam filtering, a bag-of-words approach)
  59. Writing style recognition • Naïve Bayes (similar to spam filtering, a bag-of-words approach) • Clustering of HMM model parameters
  60. Writing style recognition • Naïve Bayes (similar to spam filtering, a bag-of-words approach) • Clustering of HMM model parameters • Simple statistics on the text corpus (sentence length distribution, word length distribution, density of punctuation)
  61. Writing style recognition • Naïve Bayes (similar to spam filtering, a bag-of-words approach) • Clustering of HMM model parameters • Simple statistics on the text corpus (sentence length distribution, word length distribution, density of punctuation) • Combine with a logistic regression
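
A toy sketch of the bag-of-words route, assuming scikit-learn: count the words in each text and fit a naïve Bayes classifier over the counts. The two "authors" and their sentences are invented purely for illustration.

```python
# Bag-of-words naive Bayes for style/authorship, on made-up example texts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["the sea was grey and the wind was cold",
         "whereas the aforementioned clause shall apply",
         "the old boat creaked in the cold wind",
         "the party of the first part shall hereafter comply"]
authors = ["novelist", "lawyer", "novelist", "lawyer"]

counts = CountVectorizer().fit(texts)                 # bag of words: word -> column index
model = MultinomialNB().fit(counts.transform(texts), authors)
print(model.predict(counts.transform(["the wind was cold and grey"])))
```
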
  62. Signature recognition
  63. Signature recognition • Depends on whether the input is raster or vector
  64. Signature recognition • Depends on whether the input is raster or vector • The post office uses neural networks, but its corpus is gigantic
  65. Signature recognition • Depends on whether the input is raster or vector • The post office uses neural networks, but its corpus is gigantic • Dimensionality reduction is key
  66. Signature recognition • Wavelets on the raster image for feature extraction
  67. Signature recognition • Depends on whether the input is raster or vector • The post office uses neural networks, but its corpus is gigantic • Dimensionality reduction is key • Wavelets on the raster image for feature extraction • Path following, then learning on path features (total variation, average curvature, etc.)
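
One way to read "dimensionality reduction is key" in code: project flattened raster signatures onto their top principal components (PCA via SVD) before any learning. The image size and the random stand-in data are assumptions.

```python
# PCA via SVD: reduce flattened raster images to a handful of features.
import numpy as np

rng = np.random.default_rng(5)
images = rng.normal(size=(200, 40 * 40))               # 200 flattened 40x40 stand-in signatures

centered = images - images.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
features = centered @ vt[:20].T                        # keep the top 20 principal components

print("reduced from", images.shape[1], "to", features.shape[1], "dimensions")
```
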
  68. Rule extraction • Hard: the hypothesis space is not smooth • Decision tree regression • Genetic Programming (Koza)
  69. Netflix prize
  70. Netflix prize • The baseline (Cinematch) = latent semantic model • The defining characteristic of winners: ensemble prediction with neural networks to combine predictors • The best teams were mergers of good teams
  71. Latent semantic model • There is a set of K "features": each movie has a score for each feature, and each user has a weight for each feature • The features are latent; we only assume the value of K • Equivalent to representing the rating matrix as the product of a score matrix and a preference matrix; SVD minimizes RMSE
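
A sketch of the latent factor idea: approximate the rating matrix by a rank-K product of user preferences and movie scores, which truncated SVD computes as the best fit in RMSE. The rating matrix and K are invented here; the real Netflix matrix is mostly missing entries, which changes the fitting procedure.

```python
# Rank-K factorization of a (dense, synthetic) rating matrix via truncated SVD.
import numpy as np

rng = np.random.default_rng(6)
K = 3
user_pref = rng.normal(size=(100, K))                  # hidden user weights
movie_score = rng.normal(size=(K, 50))                 # hidden movie features
ratings = user_pref @ movie_score + 0.1 * rng.normal(size=(100, 50))

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
approx = U[:, :K] * s[:K] @ Vt[:K]                     # best rank-K approximation in RMSE
rmse = np.sqrt(np.mean((ratings - approx) ** 2))
print(f"rank-{K} reconstruction RMSE: {rmse:.3f}")
```
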
  72. Poker is hard…
  73. Poker is hard… • Gigantic, yet not continuous, state space • Dimensionality reduction isn't easy • High variance • Possible to make parametric strategies and optimize them with ML • Inputs such as pot odds are trivial to compute
  74. Uhuh, slides end here
  75. Sort of… Questions?
