Introduction to Machine Learning


  1. (no notes)
  2. Hi.
  3. Use MongoDB to store millions of real estate properties for sale. Lots of data = fun with machine learning!
  4. https://www.coursera.org/
  5. Arthur Samuel (1959): Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed. Samuel was one of the pioneers of the machine learning field. He wrote a checkers program that learned from the games it played. The program became better at checkers than he was, but he was pretty bad at checkers. Checkers is now solved.
  6. Tom Mitchell (1998): A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. This is the more formal definition.
  7. Andrew Ng. The guy who taught my class is in the middle; I pretty much ripped off this entire talk from his class. Here's a pretty cool application of machine learning: Stanford taught these helicopters how to fly autonomously (apprenticeship learning).
  8. Old attempts at autonomous helicopters: build a model of the world and the helicopter state, then create a complex algorithm by hand that tells the helicopter what to do.
  9.-14. (no notes)
  15. Supervised Machine Learning. Input data X (already have: the training set) → ? → Output data Y (already have: the training set). The ? is the unknown function that maps our existing inputs to our existing outputs.
  16. Supervised Machine Learning, the goal: new input data → h(x) → new prediction. We want to create the function h(x).
  17. Here is a new training set.
  18. h is our hypothesis function.
  19. Given data like this, how can we predict prices for square footages we haven't seen before?
  20. (no notes)
  21. Linear Regression. When our target variable y is continuous, like in this example, the learning problem is called a regression problem. When y can take on only a small number of discrete values, it is called a classification problem; more on that later. Demo: IPython linear regression notebook (source virtualenvwrapper.sh && workon ml_talk && cd ~/work/ml_talk/demos && ipython notebook --pylab inline).
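     The notebook demo itself is not part of this transcript, but a minimal sketch of the same idea, using numpy.polyfit on made-up square-footage and price numbers, might look like this:

        import numpy as np

        # Hypothetical training data: square footage (x) and sale price in $1000s (y).
        sqft  = np.array([850, 900, 1200, 1500, 1800, 2100, 2400])
        price = np.array([120, 132, 180, 215, 260, 300, 335])

        # Fit a degree-1 polynomial (a straight line) to the training set.
        slope, intercept = np.polyfit(sqft, price, 1)

        # h(x): predict the price for a square footage we have never seen before.
        def h(x):
            return slope * x + intercept

        print(h(1000))  # predicted price, in $1000s, for a 1000 sq ft house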
  22. How did we do? Input data X (already have: the test set) → ? → Output data Y (already have: the test set). The ? is the unknown function that maps our existing inputs to existing outputs.
  23. Test set: 10%, training set: 90%.
  24. How can we give better predictions? • Add more data • Tweak parameters • Try a different algorithm. These are three ways, but there are many more. Demo: second IPython notebook (source virtualenvwrapper.sh && workon ml_talk && cd ~/work/ml_talk/demos && ipython notebook --pylab inline).
  25. Test set: 10%, cross-validation set: 20%, training set: 70%. This means we get less data in our training set, so why do we need a cross-validation set? When tweaking model parameters, if we use the test set to gauge our success, we overfit to the test set: our algorithm looks better than it actually is, because the set we use to fine-tune parameters is the same set we use to judge the effectiveness of the whole model.
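     As an illustration only (the split_data helper below is not from the talk), here is one way such a 70/20/10 split could be carved out:

        import numpy as np

        def split_data(X, y, seed=0):
            # Shuffle the examples, then take 70% for training, 20% for
            # cross-validation (parameter tweaking), and the final 10% as the
            # untouched test set.
            rng = np.random.default_rng(seed)
            idx = rng.permutation(len(X))
            n_train = int(0.7 * len(X))
            n_cv = int(0.2 * len(X))
            train = idx[:n_train]
            cv = idx[n_train:n_train + n_cv]
            test = idx[n_train + n_cv:]
            return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])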
  26. numpy.polyfit, how does it work? Let's lift the covers on this magic function.
  27. Let's make our own fitting function.
  28. Hypothesis Function. h is our hypothesis. You might recognize this as y = mx + b, the slope-intercept formula. x is our input (for example, 1000 square feet). Theta is the parameter (actually a matrix of parameters) we are trying to determine. How?
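     The slide itself is an image, but written out in code (treating theta as a plain pair of numbers rather than a matrix), the hypothesis is simply:

        def h(theta, x):
            # h(x) = theta0 + theta1 * x, i.e. y = b + m*x in slope-intercept form.
            theta0, theta1 = theta
            return theta0 + theta1 * x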
  29. Cost Function. We are going to try to minimize this thing. Sigma just means "add this stuff up", and m is the number of training examples we have. J(theta) is a function that, when given theta, tells us how close our prediction function h gets to the training results y when given data from our training set. The 1/2 helps us take the derivative; don't worry about it. This is known as the least-squares cost function. Notice that the farther our prediction gets from reality, the worse the penalty grows.
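     A sketch of that cost function, assuming the 1/(2m)-scaled least-squares form used in Ng's course:

        import numpy as np

        def cost(theta, X, y):
            # J(theta) = 1/(2m) * sum over the m training examples of (h(x_i) - y_i)^2.
            # The 1/2 is only there to cancel the 2 that appears when taking the derivative.
            m = len(y)
            predictions = theta[0] + theta[1] * X   # h(x) for every training example at once
            return np.sum((predictions - y) ** 2) / (2 * m)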
  30. Gradient Descent. This is how we will actually go about minimizing the cost function. Gradient descent seems funky but is simple and easy to visualize, and it is used quite a bit in industry today.
  31.-36. (no notes)
  37. But... Local Optima!
  38.-42. (no notes)
  43. This is the update step that makes this happen. Alpha is the learning rate; it tells the algorithm how big of a step to take. It has to be not too big, or it won't converge, and not too small, or it will take forever. We keep updating our values for theta by taking the derivative of our cost function, which tells us which way to go to make the cost function smaller.
  44. Plug in our cost function, and a bunch of scary math happens (this is where the 1/2 came in handy in our cost function).
  45. Update Step. Repeat until convergence.
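     Putting slides 43-45 together, a self-contained sketch of the update loop for one input variable (alpha and the iteration count are arbitrary; in practice you would also scale the features first):

        import numpy as np

        def gradient_descent(X, y, alpha=0.01, iterations=1000):
            # Repeat until convergence (here: a fixed number of iterations):
            #   theta_j := theta_j - alpha * d/d(theta_j) J(theta)
            # For the least-squares cost that derivative is (1/m) * sum((h(x_i) - y_i) * x_ij).
            m = len(y)
            theta = np.zeros(2)
            for _ in range(iterations):
                error = (theta[0] + theta[1] * X) - y   # h(x) - y for every example
                grad0 = np.sum(error) / m               # partial derivative w.r.t. theta0
                grad1 = np.sum(error * X) / m           # partial derivative w.r.t. theta1
                theta -= alpha * np.array([grad0, grad1])
            return theta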
  46. Too bad that's not actually how numpy.polyfit works. Let's take a break.
  47.-48. (no notes)
  49. Training set: lots of 2-dimensional pictures of things (X) → ? → 3D laser scans of the same things (Y).
  50.-56. (no notes)
  57. Normal Equation.
  58. The high-school way of minimizing a function (aka Fermat's theorem): all maxima and minima must occur at critical points, so set the derivative of the function equal to 0.
  59. Not going to show the derivation; here is the closed form.
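     The closed form on the slide is theta = (X^T X)^(-1) X^T y. A sketch with numpy (the example numbers are made up):

        import numpy as np

        def normal_equation(X, y):
            # theta = (X^T X)^{-1} X^T y, solved without explicitly forming the inverse.
            return np.linalg.solve(X.T @ X, X.T @ y)

        # Design matrix: a column of ones for the intercept, then the square footage.
        sqft = np.array([850.0, 1200.0, 1800.0, 2400.0])
        price = np.array([120.0, 180.0, 260.0, 335.0])
        X = np.column_stack([np.ones_like(sqft), sqft])
        theta = normal_equation(X, price)   # [intercept, slope], no gradient descent needed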
  60. (no notes)
  61. Logistic Regression. Logistic regression is for discrete target values, e.g. a spam classifier. BORING, let's do neural nets.
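     The spam example and the function names below are mine, not from the slides, but the core idea is small: squash a linear score through the sigmoid so the output can be read as the probability of the positive class.

        import numpy as np

        def sigmoid(z):
            # Maps any real number into the interval (0, 1).
            return 1.0 / (1.0 + np.exp(-z))

        def classify(theta, x):
            # h(x) = sigmoid(theta . x), read as P(y = 1 | x),
            # e.g. "probability this email is spam". Predict 1 when it is at least 0.5.
            return sigmoid(np.dot(theta, x)) >= 0.5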
  62. Neural Networks.
  63.-69. (no notes)
  70. Unsupervised Machine Learning. Supervised learning: you have a training set with known inputs and known outputs. Unsupervised learning: you just have a bunch of data and want to find structure in it.
  71. Clustering.
  72.-76. (no notes)
  77. K-Means Clustering.
  78. K-Means Clustering.
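     The slides here are animations, but the algorithm itself fits in a few lines. A rough sketch (k, the iteration count, and the random initialization are arbitrary choices):

        import numpy as np

        def k_means(X, k=3, iterations=10, seed=0):
            # 1. Pick k random training points as the initial centroids.
            # 2. Assign every point to its nearest centroid.
            # 3. Move each centroid to the mean of the points assigned to it.
            # 4. Repeat.
            rng = np.random.default_rng(seed)
            centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
            for _ in range(iterations):
                distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
                labels = distances.argmin(axis=1)
                for j in range(k):
                    if np.any(labels == j):
                        centroids[j] = X[labels == j].mean(axis=0)
            return centroids, labels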
  79. (no notes)
  80. Collaborative filtering; the Netflix Prize.
  81. Sample data (CustomerID, MovieID, Rating 1-5, 0 = not rated):
        6,16983,0
        10,11888,0
        10,14584,5
        10,15957,0
        131,17405,5
        134,6243,0
        188,12365,0
        368,16002,5
        424,15997,0
        477,12080,0
        491,7233,3
        508,15929,0
        527,1046,2
        596,15294,0
      How collaborative filtering works: 1. A user expresses his or her preferences by rating items (e.g. books, movies, or CDs) in the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain. 2. The system matches this user's ratings against other users' ratings and finds the people with the most "similar" tastes. 3. From those similar users, the system recommends items that they have rated highly but that this user has not yet rated (the absence of a rating is usually taken to mean the user is unfamiliar with the item).
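     Step 2 of those notes (finding the most similar user) could be sketched as below; the tiny ratings matrix and the cosine-similarity choice are illustrative, not from the talk:

        import numpy as np

        # Rows are users, columns are movies; 0 means "not rated".
        ratings = np.array([
            [5, 0, 3, 0],
            [4, 0, 3, 1],
            [0, 5, 0, 4],
        ], dtype=float)

        def most_similar_user(ratings, user):
            # Compare this user's rating vector against every other user's
            # (cosine similarity) and return the index of the closest match.
            target = ratings[user]
            similarities = ratings @ target / (
                np.linalg.norm(ratings, axis=1) * np.linalg.norm(target) + 1e-9)
            similarities[user] = -np.inf   # never match a user with themselves
            return int(np.argmax(similarities))

        # Step 3 would then recommend movies the matched user rated highly
        # that this user has not rated yet.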
  82. Other cool stuff I just thought of.
  83. (no notes)
  84. Deep Learning. Unsupervised neural networks; autoencoders.
  85. (no notes)
  86. Computer Vision Features.
  87. Autoencoder Neural Network.
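     A bare-bones sketch of the forward pass, just to show the shape of the idea (the weight matrices, layer sizes, and sigmoid choice are assumptions): the hidden layer is smaller than the input, and the network is trained to reproduce its own input, so the hidden activations become learned features without any labels.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def autoencoder_forward(x, W_encode, W_decode):
            # Encode: compress the input into a smaller hidden representation.
            hidden = sigmoid(W_encode @ x)
            # Decode: try to reconstruct the original input from that representation.
            reconstruction = sigmoid(W_decode @ hidden)
            # Training minimizes this reconstruction error over many examples.
            error = np.sum((reconstruction - x) ** 2)
            return hidden, reconstruction, error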
  88.-91. (no notes)
  92. THE END.
