Introduction to Machine Learning

Slides from the Big Data & Data Science Meetup Group

Published in: Data & Analytics

  1. 1. INTRODUCTION TO MACHINE LEARNING
  2. 2. CHILD LEARNING
      Child: Daddy, what is danger?
      Dad: The possibility of suffering harm or injury.
      Child: Daddy, what is an injury?
      Dad: An instance of being injured.
      Child: Daddy, what is an instance?
      Dad: An example or single occurrence of something.
      Child: Daddy, does it bother you that I’m asking so many questions?
      Dad: Not at all, if you don't ask you will never know.
  3. 3. CHILD LEARNING Dad: Let me give you some examples…
  4. 4. CHILD LEARNING Child: Now I understand, everything is dangerous. Dad: No, there are things that aren't dangerous.
  5. 5. CHILD LEARNING Child: And what are those?
  6. 6. CHILD LEARNING And there is the most natural mode of learning:
      Action                     | Reaction            | Lesson
      Touching hot stove         | Aching hand         | Do not touch again
      Playing with toys          | Fun                 | Continue playing
      Running into the road      | Screaming parent    | Don’t run to roads
      Running in the house       | Fun                 | Run in the house
      Eating chocolate           | Fun                 | Search for chocolate
      Eating too much chocolate  | Stomach ache        | Don’t eat too much
      Saying bla bla             | No reaction         | Try variations
      Saying daddy               | Overexcited parents | Do that again
  7. 7. SO, HOW DO CHILDREN LEARN? 1. From explanation 2. From examples 3. Reinforcement Learning
  8. 8. SO, HOW DO CHILDREN LEARN? 1. From explanation 2. From examples 3. Reinforcement Learning
  9. 9. ABOUT US Algorithms Technology Business
  10. 10. AGENDA • What is Machine Learning • Typical Machine Learning Tasks • Supervised Learning • Unsupervised Learning • How to Get Started • Summary
  11. 11. AGENDA • What is Machine Learning • Typical Machine Learning Tasks • Supervised Learning • Unsupervised Learning • How to Get Started • Summary
  12. 12. WHAT IS MACHINE LEARNING? We say that a computer program is learning a task, if its performance on that task is improving as more experience is processed
  13. 13. WHAT IS MACHINE LEARNING? Machine Learning Statistics Databases & Big Data Decision Theory Artificial Intelligence Optimization
  14. 14. WHAT IS MACHINE LEARNING? Machine Learning Statistics Databases & Big Data Decision Theory Artificial Intelligence Optimization Data Science
  15. 15. AGENDA • What is Machine Learning • Typical Machine Learning Tasks • Supervised Learning • Unsupervised Learning • How to Get Started • Summary
  16. 16. TYPICAL MACHINE LEARNING TASKS No two Machine Learning tasks are identical. Yet, we often use the following categories: • Supervised Learning • Unsupervised Learning • Reinforcement Learning
  17. 17. SUPERVISED LEARNING Estimate or predict an unknown result, given explicit values of some explaining features. The learning is based on a history of observations, for which both the explaining features and the results are known. Experience = supervised examples (exactly as in inferring what is dangerous from examples). We call the dataset that describes the experience the training set.
  18. 18. SUPERVISED LEARNING Estimate or predict an unknown result, given explicit values of some explaining features. We call the dataset that describes the experience the training set. When the unknown result is numeric, we call the task Regression. When the unknown result is categorical, we call the task Classification.
  19. 19. SUPERVISED LEARNING Example 1: What will be the annual spend of a new customer, given a set of explaining features (e.g., demographics, first purchases, first deposit, etc.)? Task qualifications: Prediction, Regression. Training set: a file in which each row represents a customer. For each such customer we extract the explaining features as they were at the prediction point, as well as the annual spend (measured a year later).
  20. 20. SUPERVISED LEARNING Example 2: What is the activity currently performed by a user who is wearing a smart watch with inertial sensors? Task qualifications: Assessment, Classification Input: A set of sensor-based signals, along with an annotation of the activity during each signal. Requires a significant amount of pre-processing in order to produce the training set.
  21. 21. SUPERVISED LEARNING [Diagram labels: Prediction, Assessment, Classification, Regression]
  22. 22. UNSUPERVISED LEARNING Given a specific set of records, described by a given set of features, either: 1. Extract interesting patterns that appear in the data 2. Provide an insightful representation of the distribution of the data. Experience: the more records we have, the more significant the patterns we can extract and the more accurate the representation.
  23. 23. UNSUPERVISED LEARNING Example: Market Segmentation Input data: Customers’ descriptions Objective: Provide an insightful representation of the market (what types of customers are there?) Also known as cluster analysis
  24. 24. REINFORCEMENT LEARNING Learning how to best react to a situation through trial and error. Simple example: multiple A/B testing. More typical: robot navigation. Designing an RL system requires solving two difficult challenges: • The exploration-exploitation dilemma • Attributing delayed rewards
  25. 25. UNSTRUCTURED INPUTS The input data often come in an unstructured form, such as: • Free text • Speech • Images • Video • Sensors • Networks
  26. 26. AGENDA • What is Machine Learning • Typical Machine Learning Tasks • Supervised Learning • Unsupervised Learning • How to Get Started • Summary
  27. 27. SUPERVISED LEARNING
      X1      X2      X3      …  Xn-2      Xn-1      Xn      Y
      x1,1    x2,1    x3,1    …  xn-2,1    xn-1,1    xn,1    y1
      x1,2    x2,2    x3,2    …  xn-2,2    xn-1,2    xn,2    y2
      …       …       …       …  …         …         …       …
      x1,m-1  x2,m-1  x3,m-1  …  xn-2,m-1  xn-1,m-1  xn,m-1  ym-1
      x1,m    x2,m    x3,m    …  xn-2,m    xn-1,m    xn,m    ym
      $Y = f(X_1, X_2, \dots, X_n)$
  28. 28. LEARNING THE CONCEPT OF A BIRD An alien asks you: “What is a bird?” You can try and define a bird, but the alien does not understand Why don’t you give an example…
  29. 29. LEARNING THE CONCEPT OF A BIRD
      Example #  Color  Can Fly?  Is Bird?
      1          Black  Yes       Yes
      What do you say about the following classification model: “If Color = Black and Can_Fly = Yes then Bird Else Not_Bird”?
  30. 30. LEARNING THE CONCEPT OF A BIRD
      Example #  Color  Can Fly?  Is Bird?
      1          Black  Yes       Yes
      2          Grey   Yes       Yes
      What do you say about the following classification model: “If Can_Fly = Yes then Bird Else Not_Bird”?
  31. 31. LEARNING THE CONCEPT OF A BIRD
      Example #  Color  Can Fly?  Is Bird?
      1          Black  Yes       Yes
      2          Grey   Yes       Yes
      3          Black  Yes       No
      Supervised Learning means generalizing from given observations.
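The bird example can be replayed in a few lines of code. This is only a sketch: the tuple layout and the function names are made up for illustration, and the three rows follow the example table as read from the slides.

```python
# The three labelled examples: (color, can_fly, is_bird).
examples = [
    ("Black", True, True),
    ("Grey", True, True),
    ("Black", True, False),
]

def rule_black_and_flies(color, can_fly):
    """First candidate model: a bird only if it is black and can fly."""
    return color == "Black" and can_fly

def rule_flies(color, can_fly):
    """Second candidate model: a bird if it can fly."""
    return can_fly

def training_accuracy(rule, data):
    """Fraction of examples on which the rule agrees with the label."""
    hits = sum(1 for color, can_fly, is_bird in data
               if rule(color, can_fly) == is_bird)
    return hits / len(data)

# Re-evaluate both rules as each new example arrives.
for n in (1, 2, 3):
    seen = examples[:n]
    print(n,
          training_accuracy(rule_black_and_flies, seen),
          training_accuracy(rule_flies, seen))
```

Both rules fit the first example perfectly, the second example breaks the over-specific rule, and the third example cannot be separated by either rule because the two available features no longer distinguish birds from non-birds.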
  32. 32. GENERALIZATION VS. SPECIFICATION • A general concept is built based on the explaining features. The right set of explaining features is crucial for learning • Being overly specific means memorizing, not learning • Being too general means being too coarse and missing some of the details • Finding the sweet spot between generalization and specificity is hard
  33. 33. GENERALIZATION VS. SPECIFICATION Let us find a function that estimates Y=f(X)
  34. 34. [Figure panels] Too General / Too Simple / Under fitted; Too Specific / Too Complex / Over fitted; A nice solution to the trade-off
  35. 35. OVER FITTING & UNDER FITTING • We search for $Y = f(X_1, X_2, \dots, X_n)$ • We know that in addition to the functional dependency (the signal), the actual Y values are also affected by noise • We want the model to learn the signal, but not to be affected by the noise • A model that is too simple to learn the signal is called under fitted (it has high bias) • A model that is so complex that it adapts itself to the noise is called over fitted (it has high variance) The more complexity you add to the model, the better you can fit it to the training observations. This is not always a good practice!
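In standard statistical usage, bias and variance are properties of the fitted model, and the trade-off is often written as the decomposition below. This is the textbook formulation rather than something stated on the slides; $\hat{f}$ (the fitted model), $\varepsilon$ (zero-mean noise) and $\sigma^2$ (its variance) are notation introduced here.

```latex
Y = f(X_1, X_2, \dots, X_n) + \varepsilon,
\qquad \mathbb{E}[\varepsilon] = 0, \quad \operatorname{Var}(\varepsilon) = \sigma^2

\mathbb{E}\big[(Y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big(\hat{f}(x)\big)}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

An under fitted model has a large first term, an over fitted model has a large second term, and no model can remove the third.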
  36. 36. A PARTIAL LIST OF SUPERVISED LEARNING METHODS • K- Nearest Neighbor • SVM (Optimal Margin Linear Separation) • Decision Trees • Naïve Bayes • Linear Regression • Logistic Regression • (Deep) Neural Networks
  37. 37. A PARTIAL LIST OF SUPERVISED LEARNING METHODS • K- Nearest Neighbor • SVM (Optimal Margin Linear Separation) • Decision Trees • Naïve Bayes • Linear Regression • Logistic Regression • (Deep) Neural Networks
  38. 38. K-NN [Plot axes: Recipients, Email Length] Given a new observation, find the K closest available observations and: • In regression, use the average result of these K observations • In classification, use majority voting amongst these K observations
  39. 39. K-NN [Plot axes: Recipients, Email Length; K=3] A few concerns: • What should K be? • Which distance measure should be used? • Computation cost
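A minimal K-NN sketch using only the standard library. The data, the "spam"/"ham" labels and the function names are hypothetical (the slides only show the two axes), and Euclidean distance is just one possible answer to the "which distance measure" question.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training rows closest to `query` (Euclidean distance)."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]

def knn_classify(train, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average of the k nearest numeric results."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Hypothetical observations: (recipients, email_length) -> label.
emails = [
    ((1, 200), "ham"), ((2, 180), "ham"), ((1, 250), "ham"),
    ((40, 30), "spam"), ((55, 20), "spam"), ((60, 45), "spam"),
]
print(knn_classify(emails, (50, 25), k=3))
```

Every prediction sorts the whole training set, which is the computation concern listed above. The two features also live on different scales, so in practice they would usually be rescaled before distances are computed.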
  40. 40. LINEAR SEPARATORS How would you classify this data? X1 X2
  41. 41. LINEAR SEPARATORS How would you classify this data? X1 X2
  42. 42. LINEAR SEPARATORS X1 X2
  43. 43. LINEAR SEPARATORS X1 X2 In SVM we search for the linear separator that has the maximal margin. Using a mathematical trick called the kernel trick, SVMs can also find non-linear separators.
  44. 44. DECISION TREES Example: classify new customers into one of two groups: Standard and VIP. Training set: a list of customers that were once new, along with an annotation that reflects whether these customers should have been identified as VIP (this annotation is made only after some time). Let us say that we have 1,000 VIPs and 4,000 Standard new customers
  45. 45. DECISION TREES Let us say that we have 1,000 VIPs and 4,000 Standard new customers 1,000 V 4,000 S
  46. 46. DECISION TREES The population is a mix of different types. What if we could find a splitting criterion that creates two (or more) purer sub-populations? 1,000 V 4,000 S
  47. 47. DECISION TREES The population is a mix of different types. What if we could find a splitting criterion that creates two (or more) purer sub-populations? 1,000 V 4,000 S Self Employed: 600 V 800 S Employees: 400 V 3,200 S
  48. 48. DECISION TREES Now, we can take each sub-population and split it recursively, until some stopping criteria are met. 1,000 V 4,000 S Self Employed 600 V 800 S Employees 400 V 3,200 S
  49. 49. DECISION TREES • Decision trees are the result of a recursive splitting mechanism • Each split is chosen so as to maximize the purity of the sub-populations that result from the split • There are a few ways to measure node purity. Often the concept of minimal entropy (or a variation of it) is used • Each split is made according to the values of one of the explaining features
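To make the purity idea concrete, the sketch below scores the Self Employed / Employees split from the slides with Shannon entropy. Entropy is only one of the purity measures the slide alludes to, and the function names are illustrative.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a node, given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# (VIP, Standard) counts taken from the slides.
parent = (1000, 4000)
self_employed = (600, 800)
employees = (400, 3200)

print(round(entropy(parent), 3))
print(round(information_gain(parent, [self_employed, employees]), 3))
```

A tree builder would compute this gain for every candidate split and keep the one with the highest value, then repeat on each child until a stopping criterion is met.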
  50. 50. LINEAR REGRESSION [Plot: House Price ($1000s) vs. Square Feet]
  51. 51. LINEAR REGRESSION [Plot: House Price ($1000s) vs. Square Feet]
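A one-feature least-squares fit can be written in closed form. The sketch below assumes a house-price-versus-area setting; the numbers are invented for illustration and are not read off the original plot.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations: square feet -> price in $1000s.
square_feet = [1000, 1500, 2000, 2500, 3000]
price = [150, 200, 260, 310, 370]

intercept, slope = fit_line(square_feet, price)

def predict(sqft):
    """Estimated price (in $1000s) for a house of the given size."""
    return intercept + slope * sqft

print(predict(1800))
```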
  52. 52. SUPERVISED LEARNING EVALUATION Since Supervised Learning is all about generalization, a good model is a model that can be applied successfully to new observations. In classification tasks, we are often interested in the probability that the model will predict the true outcome. This probability is called the model accuracy. In regression tasks, we are often interested in the typical deviation between the outcome of the model and the true outcome. A common measure of this deviation is the root mean squared error (RMSE).
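Both measures are short enough to write out. This is a plain sketch with illustrative function names, assuming the true and predicted outcomes are given as equal-length lists.

```python
import math

def accuracy(actual, predicted):
    """Share of observations whose predicted class equals the true class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error between true and predicted numeric outcomes."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

print(accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"]))
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones.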
  53. 53. [Figure panels] Too General / Too Simple / Under fitted; Too Specific / Too Complex / Over fitted; A nice solution to the trade-off
  54. 54. SUPERVISED LEARNING EVALUATION It is always possible to build an over fitted model. So the quality of the model on the training set says very little about the capability of the model to generalize to new observations. Therefore, never evaluate a model using the training set. Instead: • Use an independent (randomly selected) test set • Use cross validation
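The sketch below shows why training-set scores mislead, using an extreme case: a "model" that simply memorizes its training rows. The data, the ground-truth rule and all names are hypothetical.

```python
def label(x):
    """Hypothetical ground truth used to generate the data."""
    return "high" if x >= 50 else "low"

train = [(x, label(x)) for x in range(0, 100, 2)]   # even numbers
test = [(x, label(x)) for x in range(1, 100, 2)]    # odd numbers, never seen

def fit_memorizer(rows):
    """Over fitted extreme: a lookup table of the training rows."""
    table = dict(rows)
    return lambda x: table.get(x)    # returns None for anything unseen

def fit_threshold(rows):
    """Simple model: cut halfway between the two classes."""
    lows = [x for x, y in rows if y == "low"]
    highs = [x for x, y in rows if y == "high"]
    cut = (max(lows) + min(highs)) / 2
    return lambda x: "high" if x >= cut else "low"

def accuracy(model, rows):
    """Share of rows on which the model predicts the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

for name, fit in (("memorizer", fit_memorizer), ("threshold", fit_threshold)):
    model = fit(train)
    print(name, accuracy(model, train), accuracy(model, test))
```

Both models look perfect on the training set. Only the held-out rows reveal that the memorizer has learned nothing that transfers.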
  55. 55. SUPERVISED LEARNING EVALUATION
      Confusion Matrix (Actual vs. Classified As)
              Blue  Red
      Blue    7     1
      Red     0     5
  56. 56. SUPERVISED LEARNING EVALUATION
      Confusion Matrix (Actual vs. Classified As)
              Blue  Red
      Blue    7     1
      Red     0     5
      Accuracy (on test set) = (7+5)/(7+5+1+0) = 12/13 ≈ 0.92
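A confusion matrix is just a count of (actual, classified-as) pairs. The sketch below builds label lists that reproduce the counts on the slide (7, 1, 0, 5). Which axis is "actual" is an assumption made here, and it does not change the accuracy.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy_from_matrix(matrix):
    """Diagonal entries (correct predictions) divided by all entries."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())

# Hypothetical test set arranged to match the slide's counts.
actual = ["Blue"] * 8 + ["Red"] * 5
predicted = ["Blue"] * 7 + ["Red"] * 1 + ["Red"] * 5

matrix = confusion_matrix(actual, predicted)
print(matrix)
print(accuracy_from_matrix(matrix))
```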
  57. 57. CROSS VALIDATION Randomly break the training set into k mutually exclusive, collectively exhaustive sets, of similar size (often k=10). For i=1,2,…k: Train a model using all the sets, except for the i-th set. Evaluate the trained model over the i-th set. You end up with k evaluation measures. Evaluate the entire model as the average of these k results.
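The procedure above can be sketched as follows. The helper names, the mean-predictor model and the data are placeholders chosen for illustration; any `fit` and `evaluate` pair with the same shape would work.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k disjoint, near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(rows, k, fit, evaluate, seed=0):
    """Train on k-1 folds, evaluate on the held-out fold, average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [row for j, row in enumerate(rows) if j not in held]
        test = [rows[j] for j in held_out]
        scores.append(evaluate(fit(train), test))
    return sum(scores) / k

def fit_mean(train):
    """Placeholder model: always predict the mean training result."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

def mean_abs_error(model, test):
    """Average absolute deviation on the held-out rows."""
    return sum(abs(model(x) - y) for x, y in test) / len(test)

rows = [(x, 2.0 * x) for x in range(20)]   # hypothetical data
print(cross_validate(rows, k=10, fit=fit_mean, evaluate=mean_abs_error))
```

Every row is used for evaluation exactly once and is never part of the training data for the model that scores it.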
  58. 58. SUPERVISED LEARNING SUMMARY • Two sub problems: classification and regression • Supervised Learning is all about generalizing from a given training set • There is an inherent, hard to solve trade-off between generalization and over specification • The more complexity you add to your model, the better it can fit the training set. You may end up with an over fitted model • Therefore, you never evaluate a model on the training set that was used to induce it • Instead, use either an independent test set or cross validation
  59. 59. SUPERVISED LEARNING SUMMARY • We also got familiar with 4 SL methods: K-NN, SVM, Decision trees and Linear regression
  60. 60. AGENDA • What is Machine Learning • Typical Machine Learning Tasks • Supervised Learning • Unsupervised Learning • How to Get Started • Summary
  61. 61. UNSUPERVISED LEARNING
      X1      X2      X3      …  Xn-2      Xn-1      Xn
      x1,1    x2,1    x3,1    …  xn-2,1    xn-1,1    xn,1
      x1,2    x2,2    x3,2    …  xn-2,2    xn-1,2    xn,2
      …       …       …       …  …         …         …
      x1,m-1  x2,m-1  x3,m-1  …  xn-2,m-1  xn-1,m-1  xn,m-1
      x1,m    x2,m    x3,m    …  xn-2,m    xn-1,m    xn,m
      Extract interesting patterns from the input set, or provide an insightful representation of the input space
  62. 62. UNSUPERVISED LEARNING Unsupervised Learning tasks: • Cluster Analysis • Association Rules Mining • Hidden Markov Models • Dimensionality Reduction • Self-Organising Maps
  63. 63. CLUSTER ANALYSIS Data points that share a cluster need to be similar Data points in different clusters need to be different Similarity = Low distance Difference = High distance?
  64. 64. CLUSTER ANALYSIS
  65. 65. CLUSTER ANALYSIS
  66. 66. CLUSTER ANALYSIS
  67. 67. CLUSTER ANALYSIS K-Means: Initialize: place k cluster centroids on the feature space Repeat until some stopping criteria are met: Associate each data point to the closest centroid Move each centroid to the center of the points that are associated to it
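A compact sketch of the K-Means loop described above, using only the standard library. Initializing the centroids on randomly chosen data points and stopping when no assignment changes are choices made here; the slide leaves both open.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means on tuples of numbers; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)     # initialize on k of the data points
    assignments = None
    for _ in range(max_iter):
        # Step 1: associate each data point with its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:   # stop when nothing changes
            break
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its associated points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                      # an empty cluster keeps its centroid
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, assignments

# Hypothetical data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

The result depends on the initial centroids, so practical implementations usually run several random restarts and keep the best clustering.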
  68. 68. CLUSTER ANALYSIS Does distance mean similarity? What distance?
  69. 69. CLUSTER ANALYSIS Does distance mean similarity? What distance? For example, let us look at similarity in monthly salary. Mr. X earns $2,500 a month. Mrs. Y earns $250,000 a month. Mr. Z earns $100,000 a month. Is he more similar (in terms of salary) to X or to Y?
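The answer to the salary question depends on the distance chosen. The sketch below compares a linear distance with a logarithmic (ratio-based) one; it assumes Mr. Z earns $100,000 a month, and the function names are illustrative.

```python
import math

# Monthly salaries in dollars; Z's figure is an assumption.
salaries = {"X": 2_500, "Y": 250_000, "Z": 100_000}

def linear_distance(a, b):
    """Difference in dollars."""
    return abs(a - b)

def log_distance(a, b):
    """Difference in orders of magnitude (compares salaries as a ratio)."""
    return abs(math.log10(a) - math.log10(b))

for name, dist in (("linear", linear_distance), ("log", log_distance)):
    to_x = dist(salaries["Z"], salaries["X"])
    to_y = dist(salaries["Z"], salaries["Y"])
    print(name, "-> Z is closer to", "X" if to_x < to_y else "Y")
```

In dollars Z is nearer to X, while as a ratio Z is nearer to Y, so the clustering result is decided as much by the distance measure as by the data.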
  70. 70. CLUSTER ANALYSIS Does distance mean similarity? What distance? How should we compute a multi-dimensional distance?
      Player Name          Height  Position  Age  Plays in  Goals this year  Annual Wages  Country of Birth
      Lionel Andrés Messi  169 cm  Forward   30   Spain     41               36 M EUR      Argentina
      Cristiano Ronaldo    185 cm  Forward   31   Spain     27               17 M EUR      Portugal
  71. 71. AGENDA • What is Machine Learning • Typical Machine Learning Tasks • Supervised Learning • Unsupervised Learning • How to Get Started • Summary
  72. 72. HOW TO GET STARTED • Maintaining and manipulating more and more data is becoming more and more affordable • Machine Learning offers a very rich set of boxes. Selecting the right boxes and building a business solution requires lots of experience • Training the right models, tuning the parameters, evaluating performance and implementation all require some level of expertise, but this should not be your first concern • The prediction is not in the box
  73. 73. HOW TO GET STARTED Business Definition → Machine Learning → Implement → Business Value
  74. 74. CRISP-DM
  75. 75. HOW TO GET STARTED A recommended checklist, before you even start: 1. What am I trying to achieve, business-wise? 2. What data does it require? Do I have this data? Am I allowed to use it? 3. What will be the output of a machine learning model? 4. Can my operations use that output? How? 5. What machine learning task am I trying to solve? 6. What are the success criteria? 7. Who will be the ones to run the project? 8. How long will it take? How much will it cost?
  76. 76. AGENDA • What is Machine Learning • Typical Machine Learning Tasks • Supervised Learning • Unsupervised Learning • How to Get Started • Summary
  77. 77. SUMMARY • Machine learning = designing machines that learn from experience • Three typical tasks: • Supervised Learning • Unsupervised Learning • Reinforcement Learning • Supervised Learning: • Learning means generalization • Generalization vs. Specification, Over fitting and Under fitting • Classification vs. Regression
  78. 78. SUMMARY • Supervised Learning algorithms: • K-NN • SVM • Decision Trees • Linear Regression • More • Unsupervised Learning • Cluster analysis: similarity and distance • Association rules • Reinforcement Learning • The big data challenge of Machine Learning • CRISP-DM
  79. 79. INTRODUCTION TO MACHINE LEARNING