DutchMLSchool. Models, Evaluations, and Ensembles

DutchMLSchool. Introduction to Machine Learning, Models, Evaluations, and Ensembles (Supervised Learning I) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.

Published in: Data & Analytics

DutchMLSchool. Models, Evaluations, and Ensembles

  1. 1. 1st edition | July 8-11, 2019
  2. 2. BigML, Inc #DutchMLSchool Supervised Learning I Introduction to Machine Learning, Models, Evaluations and Ensembles Poul Petersen CIO, BigML, Inc 2
  3. 3. Machine Learning Motivation • Imagine: • You are looking to buy a house • You recently found a house you like • Is the asking price fair? • What next?
  4. 4. Machine Learning Motivation • Why not ask an expert? • Experts can be rare / expensive • Hard to validate experience: • Experience with similar properties? • Do they consider all relevant variables? • Knowledge of market up to date? • Hard to validate answer: • How many times expert right / wrong? • Probably can’t explain decision in detail • Humans are not good at intuitive statistics
  5. 5. Data vs Expert • Replace the expert with data? • Intuition: square footage relates to price. • Collect data from past sales • PRICE = 125.3*SQFT + 96535
     SQFT | SOLD | PREDICT
     2424 | 360000 | 400262
     1785 | 307500 | 320195
     1003 | 185000 | 222211
     4135 | 600000 | 614651
     1676 | 328500 | 306538
     1012 | 247000 | 223339
     3352 | 420000 | 516541
     2825 | 435350 | 450508
  6. 6. Data vs Expert • Replace the expert scorecard • Experts can be rare / expensive • Hard to validate experience: • Experience with similar properties? • Do they consider all relevant variables? • Knowledge of market up to date? • Hard to validate answer: • How many times expert right / wrong? • Probably can’t explain decision in detail • Humans are not good at intuitive statistics
  7. 7. Data vs Expert • Replace the expert with data • Intuition: square footage relates to price. • Collect data from past sales • PRICE = 125.3*SQFT + 96535
     SQFT | SOLD
     2424 | 360000
     1785 | 307500
     1003 | 185000
     4135 | 600000
     1676 | 328500
     1012 | 247000
     3352 | 420000
     2825 | 435350
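The line on these slides can be applied directly to the listed sales. The following is a minimal Python sketch, assuming the coefficients printed on the slide; the name `predict_price` is illustrative and not from the talk. Note that the slide does not say which data the line was fitted on, so the sketch only applies it and measures the error.

```python
# Past sales from the slide: (square footage, sold price).
SALES = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]

def predict_price(sqft):
    """Apply the line from the slide: PRICE = 125.3*SQFT + 96535."""
    return 125.3 * sqft + 96535

predictions = [predict_price(sqft) for sqft, _ in SALES]

# Absolute error of each prediction against the actual sold price.
errors = [abs(pred - sold) for pred, (_, sold) in zip(predictions, SALES)]
mean_abs_error = sum(errors) / len(errors)

for (sqft, sold), pred in zip(SALES, predictions):
    print(f"{sqft:>5} sqft  sold {sold:>7}  predicted {pred:>9.0f}")
print(f"mean absolute error: {mean_abs_error:.0f}")
```

The predictions agree with the PREDICT column shown for this example to within rounding, and the per-row errors make it easy to see where a single-variable line falls short.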
  8. 8. More Data!
     SQFT | BEDS | BATHS | ADDRESS | LOCATION | LOT SIZE | YEAR BUILT | PARKING SPOTS | LATITUDE | LONGITUDE | SOLD
     2424 | 4 | 3 | 1522 NW Jonquil | Timberhill SE 2nd | 5227 | 1991 | 2 | 44,594828 | -123,269328 | 360000
     1785 | 3 | 2 | 7360 NW Valley Vw | Country Estates | 25700 | 1979 | 2 | 44,643876 | -123,238189 | 307500
     1003 | 2 | 1 | 2620 NW Chinaberry | Tamarack Village | 4792 | 1978 | 2 | 44,593704 | -123,295424 | 185000
     4135 | 5 | 3,5 | 4748 NW Veronica | Suncrest | 6098 | 2004 | 3 | 44,5929659 | -123,306916 | 600000
     1676 | 3 | 2 | 2842 NW Monterey | Corvallis | 8712 | 1975 | 2 | 44,5945279 | -123,291523 | 328500
     1012 | 3 | 1 | 2320 NW Highland | Corvallis | 9583 | 1959 | 2 | 44,591476 | -123,262841 | 247000
     3352 | 4 | 3 | 1205 NW Ridgewood | Ridgewood 2 | 60113 | 1975 | 2 | 44,579439 | -123,333888 | 420000
     2825 | 3 | (missing) | 411 NW 16th | Wilkins Addition | 4792 | 1938 | 1 | 44,570883 | -123,272113 | 435350
     Uhhhh… • Can we still fit a line to 10 variables? (well, yes) • Will fitting a line give good results? (unlikely) • What about those text fields and categorical values?
  9. 9. Models
  10. 10. Mythical ML Model? • High representational power • Fitting a line is an example of low representational power • Deep neural networks are an example of high representational power • High ease-of-use • Easy to configure - relatively few parameters • Easy to interpret - how are decisions made? • Easy to put into production • Ability to work with real-world data • Mixed data types: numeric, categorical, text, etc. • Handle missing values • Resilient to outliers • There are actually hundreds of possible choices…
  11. 11. Decision Trees • Last Bill > $180 and Support Calls > 0 • Remember this?
  12. 12. Decision Tree Demo #1
  13. 13. What Just Happened? • We started with housing data as a CSV from Redfin • We uploaded the CSV to create a Source • Then we created a Dataset from the Source and reviewed the summary statistics • With one click we built a Model which can predict home prices based on all the housing features • We explored the Model and used it to make a Prediction
  14. 14. Why Decision Trees • Works for classification or regression
  15. 15. Why Decision Trees • Works for classification or regression • Easy to understand: splits are features and values • Lightweight and super fast at prediction time
  16. 16. DT Predictions • [Diagram labels: Question 1, Question 2, Prediction]
  17. 17. Why Decision Trees • Works for classification or regression • Easy to understand: splits are features and values • Lightweight and super fast at prediction time • Relatively parameter free • Data can be messy • Useless features are automatically ignored • Works with un-normalized data • Works with missing data at Training
  18. 18. Training with Missing • [Diagram labels: Reason Missing? Loan Amount?]
  19. 19. Why Decision Trees • Works for classification or regression • Easy to understand: splits are features and values • Lightweight and super fast at prediction time • Relatively parameter free • Data can be messy • Useless features are automatically ignored • Works with un-normalized data • Works with missing data at Training & Prediction
  20. 20. Predictions with Missing • [Diagram labels: Missing?, Question 1, Last Prediction]
  21. 21. Predictions with Missing • [Diagram labels: Missing?, Question 1, Skip, Question 2, Question 3, Avg Prediction]
  22. 22. Why Decision Trees • Works for classification or regression • Easy to understand: splits are features and values • Lightweight and super fast at prediction time • Relatively parameter free • Data can be messy • Useless features are automatically ignored • Works with un-normalized data • Works with missing data at Training & Prediction • Resilient to outliers • High representational power • Works easily with mixed data types
  23. 23. Data Types • numeric: 1, 2.0, 3, -5.4 • categorical: true / false, yes / no, giraffe / zebra / ape, A / B / C • date-time: 2013-09-25 10:02 expands to YEAR 2013, MONTH September, DAY-OF-MONTH 25, DAY-OF-WEEK Wednesday, HOUR 10, MINUTE 02 • text: “Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em.” becomes term counts: “great” appears 2 times, “afraid” appears 1 time, “born” appears 1 time • items: bread, sugar, coffee, milk / ice cream, hot fudge
  24. 24. Why Not Decision Trees • Slightly prone to over-fitting (what is that again?)
  25. 25. Learning Problems (fit) • Under-fitting • Over-fitting • Model fits too well and does not “generalize” • Captures the noise or outliers of the data • Change algorithm or filter outliers
  26. 26. Why Not Decision Trees • Slightly prone to over-fitting • But we’ll fix this with ensembles • Splitting prefers decision boundaries that are parallel to feature axes
  27. 27. Splits Parallel to Axis • Ideal split… • But not possible!
  28. 28. Splits Parallel to Axis • Will “discover” diagonal edge eventually
  29. 29. Why Not Decision Trees • Slightly prone to over-fitting • But we’ll fix this with ensembles • Splitting prefers decision boundaries that are parallel to feature axes • More data! • Predictions outside training data can be problematic
  30. 30. Outlier Predictions • ?
  31. 31. Why Not Decision Trees • Slightly prone to over-fitting • But we’ll fix this with ensembles • Splitting prefers decision boundaries that are parallel to feature axes • More data! • Predictions outside training data can be problematic • We can catch this with model competence • Can be sensitive to small changes in training data
  32. 32. Outlier Predictions
  33. 33. Why Not Decision Trees • Slightly prone to over-fitting • But we’ll fix this with ensembles • Splitting prefers decision boundaries that are parallel to feature axes • More data! • Predictions outside training data can be problematic • We can catch this with model competence • Can be sensitive to small changes in training data • What other models can we try? • And how will we know which one works best?
  34. 34. Evaluations
  35. 35. Easy Right? • Model Prediction • Count up mistakes!
     INTL MIN | INTL CALLS | INTL CHARGE | CUST SERV CALLS | CHURN | PREDICT CHURN
     8,7 | 4 | 2,35 | 1 | False | False
     11,2 | 5 | 3,02 | 0 | False | True
     12,7 | 6 | 3,43 | 4 | True | True
     9,1 | 5 | 2,46 | 0 | False | False
     11,2 | 2 | 3,02 | 1 | False | False
     12,3 | 5 | 3,32 | 3 | False | False
     13,1 | 6 | 3,54 | 4 | False | False
     5,4 | 9 | 1,46 | 4 | True | False
     13,8 | 4 | 3,73 | 1 | False | False
  36. 36. Mistakes can be Costly • FUN! + = DANGER! • Insight: labeling a yield as a stop is not as bad as labeling a stop as a yield… • Need better metrics!
  37. 37. Evaluation Metrics • Imagine we have a model that can predict a person’s dominant hand, that is, for any individual it predicts left / right • Define the positive class • This selection is arbitrary • It is the class you are interested in! • The negative class is the “other” class (or others) • For this example, we choose: left
  38. 38. Evaluation Metrics • We choose the positive class: left • True Positive (TP) • We predicted left and the correct answer was left • True Negative (TN) • We predicted right and the correct answer was right • False Positive (FP) • We predicted left but the correct answer was right • False Negative (FN) • We predicted right but the correct answer was left
  39. 39. Evaluation Metrics • Remember… • True Positive: Correctly predicted the positive class • True Negative: Correctly predicted the negative class • False Positive: Incorrectly predicted the positive class • False Negative: Incorrectly predicted the negative class
  40. 40. Accuracy • Accuracy = (TP + TN) / Total • “Percentage correct” - like an exam • If Accuracy = 1 then no mistakes • If Accuracy = 0 then all mistakes • Intuitive but not always useful • Watch out for unbalanced classes! • Ex: 90% of people are right-handed and 10% are left-handed • A silly model which always predicts right-handed is 90% accurate
  41. 41. Accuracy • Positive class = Left, negative class = Right • Classified as left-handed: TP = 0, FP = 0 • Classified as right-handed: TN = 7, FN = 3 • (TP + TN) / Total = 70%
  42. 42. Precision • Precision = TP / (TP + FP) • “accuracy” or “purity” of positive class • How well you did separating the positive class from the negative class • If Precision = 1 then no FP. • You may have missed some left-handers, but of the ones you identified, all are left-handed. No mistakes. • If Precision = 0 then no TP • None of the left-handers you identified are actually left-handed. All mistakes.
  43. 43. Precision • Positive class = Left, negative class = Right • Classified as left-handed: TP = 2, FP = 2 • Classified as right-handed: TN = 5, FN = 1 • TP / (TP + FP) = 50%
  44. 44. Recall • Recall = TP / (TP + FN) • Percentage of positive class correctly identified • A measure of how well you identified all of the positive class examples • If Recall = 1 then no FN → All left-handers identified • There may be FP, so precision could be < 1 • If Recall = 0 then no TP → No left-handers identified
  45. 45. Recall • Positive class = Left, negative class = Right • Classified as left-handed: TP = 2, FP = 2 • Classified as right-handed: TN = 5, FN = 1 • TP / (TP + FN) = 67%
  46. 46. f-Measure • f-measure = (2 * Recall * Precision) / (Recall + Precision) • Harmonic mean of Recall & Precision • If f-measure = 1 then Recall == Precision == 1 • If Precision OR Recall is small then the f-measure is small
  47. 47. Phi Coefficient • Phi = (TP*TN - FP*FN) / SQRT[(TP+FP)(TP+FN)(TN+FP)(TN+FN)] • Returns a value between -1 and 1 • If -1 then predictions are opposite reality • If 0 then no correlation between predictions and reality • If 1 then predictions are always correct
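The five metrics above can all be computed from the four confusion-matrix counts. The sketch below is one way to do that in plain Python; the helper names are illustrative, and returning 0.0 when a denominator is zero is an assumption made here for convenience (the slides do not say how to treat that case).

```python
import math

def safe_div(num, den):
    """Return num/den, or 0.0 when the denominator is zero (an assumption
    made here so that degenerate confusion matrices do not raise)."""
    return num / den if den else 0.0

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, f-measure and phi from the four counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    phi_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * recall * precision, recall + precision),
        "phi": safe_div(tp * tn - fp * fn, phi_den),
    }

# Counts from the precision/recall slides (positive class = left).
example = classification_metrics(tp=2, fp=2, tn=5, fn=1)
# Counts from the accuracy slide: nobody is classified as left-handed.
always_right = classification_metrics(tp=0, fp=0, tn=7, fn=3)

print(example)
print(always_right)
```

The second call shows why accuracy alone can mislead: the model that never predicts the positive class still scores 70% accuracy, while its recall and phi are both zero.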
  48. 48. Evaluations Demo #1
  49. 49. What Just Happened? • Starting with the Diabetes Source, we created a Dataset and then a Model. • Using both the Model and the original Dataset, we created an Evaluation. • We reviewed the metrics provided by the Evaluation: • Confusion Matrix • Accuracy, Precision, Recall, f-measure and phi • This Model seemed to perform really, really well… • Question: Can we trust this model?
  50. 50. Evaluation Danger! • Never evaluate with the training data! • Many models are able to “memorize” the training data • This will result in overly optimistic evaluations!
  51. 51. “Memorizing” Training Data
     Training:
     plasma glucose | bmi | diabetes pedigree | age | diabetes
     148 | 33,6 | 0,627 | 50 | TRUE
     85 | 26,6 | 0,351 | 31 | FALSE
     183 | 23,3 | 0,672 | 32 | TRUE
     89 | 28,1 | 0,167 | 21 | FALSE
     137 | 43,1 | 2,288 | 33 | TRUE
     116 | 25,6 | 0,201 | 30 | FALSE
     78 | 31 | 0,248 | 26 | TRUE
     115 | 35,3 | 0,134 | 29 | FALSE
     197 | 30,5 | 0,158 | 53 | TRUE
     Evaluating:
     plasma glucose | bmi | diabetes pedigree | age | diabetes
     148 | 33,6 | 0,627 | 50 | ?
     85 | 26,6 | 0,351 | 31 | ?
     • Exactly the same values! • Who needs a model? • What we want to know is how the model performs with values never seen at training: 124 | 22 | 0,107 | 46 | ?
  52. 52. Evaluation Danger! • Never evaluate with the training data! • Many models are able to “memorize” the training data • This will result in overly optimistic evaluations! • If you only have one Dataset, use a train/test split
  53. 53. Train / Test Split
     Train:
     plasma glucose | bmi | diabetes pedigree | age | diabetes
     148 | 33,6 | 0,627 | 50 | TRUE
     183 | 23,3 | 0,672 | 32 | TRUE
     89 | 28,1 | 0,167 | 21 | FALSE
     78 | 31 | 0,248 | 26 | TRUE
     115 | 35,3 | 0,134 | 29 | FALSE
     197 | 30,5 | 0,158 | 53 | TRUE
     Test:
     plasma glucose | bmi | diabetes pedigree | age | diabetes
     85 | 26,6 | 0,351 | 31 | FALSE
     137 | 43,1 | 2,288 | 33 | TRUE
     116 | 25,6 | 0,201 | 30 | FALSE
     • These instances were never seen at training time. • Better evaluation of how the model will perform with “new” data
  54. 54. Evaluation Danger! • Never evaluate with the training data! • Many models are able to “memorize” the training data • This will result in overly optimistic evaluations! • If you only have one Dataset, use a train/test split • Even a train/test split may not be enough! • Might get a “lucky” split • Solution is to repeat several times (formally to cross validate)
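The two ideas on these slides, holding out a test set and repeating the split, can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names, the fixed seed, and the use of integers as stand-ins for dataset rows are all assumptions, not the tooling used in the demo.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_splits(rows, k=3, seed=0):
    """Yield (train, test) pairs so that every row is tested exactly once."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train_rows = [row for j, fold in enumerate(folds) if j != i
                      for row in fold]
        yield train_rows, test_fold

rows = list(range(9))  # stand-ins for the nine diabetes rows on the slide
train, test = train_test_split(rows, test_fraction=1 / 3)
splits = list(k_fold_splits(rows, k=3))

print("single split:", train, test)
for fold_train, fold_test in splits:
    print("fold:", fold_train, fold_test)
```

With k folds, a model is trained and evaluated k times and the scores are averaged, which reduces the chance that one "lucky" split drives the conclusion.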
  55. 55. Evaluations Demo #2
  56. 56. What Just Happened? • Starting with the Diabetes Dataset we created a train/test split • We built a Model using the train set and evaluated it with the test set • The scores were much worse than before, showing the danger of evaluating with training data. • Then we built several other models with different parameters and used the evaluation comparison tool to see which performed the best. • Question: Couldn’t we search for the best Model or parameters? STAY TUNED
  57. 57. Evaluation • Never evaluate with the training data! • Many models are able to “memorize” the training data • This will result in overly optimistic evaluations! • If you only have one Dataset, use a train/test split • Even a train/test split may not be enough! • Might get a “lucky” split • Solution is to repeat several times (formally to cross validate) • Don’t forget that accuracy can be misleading! • Mostly useless with unbalanced classes (left/right?) • Use weighting, operating points, other tricks…
  58. 58. Operating Points • The default probability threshold is 50% • Changing the threshold can change the outcome for a specific class
     Rate | Payment | … | Actual Outcome | Probability PAID | Threshold @ 50% | Threshold @ 60% | Threshold @ 90%
     8,4 % | US$456 | … | PAID | 95 % | PAID | PAID | PAID
     9,6 % | US$134 | … | PAID | 87 % | PAID | PAID | DEFAULT
     18 % | US$937 | … | DEFAULT | 36 % | DEFAULT | DEFAULT | DEFAULT
     21 % | US$35 | … | PAID | 88 % | PAID | PAID | DEFAULT
     17,5 % | US$1.044 | … | DEFAULT | 55 % | PAID | DEFAULT | DEFAULT
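The threshold columns in this table follow from a single comparison per row. A minimal Python sketch, using the PAID probabilities from the slide; treating a probability exactly equal to the threshold as PAID is an assumption, since the slide has no such row.

```python
# Probability of PAID for each loan, taken from the slide's table.
PROB_PAID = [0.95, 0.87, 0.36, 0.88, 0.55]

def classify(prob_paid, threshold):
    """Predict PAID only when its probability reaches the threshold."""
    return "PAID" if prob_paid >= threshold else "DEFAULT"

# One list of predicted labels per operating point.
outcomes = {
    threshold: [classify(p, threshold) for p in PROB_PAID]
    for threshold in (0.50, 0.60, 0.90)
}

for threshold, labels in outcomes.items():
    print(f"threshold {threshold:.0%}: {labels}")
```

Raising the threshold makes the model more conservative about predicting PAID, trading recall on PAID for fewer missed defaults.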
  59. 59. What about Regressions? • No classes: • Not possible to count mistakes: TP, FP, TN, FN • Predicted values are numeric: error is the amount “off” • actual 200, predict 180 = error 20 • Mean Absolute Error / Mean Squared Error • Both measure the average error across all predictions • Note: value of the error is “unbounded”. • When comparing models, lower values are “better” • R-Squared Error • Measure of how much better the model is than always predicting the mean • < 0 model is worse than the mean • = 0 model is no better than the mean • ➞ 1 model fits the data “perfectly”
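These three regression metrics are short enough to write out by hand. The sketch below uses the standard definitions; the sample values are illustrative (only the "actual 200, predict 180" pair comes from the slide), and `r_squared` assumes the actual values are not all identical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2; penalizes large misses more."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 - (model squared error / squared error of predicting the mean).

    Assumes the actual values are not all equal (otherwise ss_mean is 0).
    """
    mean = sum(actual) / len(actual)
    ss_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_mean = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_model / ss_mean

actual = [200, 150, 100]     # first value is the slide's example
predicted = [180, 150, 110]  # remaining values are made up for illustration

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
r2_mean_model = r_squared(actual, [150, 150, 150])  # always predict the mean

print(mae, mse, r2, r2_mean_model)
```

Predicting the mean everywhere gives an R-squared of exactly 0, which is the baseline the slide describes; a model scoring below 0 is doing worse than that baseline.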
  60. 60. Evaluations Demo #3
  61. 61. What just happened? • We split the Redfin data into training and test Datasets • We created a Model and Evaluation • We examined the Evaluation metrics • Wait - What about Time Series?
  62. 62. Dependent Data • (Year: Pineapple Harvest) 1986: 50,74; 1987: 22,03; 1988: 50,69; 1989: 40,38; 1990: 29,80; 1991: 9,90; 1992: 73,93; 1993: 22,95; 1994: 139,09; 1995: 115,17; 1996: 193,88; 1997: 175,31; 1998: 223,41; 1999: 295,03; 2000: 450,53 • [Chart: Pineapple Harvest (tons, 0 to 500) by Year (1986 to 2000), annotated with Trend and Error]
  63. 63. Dependent Data • Rearranging Disrupts Patterns • (Year: Pineapple Harvest) 1986: 139,09; 1987: 175,31; 1988: 9,91; 1989: 22,95; 1990: 450,53; 1991: 73,93; 1992: 40,38; 1993: 22,03; 1994: 295,03; 1995: 50,74; 1996: 29,8; 1997: 223,41; 1998: 115,17; 1999: 193,88; 2000: 50,69 • [Chart: Pineapple Harvest (tons, 0 to 500) by Year (1986 to 2000)]
  64. 64. Random Train / Test Split
     Train:
     plasma glucose | bmi | diabetes pedigree | age | diabetes
     148 | 33,6 | 0,627 | 50 | TRUE
     183 | 23,3 | 0,672 | 32 | TRUE
     89 | 28,1 | 0,167 | 21 | FALSE
     78 | 31 | 0,248 | 26 | TRUE
     115 | 35,3 | 0,134 | 29 | FALSE
     197 | 30,5 | 0,158 | 53 | TRUE
     Test:
     plasma glucose | bmi | diabetes pedigree | age | diabetes
     85 | 26,6 | 0,351 | 31 | FALSE
     137 | 43,1 | 2,288 | 33 | TRUE
     116 | 25,6 | 0,201 | 30 | FALSE
  65. 65. Linear Train / Test Split • Train (Year: Pineapple Harvest) 1986: 50,74; 1987: 22,03; 1988: 50,69; 1989: 40,38; 1990: 29,80; 1991: 9,90; 1992: 73,93; 1993: 22,95; 1994: 139,09; 1995: 115,17; 1996: 193,88 • Test (Year: Pineapple Harvest) 1997: 175,31; 1998: 223,41; 1999: 295,03; 2000: 450,53 • Forecast the test years from the training years, then COMPARE
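For time-dependent data like the pineapple series, the split has to respect the order of the rows instead of shuffling them. A minimal Python sketch of such a linear (chronological) split, using the values from the slide; the function name is illustrative.

```python
# (year, pineapple harvest) pairs from the slide.
HARVEST = [
    (1986, 50.74), (1987, 22.03), (1988, 50.69), (1989, 40.38),
    (1990, 29.80), (1991, 9.90), (1992, 73.93), (1993, 22.95),
    (1994, 139.09), (1995, 115.17), (1996, 193.88), (1997, 175.31),
    (1998, 223.41), (1999, 295.03), (2000, 450.53),
]

def chronological_split(series, n_test):
    """Keep the last n_test points (by year) for testing, the rest for training."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    ordered = sorted(series)  # tuples sort by their first element, the year
    return ordered[:-n_test], ordered[-n_test:]

train, test = chronological_split(HARVEST, n_test=4)

print("train years:", [year for year, _ in train])
print("test years: ", [year for year, _ in test])
```

Every training year precedes every test year, so a forecast built on the training portion is evaluated only on values it could not have seen.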
  66. 66. Ensembles
  67. 67. What is an Ensemble? • Rather than build a single model… • Combine the output of several typically “weaker” models into a powerful ensemble… • Q1: Why is this necessary? • Q2: How do we build “weaker” models? • Q3: How do we “combine” models?
  68. 68. No Model is Perfect • A given ML algorithm may simply not be able to exactly model the “real solution” of a particular dataset. • Try to fit a line to a curve • Even if the model is very capable, the “real solution” may be elusive • DT/NN can model any decision boundary with enough training data, but the solution is NP-hard • Practical algorithms involve random processes and may arrive at different, yet equally good, “solutions” depending on the starting conditions, local optima, etc. • If that wasn’t bad enough…
  69. 69. No Data is Perfect • Not enough data! • Always working with finite training data • Therefore, every “model” is an approximation of the “real solution” and there may be several good approximations. • Anomalies / Outliers • The model is trying to generalize from discrete training data. • Outliers can “skew” the model, by overfitting • Mistakes in your data • Does the model have to do everything for you? • But really, there are always mistakes in your data
  70. 70. Ensemble Techniques • Key Idea: • By combining several good “models”, the combination may be closer to the best possible “model” • We want to ensure diversity. It’s not useful to use an ensemble of 100 models that are all the same • Training Data Tricks • Build several models, each with only some of the data • Introduce randomness directly into the algorithm • Add training weights to “focus” the additional models on the mistakes made • Prediction Tricks • Model the mistakes • Model the output of several different algorithms
  71. 71. Simple Example - Fit a Line
  72. 72. Simple Example - Fit a Line
  73. 73. Simple Example - Fit a Line • Partition the data… • then model each partition… • For predictions, use the model for the same partition
  74. 74. Decision Forest • [Diagram: DATASET → SAMPLE 1…4 → MODEL 1…4 → PREDICTION 1…4 → COMBINER → PREDICTION]
  75. 75. Random Decision Forest • [Diagram: DATASET → SAMPLE 1…4 → MODEL 1…4 → PREDICTION 1…4 → COMBINER → PREDICTION, with SAMPLE 1 highlighted]
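The sample, model, combine flow in these diagrams can be sketched in plain Python. To keep the example short, the base model here is a least-squares line on square footage instead of a decision tree, and the combiner is a simple average; both are substitutions made for illustration, as are the function names and the fixed seed. The random feature selection that distinguishes a Random Decision Forest is not shown.

```python
import random

# (square footage, sold price) pairs from the housing slides.
SALES = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]

def fit_line(points):
    """Least-squares line through (x, y) points; returns a predict function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:  # every x in this sample is equal: fall back to the mean
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def bagged_models(points, n_models=4, seed=0):
    """Fit one model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_line([rng.choice(points) for _ in points])
            for _ in range(n_models)]

def combine(models, x):
    """Combiner: average the individual predictions."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

models = bagged_models(SALES)
individual = [model(2000) for model in models]
estimate = combine(models, 2000)

print("individual predictions:", [round(p) for p in individual])
print("combined prediction:   ", round(estimate))
```

Because each model sees a slightly different sample, the individual predictions differ; averaging them smooths out the sensitivity to small changes in the training data noted earlier.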
  76. 76. Boosting
     ADDRESS | BEDS | BATHS | SQFT | LOT SIZE | YEAR BUILT | LATITUDE | LONGITUDE | LAST SALE PRICE | MODEL 1 PREDICTED SALE PRICE | ERROR
     1522 NW Jonquil | 4 | 3 | 2424 | 5227 | 1991 | 44,594828 | -123,269328 | 360000 | 360750 | 750
     7360 NW Valley Vw | 3 | 2 | 1785 | 25700 | 1979 | 44,643876 | -123,238189 | 307500 | 306875 | -625
     4748 NW Veronica | 5 | 3,5 | 4135 | 6098 | 2004 | 44,5929659 | -123,306916 | 600000 | 587500 | -12500
     411 NW 16th | 3 | (missing) | 2825 | 4792 | 1938 | 44,570883 | -123,272113 | 435350 | 435350 | 0
     Second dataset (same features, with ERROR as the target):
     ADDRESS | ERROR | MODEL 2 PREDICTED ERROR
     1522 NW Jonquil | 750 | 750
     7360 NW Valley Vw | 625 | 625
     4748 NW Veronica | 12500 | 12393,83333
     411 NW 16th | 0 | 6879,67857
     • "Hey Model 1, what do you predict is the sale price of this home?" • "Hey Model 2, how much error do you predict Model 1 just made?" • Why stop at one iteration?
  77. 77. Boosting • [Diagram: Iteration 1: DATASET → MODEL 1 → PREDICTION 1; Iteration 2: DATASET 2 → MODEL 2 → PREDICTION 2; Iteration 3: DATASET 3 → MODEL 3 → PREDICTION 3; Iteration 4: DATASET 4 → MODEL 4 → PREDICTION 4; etc… → SUM → PREDICTION]
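The "model the error, then sum" loop in these slides can be sketched as follows. This is a simplified illustration, not the algorithm used in the demo: each model is a one-split regression stump on square footage, the residual is defined as actual minus predicted so that the predictions can simply be summed, and there is no learning rate. All names are illustrative, and `fit_stump` assumes at least two distinct feature values.

```python
# (square footage, sold price) pairs from the housing slides.
SALES = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]

def fit_stump(xs, targets):
    """Best single-split regression stump on one numeric feature."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # both sides are never empty
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def boost(xs, ys, iterations=4):
    """Model 1 fits the target; each later model fits the remaining error."""
    models, residuals = [], list(ys)
    for _ in range(iterations):
        model = fit_stump(xs, residuals)
        models.append(model)
        residuals = [r - model(x) for x, r in zip(xs, residuals)]
    return models

def predict(models, x):
    """Final prediction is the sum of all the models' outputs."""
    return sum(model(x) for model in models)

xs = [sqft for sqft, _ in SALES]
ys = [price for _, price in SALES]
models = boost(xs, ys, iterations=4)

# Training error after using only the first k models, for k = 1..4.
training_mse = [
    sum((y - predict(models[:k], x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
    for k in range(1, len(models) + 1)
]
print([round(m) for m in training_mse])
```

The training error never increases from one iteration to the next, which is also why boosting can overfit noisy data if it runs for too many iterations.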
  78. 78. Ensembles Demo #1
  79. 79. Which Ensemble Method • The one that works best! • Ok, but seriously. Did you evaluate? • For "large" / "complex" datasets • Use DF/RDF with deeper node threshold • Even better, use Boosting with more iterations • For "noisy" data • Boosting may overfit • RDF preferred • For "wide" data • Randomizing features (RDF) will be quicker • For "easy" data • A single model may be fine • Bonus: also has the best interpretability! • For classification with "large" number of classes • Boosting will be slower • For "general" data • DF/RDF likely better than a single model or Boosting. • Boosting will be slower since the models are processed serially
  80. 80. Ensembles Demo #2
  81. 81. Summary • Models have shortcomings: ability to fit, NP-hard, etc. • Data has shortcomings: not enough, outliers, mistakes, etc. • Ensemble Techniques can improve on single models • Sampling: partitioning, Decision Tree bagging • Adding Randomness: RDF • Modeling the Error: Boosting • Modeling the Models: Stacking • Guidelines for knowing which one might work best in a given situation
  82. 82. Co-organized by: Sponsor: Business Partners: