BigML Release: OptiML

OptiML is an optimization process for model selection and parametrization that automatically finds the best supervised model to help you solve classification and regression problems. OptiML is available from the BigML Dashboard, API, and WhizzML. This new resource creates and evaluates hundreds of supervised models (decision trees, ensembles, logistic regression, and deepnets) with multiple configurations to finally return a list of the best models for your data. OptiML helps to avoid the difficult and time-consuming work of hand-tuning multiple supervised algorithms until you find the optimal one that solves your specific problem.

Published in: Data & Analytics

  1. Introducing OptiML (BigML Release: OptiML)
  2. OptiML Release Webinar (BigML, Inc)
     • Speaker and moderator: Charles Parker, Ph.D. (VP of Machine Learning Algorithms) and Atakan Cetinsoy (VP of Predictive Applications)
     • Questions: Please enter questions into the chat box. We will answer some via chat and others at the end of the session.
     • Resources: https://bigml.com/releases
     • Contact: support@bigml.com
     • Twitter: @bigmlcom
  3. Parameter Optimization
     • There are lots of algorithms and lots of parameters
     • We don’t have time to try even close to everything
     • If only we had a way to make a prediction . . .
     Did I hear someone say Machine Learning?
  4. The Allure of ML
     “Why don’t we just use Machine Learning to predict the quality of a set of modeling parameters before we train a model on them?”
     - Every first-year ML grad student ever
  5. In This Webinar
     • Technology Overview
     • Metric Selection
     • The Dangers of Naive Cross-validation
     • Selecting the “Best” Model
     • Caveat Emptor!
  7. Bayesian Parameter Optimization
     • The performance of an ML algorithm (with associated parameters) is data dependent
     • So: Learn from your previous attempts
     • Train a model, then evaluate it
     • After you’ve done a number of evaluations, learn a regression model to predict the performance of future, as-yet-untrained models
     • Use this regression model to choose a promising set of “next models” to evaluate
  8. Bayesian Parameter Optimization (diagram, built up across slides 8 to 14): six candidate parameter sets (Parameters 1 through Parameters 6) feed a “Model and Evaluate” step. Three evaluations return scores of 0.75, 0.56, and 0.92, and those results are then used to train a “Machine Learning!” model that maps parameters ⟶ performance.
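The loop described on the Bayesian Parameter Optimization slides can be sketched roughly as follows. This is a minimal illustration, not BigML's implementation: `toy_score` is a made-up stand-in for "train a model, then evaluate it", the parameter names (`depth`, `rate`) are hypothetical, and the surrogate is a plain nearest-neighbour average rather than a proper Bayesian model with uncertainty estimates.

```python
import random


def toy_score(params):
    """Hypothetical stand-in for 'train a model, then evaluate it'."""
    depth, rate = params
    return 1.0 - ((depth - 6) / 10.0) ** 2 - (rate - 0.3) ** 2


def predict(history, params, k=3):
    """Toy surrogate: mean score of the k nearest already-evaluated parameter sets."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(history, key=lambda h: dist(h[0], params))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def search(candidates, evaluate, n_initial=3, n_rounds=5, seed=0):
    """Evaluate a few random candidates, then let the surrogate pick the rest."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)

    history = []  # list of (params, measured score)
    for _ in range(n_initial):
        params = pool.pop()
        history.append((params, evaluate(params)))

    for _ in range(n_rounds):
        if not pool:
            break
        # Choose the untried candidate the surrogate predicts will score best.
        promising = max(pool, key=lambda p: predict(history, p))
        pool.remove(promising)
        history.append((promising, evaluate(promising)))

    best = max(history, key=lambda h: h[1])
    return best, history


candidates = [(d, r) for d in range(1, 11) for r in (0.1, 0.3, 0.5)]
best, history = search(candidates, toy_score)
print("evaluated", len(history), "of", len(candidates), "candidates; best:", best)
```

Only eight of the thirty candidates are ever trained here; a greedy pick like this can get stuck, which is one reason real systems also weigh the surrogate's uncertainty.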
  15. Some Other Tricks
     • Use metalearning to select a good set of initial candidates
     • Cross-validation is expensive, and there’s no reason to do it for models with terrible performance; stop early in these cases
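One way to picture the early-stopping trick is the sketch below. It assumes fold scores arrive one at a time (so later folds are never trained once we stop), and the `margin` and `min_folds` thresholds are hypothetical knobs chosen for illustration, not values taken from OptiML.

```python
def cross_validate_with_early_stop(fold_scores, best_so_far, margin=0.1, min_folds=2):
    """Consume per-fold scores lazily; abandon the candidate once it is clearly worse.

    Returns (mean score over folds seen, number of folds seen, stopped_early).
    """
    scores = []
    for score in fold_scores:
        scores.append(score)
        mean = sum(scores) / len(scores)
        # Stop once enough folds are in and the running mean trails the best known score.
        if len(scores) >= min_folds and mean < best_so_far - margin:
            return mean, len(scores), True
    if not scores:
        raise ValueError("no fold scores were provided")
    return sum(scores) / len(scores), len(scores), False


# A weak candidate is dropped after two folds; a competitive one runs all five.
print(cross_validate_with_early_stop(iter([0.50, 0.52, 0.90, 0.90, 0.90]), best_so_far=0.9))
print(cross_validate_with_early_stop(iter([0.88, 0.91, 0.90, 0.89, 0.92]), best_so_far=0.9))
```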
  17. A Metric Selection Flowchart (diagram; the YES/NO branches are flattened in this transcript)
     • Questions asked: Will you bother about threshold setting? Is yours a “ranking” problem? Is your dataset imbalanced? Do you care more about the top-ranked instances?
     • Metrics recommended: Max. Phi, KS-statistic, Area Under the ROC / PR curve, Kendall’s Tau, Spearman’s Rho, Accuracy, Phi coefficient, f-measure
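The flowchart asks whether the dataset is imbalanced and lists accuracy, the phi coefficient, and the f-measure among its metrics. Which branch leads to which metric is not recoverable from the transcript, so the sketch below only illustrates why the question matters, using the standard confusion-matrix definitions. The counts are invented for the example.

```python
from math import sqrt


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn, tn):
    """F1: harmonic mean of precision and recall (0.0 when undefined)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def phi_coefficient(tp, fp, fn, tn):
    """Phi (Matthews correlation) coefficient (0.0 when undefined)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# 100 instances, only 5 positive. A model that always answers "negative":
always_negative = dict(tp=0, fp=0, fn=5, tn=95)
# A model that finds 4 of the 5 positives with 1 false alarm:
useful = dict(tp=4, fp=1, fn=1, tn=94)

for name, counts in [("always negative", always_negative), ("useful", useful)]:
    print(name, accuracy(**counts), f_measure(**counts), phi_coefficient(**counts))
```

The always-negative model reaches 0.95 accuracy while its phi coefficient and f-measure are both 0, which is the kind of gap the imbalance question is meant to catch.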
  18. Ranking Problems: Medical Diagnosis (no) vs. Stock Picking (yes)
  19. Top-heavy Importance: Draft-Style Selections (no) vs. Customer Churn (yes)
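For ranking problems, the flowchart names Kendall’s Tau and Spearman’s Rho. As a rough sketch of what these measure, the functions below compare the ordering of predicted scores with the ordering of actual outcomes. They use the textbook formulas and assume at least two items and no tied values; the sample numbers are invented.

```python
from itertools import combinations


def ranks(values):
    """Rank positions (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation via the no-ties shortcut formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


actual = [0.9, 0.7, 0.5, 0.3, 0.1]      # e.g. realized returns of five stocks
predicted = [0.8, 0.75, 0.6, 0.1, 0.2]  # model scores; the bottom two are swapped
print(spearman_rho(actual, predicted), kendall_tau(actual, predicted))
```

Both measures weight every position in the ranking equally, so a swap at the bottom costs as much as a swap at the top.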
  21. Is Cross-Validation Right for You?
     • Cross-validation is a good tool some of the time
     • Many other times, it is disastrously bad
       • Overly optimistic
       • False confidence in results
     • This is why we offer the option for a specific holdout set
  22. Case #1: Market Direction
     • Suppose you want to predict the direction of the stock market
     • You have information for that market for each minute of each day
     • But minutes next to each other are correlated in both the input fields and the objective field
     • So if you have the answer for one minute, you can trivially predict the neighboring minutes!
     • Cross-validation will tell you your classifier is near-perfect!
     (Figure labels: All Negative, All Positive, Close of Day)
  23. Case #2: Photo Age Prediction
     • Suppose you want to predict the age of a printed photograph (based on dye-fade, paper watermarks, the presence and type of border, etc.)
     • Your training set: A few thousand photos from a few dozen people
     • But the ages of one person’s photos are correlated in both the input and output spaces! (same age, camera, storage conditions, etc.)
     • So you can trivially do well predicting the age of some of one person’s photos if you know the ages of the rest
     • Cross-validation will tell you your classifier is near-perfect!
  24. Take Care!
     • These situations are very common in all cases where data comes in batches (days, users, etc.)
     • The solution is to hold out whole batches of data (e.g., a specific test set) rather than just random points from each one (as in cross-validation)
     • It’s possible that it isn’t a problem in your dataset, but when in doubt, try both!
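Holding out whole batches, as the Take Care! slide recommends, might look like the sketch below. The `owner` field, the row layout, and the `test_fraction` default are assumptions made for the photo example; the same idea applies to days in the market example.

```python
import random


def group_holdout_split(rows, group_of, test_fraction=0.25, seed=0):
    """Split rows so that every batch (group) lands entirely in train or in test."""
    groups = sorted({group_of(row) for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)

    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [row for row in rows if group_of(row) not in test_groups]
    test = [row for row in rows if group_of(row) in test_groups]
    return train, test


# 40 photos from 8 people (hypothetical data): 2 people are held out completely.
photos = [{"owner": f"person{i % 8}", "photo_id": i} for i in range(40)]
train, test = group_holdout_split(photos, group_of=lambda row: row["owner"])

train_owners = {row["owner"] for row in train}
test_owners = {row["owner"] for row in test}
print(len(train), len(test), train_owners & test_owners)
```

Because no person appears on both sides of the split, the evaluation can no longer be inflated by recognizing an owner's camera or storage conditions.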
  26. Which Model is Best?
     • Performance isn’t the only issue!
     • Retraining: Will the amount of data you have be different in the future?
     • Fit stability: How confident must you be that the model’s behavior is invariant to small data changes?
     • Prediction speed: The difference can be orders of magnitude
  27. Modeling Tradeoffs: Simple (Logistic) vs. Complex (Deepnets)
     • Interpretability vs. Representability
     • Weak vs. Slow
     • Confidence vs. Performance
     • Biased vs. Data-hungry
  29. Mo’ Problems
     • Model selection tends to take a lot of data, and the more accurate you want the search to be, the more data you need.
     • We had to define a search space that would suit “most” datasets. It’s possible that the right model for your data isn’t in there!
  30. Learn More: https://bigml.com/releases/winter-2018 and https://bigml.com/whatsnew
  31. Questions? @bigmlcom, support@bigml.com