
VSSML18. Deepnets and Time Series

Deepnets and Time Series.
VSSML18: 4th edition of the Valencian Summer School in Machine Learning.


  1. Valencian Summer School in Machine Learning, 4th edition. September 13-14, 2018
  2. Deepnets and Time Series: Going Further With Supervised Learning. Charles Parker, VP Algorithms, BigML, Inc
  3. Deep Neural Networks
  4. Power To The People!
     • Why another supervised learning algorithm?
     • Deep neural networks have been shown to be state of the art in several niche applications: vision, speech recognition, NLP
     • While powerful, these networks have historically been difficult for novices to train
  5. Goals of BigML Deepnets
     • What BigML Deepnets are not (yet): convolutional networks, recurrent networks (e.g., LSTM networks)
     • These solve a particular type of sub-problem, and are carefully engineered by experts to do so
     • Can we bring some of the power of deep neural networks to your problem, even if you have no deep learning expertise?
     • Let's try to separate deep neural network myths from realities
  6. Myth #1: Deep neural networks are the next step in evolution, destined to perfect humanity or destroy it utterly.
  7. Some Weaknesses
     • Trees
       - Pro: massive representational power that expands as the data gets larger; efficient search through this space
       - Con: difficult to represent smooth functions and functions of many variables
       - Ensembles mitigate some of these difficulties
     • Logistic regression
       - Pro: some smooth, multivariate functions are not a problem; fast optimization
       - Con: parametric; if the decision boundary is nonlinear, tough luck
     • Can these weaknesses be mitigated?
  8. Logistic Level Up [diagram: a logistic regression, inputs connected directly to outputs]
  9. Logistic Level Up [diagram: the class "a" output is logistic(w, b) applied to the weighted inputs w_i]
  10. Logistic Level Up [diagram: a hidden layer inserted between inputs and outputs]
  11. Logistic Level Up [diagram: hidden node 1 computes logistic(w, b) of the inputs]
  12. Logistic Level Up [diagram: widening the network; n hidden nodes?]
  13. Logistic Level Up [diagram: deepening the network; n hidden layers?]
  14. Logistic Level Up [diagram: the full network of stacked logistic units; see the sketch below]
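To make the "level up" concrete, here is a minimal numpy sketch (not BigML's implementation): a logistic regression is a single logistic unit over the inputs, and adding a hidden layer stacks logistic units so the output unit operates on learned features rather than raw inputs. All sizes and weights here are invented for illustration.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # one input instance with 4 features

# Plain logistic regression: one logistic unit over the raw inputs.
w, b = rng.normal(size=4), 0.1
p_logistic = logistic(w @ x + b)

# "Level up": a hidden layer of logistic units, with another logistic
# unit applied on top of their outputs.
W_hidden, b_hidden = rng.normal(size=(8, 4)), np.zeros(8)   # 8 hidden nodes
w_out, b_out = rng.normal(size=8), 0.0
hidden = logistic(W_hidden @ x + b_hidden)
p_deepnet = logistic(w_out @ hidden + b_out)
```

Widening (more hidden nodes) grows `W_hidden`'s first dimension; deepening (more hidden layers) repeats the hidden step with a new weight matrix per layer.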
  15. Myth #2: Deep neural networks are great for the established marquee applications, but less interesting for general use.
  16. Parameter Paralysis
      Parameter Name                     Possible Values
      Descent Algorithm                  Adam, RMSProp, Adagrad, Momentum, FTRL
      Number of hidden layers            0 - 32
      Activation Function (per layer)    relu, tanh, sigmoid, softplus, etc.
      Number of nodes (per layer)        1 - 8192
      Learning Rate                      0 - 1
      Dropout Rate                       0 - 1
      Batch size                         1 - 1024
      Batch Normalization                True, False
      Learn Residuals                    True, False
      Missing Numerics                   True, False
      Objective weights                  Weight per class
      . . . and that's ignoring the parameters that are specific to the descent algorithm.
  17. What Can We Do?
     • Clearly there are too many parameters to fuss with
     • Setting them takes significant expert knowledge
     • Solution: metalearning (a good initial guess)
     • Solution: network search (try a bunch; see the sketch below)
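For context, this is roughly what invoking those solutions looks like from the BigML Python bindings. The `"search"` flag name is an assumption on my part, so check the BigML API documentation for the exact field; the CSV path is hypothetical.

```python
from bigml.api import BigML

# Credentials are read from the BIGML_USERNAME / BIGML_API_KEY env vars.
api = BigML()

source = api.create_source("data/my_training_data.csv")  # hypothetical file
api.ok(source)
dataset = api.create_dataset(source)
api.ok(dataset)

# With no arguments, BigML picks a starting structure via metalearning.
# The flag below asks for an automatic network search instead
# (parameter name assumed -- see the BigML API docs).
deepnet = api.create_deepnet(dataset, {"search": True})
api.ok(deepnet)
```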
  18. Metalearning: we trained 296,748 deep neural networks so you don't have to!
  19. Bayesian Parameter Optimization [diagram: candidate structures 1-6 queued for a "model and evaluate" step]
  20. Bayesian Parameter Optimization [diagram: structure 1 evaluated, scoring 0.75]
  21. Bayesian Parameter Optimization [diagram: structure 2 evaluated, scoring 0.48]
  22. Bayesian Parameter Optimization [diagram: structure 3 evaluated, scoring 0.91]
  23. Bayesian Parameter Optimization [diagram: machine learning applied to the results so far, learning a structure → performance model that guides which structure to evaluate next]
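A minimal sketch of that loop, assuming a toy problem: evaluate a few structures, fit a surrogate model mapping structure to performance, and use it to pick the next structure to try. The structure encoding and scoring function are invented for illustration, and a real Bayesian optimizer would use an acquisition function such as expected improvement rather than pure exploitation.

```python
import random
from sklearn.ensemble import RandomForestRegressor

def random_structure():
    """Draw a random network structure: a list of hidden-layer sizes."""
    return [random.choice([64, 128, 256]) for _ in range(random.randint(1, 4))]

def featurize(structure):
    """Encode a variable-length structure as fixed-length features."""
    return [len(structure), sum(structure), max(structure)]

def evaluate(structure):
    """Stand-in for 'train a deepnet and score it on held-out data':
    a toy score that happens to prefer two medium-sized layers."""
    return 1.0 / (1.0 + abs(len(structure) - 2) + abs(sum(structure) - 256) / 256.0)

# Seed the surrogate with a few random evaluations.
tried = [random_structure() for _ in range(5)]
scores = [evaluate(s) for s in tried]

for _ in range(20):
    # Fit a surrogate mapping structure -> expected performance.
    surrogate = RandomForestRegressor(n_estimators=50)
    surrogate.fit([featurize(s) for s in tried], scores)
    # Evaluate the candidate the surrogate likes best.
    candidates = [random_structure() for _ in range(100)]
    best = max(candidates, key=lambda s: surrogate.predict([featurize(s)])[0])
    tried.append(best)
    scores.append(evaluate(best))

print(max(zip(scores, tried)))  # best score and structure found
```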
  24. Benchmarking
     • The ML world is filled with crummy benchmarks: not enough datasets, no cross-validation, only one metric
     • Solution: roll our own
       - 50+ datasets, 5 replications of 10-fold CV (see the sketch below)
       - 10 different metrics
       - 30+ competing algorithms (R, scikit-learn, weka, xgboost)
     • http://www.clparker.org/ml_benchmark/
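A sketch of that evaluation protocol (5 replications of 10-fold cross-validation) using scikit-learn; the dataset, model, and metric here are stand-ins, not the benchmark's actual contents.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in for one of the 50+ datasets

# 5 replications of 10-fold cross-validation: 50 fold scores per metric.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=42)
scores = cross_val_score(RandomForestClassifier(), X, y, scoring="roc_auc", cv=cv)
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")
```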
  25. Myth #3: Deep neural networks have such spectacular performance that all other supervised learning techniques are now irrelevant.
  26. Caveat Emptor
     • Things that make deep learning less useful:
       - Small data (where "small" could still be thousands of instances)
       - Problems where you could benefit by iterating quickly (better features always beat better models)
       - Problems that are easy, or for which top-of-the-line performance isn't absolutely critical
     • Remember: deep learning is just another sort of supervised learning algorithm
     "…deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers." — Stuart Russell
  27. Time Series Analysis
  28. Beyond IID Data
     • Traditional machine learning data is assumed to be IID:
       - Independent (points have no information about each other's class) and
       - Identically distributed (they come from the same distribution)
     • But what if you want to predict just the next value in a sequence? Is all lost?
     • Applications:
       - Predicting battery life from charge-discharge cycles
       - Predicting sales for the next day/week/month
  29. Machine Learning Data
      Color    Mass   Type
      red      11     pen
      green    45     apple
      red      53     apple
      yellow   0      pen
      blue     2      pen
      green    422    pineapple
      yellow   555    pineapple
      blue     7      pen
      Discovering patterns within data:
      • Color = "red" → Mass < 100
      • Type = "pineapple" → Color ≠ "blue"
      • Color = "blue" → PPAP = "pen"
  30. Machine Learning Data
      Color    Mass   Type
      red      53     apple
      blue     2      pen
      red      11     pen
      blue     7      pen
      green    45     apple
      yellow   555    pineapple
      green    422    pineapple
      yellow   0      pen
      The patterns remain valid despite reshuffling:
      • Color = "red" → Mass < 100
      • Type = "pineapple" → Color ≠ "blue"
      • Color = "blue" → PPAP = "pen"
  31. Time Series Data
      Year   Pineapple Harvest
      1986   50.74
      1987   22.03
      1988   50.69
      1989   40.38
      1990   29.80
      1991   9.90
      1992   73.93
      1993   22.95
      1994   139.09
      1995   115.17
      1996   193.88
      1997   175.31
      1998   223.41
      1999   295.03
      2000   450.53
      [chart: pineapple harvest in tons vs. year, 1986-2000, showing a clear upward trend]
  32. Time Series Data
      Year   Pineapple Harvest
      1986   139.09
      1987   175.31
      1988   9.91
      1989   22.95
      1990   450.53
      1991   73.93
      1992   40.38
      1993   22.03
      1994   295.03
      1995   50.74
      1996   29.80
      1997   223.41
      1998   115.17
      1999   193.88
      2000   50.69
      [chart: the same values shuffled across years; the patterns are invalid after shuffling]
  33. Prediction: use the data from the past to predict the future
  34. Exponential Smoothing
  35. Exponential Smoothing [chart: weight vs. lag, decaying exponentially from about 0.2 at lag 1 toward zero by lag 13; see the sketch below]
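A minimal sketch of simple exponential smoothing, assuming smoothing parameter alpha = 0.2 and a toy series (the first few harvest values from slide 31). Unrolling the recurrence shows where the name comes from: the forecast is a weighted average of past observations with weights that decay exponentially with lag, which is the curve on the slide.

```python
import numpy as np

def simple_exponential_smoothing(y, alpha):
    """One-step-ahead forecast: level_t = alpha*y_t + (1-alpha)*level_{t-1}."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

y = np.array([50.74, 22.03, 50.69, 40.38, 29.80, 9.90, 73.93, 22.95])
print(simple_exponential_smoothing(y, alpha=0.2))

# Equivalent view: weight on the observation k steps back is
# alpha * (1 - alpha)**k, an exponentially decaying curve.
weights = [0.2 * (1 - 0.2) ** k for k in range(8)]
print(weights)  # 0.2, 0.16, 0.128, ...
```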
  36. Trend [charts: additive trend (the series grows by a constant amount per period) vs. multiplicative trend (it grows by a constant factor)]
  37. Seasonality [charts: additive seasonality (constant swing around the level) vs. multiplicative seasonality (the swing scales with the level)]
  38. Error [charts: additive error (constant noise variance) vs. multiplicative error (noise that scales with the level)]
  39. Model Types (Error, Trend, Seasonality)
      Trend \ Seasonality       None              Additive          Multiplicative
      None                      A,N,N   M,N,N     A,N,A   M,N,A     A,N,M   M,N,M
      Additive                  A,A,N   M,A,N     A,A,A   M,A,A     A,A,M   M,A,M
      Additive + Damped         A,Ad,N  M,Ad,N    A,Ad,A  M,Ad,A    A,Ad,M  M,Ad,M
      Multiplicative            A,M,N   M,M,N     A,M,A   M,M,A     A,M,M   M,M,M
      Multiplicative + Damped   A,Md,N  M,Md,N    A,Md,A  M,Md,A    A,Md,M  M,Md,M
      Example: M,N,A = multiplicative error, no trend, additive seasonality
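Outside BigML, one way to experiment with this same ETS family is statsmodels. A sketch fitting an M,Ad,M model (multiplicative error, additive damped trend, multiplicative seasonality) on a made-up monthly series:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# A made-up monthly series with trend and level-dependent seasonality.
t = np.arange(120)
season = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
y = pd.Series((50 + t) * season + np.random.default_rng(0).normal(0, 2, 120))

# error/trend/seasonal pick the cell of the model-type table;
# damped_trend=True turns "A" trend into "Ad".
model = ETSModel(y, error="mul", trend="add", seasonal="mul",
                 damped_trend=True, seasonal_periods=12)
fit = model.fit()
print(fit.aic, fit.bic)        # fit criteria, as on the next slide
print(fit.forecast(steps=12))  # predict the next year
```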
  40. Evaluating Model Fit
     • AIC: Akaike Information Criterion; tries to trade off accuracy and model complexity
     • AICc: like the AIC, but with a sample-size correction
     • BIC: Bayesian Information Criterion; like the AIC, but penalizes large numbers of parameters more harshly
     • R-squared: raw performance; the number of model parameters isn't considered
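For reference, the standard definitions behind these criteria, with k the number of fitted parameters, L-hat the maximized likelihood, and n the number of observations:

```latex
\begin{aligned}
\mathrm{AIC}  &= 2k - 2\ln\hat{L} \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1} \\
\mathrm{BIC}  &= k\ln n - 2\ln\hat{L}
\end{aligned}
```

The AICc correction term grows as k approaches n, which is why it is preferred for short series; the BIC's ln n factor exceeds the AIC's 2 once n > 7 or so, hence the harsher penalty.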
  41. Linear Splitting
      Year   Pineapple Harvest
      1986   139.09
      1987   175.31
      1988   9.91
      1989   22.95
      1990   450.53
      1991   73.93
      1992   40.38
      1993   22.03
      1994   295.03
      1995   115.17
      Random Split: held-out rows scattered throughout the series
      Linear Split: the final rows held out, so the model is trained only on the past (see the sketch below)
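A minimal sketch of the linear (chronological) split, using the values from the slide:

```python
import pandas as pd

df = pd.DataFrame({
    "year": range(1986, 1996),
    "harvest": [139.09, 175.31, 9.91, 22.95, 450.53,
                73.93, 40.38, 22.03, 295.03, 115.17],
})

# Linear split: train strictly on the past, evaluate on the future.
cutoff = int(len(df) * 0.8)
train, test = df.iloc[:cutoff], df.iloc[cutoff:]

# A random split (e.g. df.sample(frac=0.8)) would leak future values
# into training -- fine for IID data, wrong for time series.
```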
