Valencian Summer School in Machine Learning
3rd edition
September 14-15, 2017
Time Series Analysis
Beyond Supervision
• Traditional machine learning data is assumed to be IID:
  • Independent (points have no information about each other’s class), and
  • Identically distributed (coming from the same distribution)
• But what if you want to predict just the next value in a sequence? Is all lost?
• Applications
  • Predicting battery life from charge-discharge cycles
  • Predicting sales for the next day/week/month
Machine Learning Data
Color   Mass  Type
red       11  pen
green     45  apple
red       53  apple
yellow     0  pen
blue       2  pen
green    422  pineapple
yellow   555  pineapple
blue       7  pen

Discovering patterns within data:
• Color = “red” ⇒ Mass < 100
• Type = “pineapple” ⇒ Color ≠ “blue”
• Color = “blue” ⇒ PPAP = “pen”
Machine Learning Data
Color   Mass  Type
red       53  apple
blue       2  pen
red       11  pen
blue       7  pen
green     45  apple
yellow   555  pineapple
green    422  pineapple
yellow     0  pen

Patterns valid despite reshuffling:
• Color = “red” ⇒ Mass < 100
• Type = “pineapple” ⇒ Color ≠ “blue”
• Color = “blue” ⇒ PPAP = “pen”
Time Series Data
Year Pineapple Harvest
1986 50,74
1987 22,03
1988 50,69
1989 40,38
1990 29,80
1991 9,90
1992 73,93
1993 22,95
1994 139,09
1995 115,17
1996 193,88
1997 175,31
1998 223,41
1999 295,03
2000 450,53
Pineapple Harvest
Tons
0
125
250
375
500
Year
1986 1988 1990 1992 1994 1996 1998 2000
Trend
Time Series Data
Year Pineapple Harvest
1986 139,09
1987 175,31
1988 9,91
1989 22,95
1990 450,53
1991 73,93
1992 40,38
1993 22,03
1994 295,03
1995 50,74
1996 29,8
1997 223,41
1998 115,17
1999 193,88
2000 50,69
Pineapple Harvest
Tons
0
125
250
375
500
Year
1986 1988 1990 1992 1994 1996 1998 2000
Patterns invalid after shuffling
Prediction
Use the data from the past to predict the future
Exponential Smoothing

[Chart: the weight given to each lagged observation decays exponentially, from 0.2 at lag 1 toward 0 by lag 13]
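The chart’s shape follows from the definition: with smoothing factor α, lag k receives weight α(1 − α)^(k−1), a geometric decay. A minimal sketch, assuming α = 0.2 to match the chart’s peak weight:

    import numpy as np

    alpha = 0.2                                  # assumed smoothing factor; matches the chart's peak
    lags = np.arange(1, 14)
    weights = alpha * (1 - alpha) ** (lags - 1)  # geometric decay across lags 1..13
    print(weights.round(3))

    # Simple exponential smoothing: the level is a weighted average of the
    # newest observation and the previous level, which is exactly what
    # produces the exponentially decaying weights above.
    def ses(series, alpha=0.2):
        level = series[0]
        for x in series[1:]:
            level = alpha * x + (1 - alpha) * level
        return level                             # one-step-ahead forecast

    print(ses([50.74, 22.03, 50.69, 40.38, 29.80, 9.90, 73.93]))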
Trend

[Charts: an additive trend grows by a constant amount per period (linear); a multiplicative trend grows by a constant factor (accelerating)]
Seasonality

[Charts: additive seasonality has swings of constant size; multiplicative seasonality has swings that scale with the level of the series]
Error

[Charts: additive error adds noise of constant size; multiplicative error adds noise proportional to the level of the series]
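To make the additive/multiplicative distinction concrete across all three components, here is a small construction of my own (not from the slides) that builds each flavor from one base series:

    import numpy as np

    t = np.arange(1, 21)
    rng = np.random.default_rng(0)
    level = 10 + 5 * t                          # base series with an additive (linear) trend
    season = np.sin(2 * np.pi * t / 7)          # repeating seasonal pattern
    noise = rng.normal(size=t.size)

    multiplicative_trend = 10 * 1.25 ** t                # grows by a constant factor
    additive_season = level + 10 * season                # swings of constant size
    multiplicative_season = level * (1 + 0.3 * season)   # swings scale with the level
    additive_error = level + 5 * noise                   # noise of constant size
    multiplicative_error = level * (1 + 0.05 * noise)    # noise scales with the level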
Model Types
                         Seasonality:
Trend                    None              Additive          Multiplicative
None                     A,N,N   M,N,N     A,N,A   M,N,A     A,N,M   M,N,M
Additive                 A,A,N   M,A,N     A,A,A   M,A,A     A,A,M   M,A,M
Additive + Damped        A,Ad,N  M,Ad,N    A,Ad,A  M,Ad,A    A,Ad,M  M,Ad,M
Multiplicative           A,M,N   M,M,N     A,M,A   M,M,A     A,M,M   M,M,M
Multiplicative + Damped  A,Md,N  M,Md,N    A,Md,A  M,Md,A    A,Md,M  M,Md,M

Each triple reads (Error, Trend, Seasonality); within each seasonality column the two triples differ in error type (A vs. M). Example: M,N,A = Multiplicative Error, No Trend, Additive Seasonality.
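Purely for illustration (an assumption of mine, not the tool used in the course), the trend and seasonality choices from this table can be fit with statsmodels’ Holt-Winters implementation:

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # A toy monthly series with an upward trend and period-12 seasonality.
    t = np.arange(48)
    series = 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12)

    # Additive trend, additive seasonality (the ",A,A" part of a triple).
    fit = ExponentialSmoothing(series, trend="add", seasonal="add",
                               seasonal_periods=12).fit()
    print(fit.forecast(6))    # predict the next six steps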
Evaluating Model Fit
• AIC: Akaike Information Criterion; tries to trade off accuracy and model complexity
• AICc: Like the AIC, but with a sample-size correction
• BIC: Bayesian Information Criterion; like the AIC, but penalizes large numbers of parameters more harshly
• R-squared: Raw performance; the number of model parameters isn’t considered
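For reference, the standard formulas behind those criteria, with k the number of parameters, n the sample size, and log_l the maximized log-likelihood:

    import math

    def fit_criteria(log_l, k, n):
        """AIC, AICc, and BIC from a model's maximized log-likelihood."""
        aic = 2 * k - 2 * log_l
        aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
        bic = k * math.log(n) - 2 * log_l             # penalty grows with n
        return aic, aicc, bic

    print(fit_criteria(log_l=-120.0, k=4, n=60))      # e.g. a 4-parameter model on 60 points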
Linear Splitting

Year  Pineapple Harvest (tons)
1986  139.09
1987  175.31
1988    9.91
1989   22.95
1990  450.53
1991   73.93
1992   40.38
1993   22.03
1994  295.03
1995  115.17

Random Split: test rows are sampled from anywhere in the sequence.
Linear Split: the final rows are held out for testing.
For time series only the linear split is valid: train on the past, test on the future, so no information leaks backwards in time.
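A minimal sketch of the two splits on that table (my own illustration):

    import numpy as np

    years = np.arange(1986, 1996)
    harvest = np.array([139.09, 175.31, 9.91, 22.95, 450.53,
                        73.93, 40.38, 22.03, 295.03, 115.17])

    # Random split: fine for IID data, but it leaks the future into training.
    idx = np.random.default_rng(0).permutation(len(years))
    rand_train, rand_test = idx[:8], idx[8:]

    # Linear split: train on the past, test on the most recent points.
    lin_train, lin_test = np.arange(8), np.arange(8, 10)
    print(years[lin_test])    # [1994 1995], the held-out "future"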
Deep Neural Networks
BigML Deepnets
• Not Done Yet!
  • I’m the tech lead, so I’m the reason we don’t have a demo for this (sorry).
  • Check out our next release webinar!
• Let’s Still Have a Chat
  • Deep learning is regarded in the media as some sort of strange robot messiah, destined to either save or destroy us all
  • What’s good about deep learning, and why is it so popular now?
  • How much is hype, and what are some of the major issues with it?
Going Further
• Trees
  • Pro: Massive representational power that expands as the data gets larger; efficient search through this space
  • Con: Difficult to represent smooth functions and functions of many variables
  • Ensembles mitigate some of these difficulties
• Logistic Regression
  • Pro: Some smooth, multivariate functions are not a problem; fast optimization of the chosen parameters
  • Con: Parametric: if the decision boundary is nonlinear, tough luck
• Can these cons be mitigated?
LR Level Up

[Diagram: logistic regression drawn as a network, with input nodes connected by weights wᵢ directly to an output node computing Class 1 = logistic(w, b)]

[Diagram: a hidden layer inserted between inputs and outputs; each hidden unit computes logistic(w, b) of the inputs, and the output unit computes logistic(w, b) of the hidden activations]

[Diagram: the hidden layer widened to n nodes, then stacked into n hidden layers, yielding a deep network]
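A minimal NumPy sketch of that progression (my own): logistic regression is the zero-hidden-layer case, and each “level up” just applies logistic(w, b) to the previous layer’s outputs:

    import numpy as np

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)                  # one instance with 4 input features

    # Plain logistic regression: inputs connected directly to the output.
    w, b = rng.normal(size=4), 0.0
    print(logistic(x @ w + b))              # P(class 1)

    # Leveled up: a hidden layer of n units, each computing logistic(w, b).
    n = 8
    W_h, b_h = rng.normal(size=(4, n)), np.zeros(n)
    w_o, b_o = rng.normal(size=n), 0.0
    hidden = logistic(x @ W_h + b_h)
    print(logistic(hidden @ w_o + b_o))     # stack more layers for a deep net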
Why?
• This isn’t new. Why the sudden interest?
• Scale
  • Massive parameter space <=> massive data
  • Abundance of compute power + GPUs
• Frameworks for computational graph composition (TensorFlow, Theano, Torch, Caffe)
  • “Compiles” the network architecture into a highly optimized set of commands that run quickly and with maximum parallelism
  • Symbolically differentiates the objective for gradient descent
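To see what is being automated: for plain logistic regression the gradient of the log-loss can still be derived by hand, as in this sketch of mine; the frameworks above produce the equivalent step symbolically from the network definition, however deep it is.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                 # toy data, 2 features
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predictions
        grad_w = X.T @ (p - y) / len(y)           # hand-derived log-loss gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w                          # gradient descent step
        b -= lr * grad_b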
Deep Networks
• Like trees/ensembles, we have arbitrary representational power by modifying the structure
• Like logistic regression, smooth, multivariate objectives aren’t a problem (provided we have the right structure)
• So what have we lost?
Deep Network Cons
• Efficiency
  • The right structure for given data is not easily found, and most structures are bad
  • Solution: try a bunch of them, and be clever about how you do it
• Interpretability
  • We’ve gotten quite far away from the interpretability of trees
  • Solution: use sampling and tree induction to create decision-tree-like explanations for predictions (a sketch follows below)
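A sketch of that interpretability trick using scikit-learn (my own construction, not BigML’s implementation): sample points, label them with the network’s predictions, and induce a shallow tree that explains the network’s behavior.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=2000, n_features=5, random_state=0)

    # The black box: a small network.
    net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                        random_state=0).fit(X, y)

    # Sample fresh points and label them with the network's own predictions...
    X_sample = np.random.default_rng(0).normal(size=(5000, 5))
    y_net = net.predict(X_sample)

    # ...then induce a shallow, readable tree that mimics the network.
    surrogate = DecisionTreeClassifier(max_depth=3).fit(X_sample, y_net)
    print(export_text(surrogate))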
Bayesian Parameter Optimization

[Animation over six slides: candidate structures 1-6 feed into a “Model and Evaluate” step; structures 1-3 come back with scores 0.75, 0.48, and 0.91. A model is then fit to the structure -> performance mapping itself (“Model! Structure -> performance”), and that model is used to choose which structure to evaluate next.]
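A toy version of that loop (my own sketch, with “structure” reduced to a single knob in [0, 1]): a Gaussian process models structure -> performance, and expected improvement picks the next structure to evaluate.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def evaluate(s):                               # the expensive model-and-evaluate step
        return np.sin(6 * s) * s + 0.5             # hidden performance surface

    candidates = np.linspace(0, 1, 200).reshape(-1, 1)
    tried = [[0.1], [0.9]]                         # initial structures
    scores = [evaluate(s[0]) for s in tried]

    for _ in range(10):
        gp = GaussianProcessRegressor().fit(np.array(tried), np.array(scores))
        mu, sigma = gp.predict(candidates, return_std=True)
        best = max(scores)
        z = (mu - best) / np.maximum(sigma, 1e-9)
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
        nxt = candidates[np.argmax(ei)]            # most promising untried structure
        tried.append(list(nxt))
        scores.append(evaluate(nxt[0]))

    print("best structure:", tried[int(np.argmax(scores))], "score:", max(scores))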
Should I use it?
• Things that make deep learning less useful:
  • Small data (where that could still be thousands of instances)
  • Problems where you could benefit by iterating quickly (better features always beat better models)
  • Problems that are easy, or for which top-of-the-line performance isn’t absolutely critical
• Remember: deep learning is just another sort of classifier

“…deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers.” — Stuart Russell
https://people.eecs.berkeley.edu/~russell/research/future/