SlideShare a Scribd company logo
1 of 21
Download to read offline
Background
• Competitive data scientist at H2O.ai
• PhD in ensemble methods at UCL
• Former kaggle #1 – over 150+ competitions
H2O.ai Product Suite
Automatic feature engineering,
machine learning and interpretability
• 100% open source – Apache V2 licensed
• Built for data scientists – interface using R, Python or H2O Flow
(interactive notebook interface)
• Enterprise support subscriptions
• Fully automated machine learning
from ingest to deployment
• User licenses on a per seat basis
(annual subscription)
H2O AI open source engine
integration with Spark
Lightning fast machine
learning on GPUs
In-memory, distributed
machine learning algorithms
with H2O Flow GUI
Confidential3
Automate
Features Target
Data Quality and
Transformation Modeling Table
Model
Building
Model
Deployment
Data Integration
+
Driverless AI Workflow
visualizations
Machine Learning
Interpretability
FAST!!
Objective
Optimization
0
50
100
150
200
250
12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018
Sales over time
0
10
20
30
40
50
60
70
80
12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018
Sales over time
0
50
100
150
200
250
300
350
400
12/31/2017 1/2/2018 1/4/2018 1/6/2018 1/8/2018 1/10/2018 1/12/2018 1/14/2018
Sales over time
Linear relationshipNonlinear (seasonal) relationship
What is a Time Series Problem?
0
100
200
300
400
500
600
700
800
12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018 3/1/2018 3/11/2018
sales per per day (all groups)
0
100
200
300
400
500
600
700
800
12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018 3/1/2018 3/11/2018
sales by group
group 1 group 2 group 3
time groups sales
01/01/2018 group1 30
01/01/2018 group2 100
01/01/2018 group3 10
02/01/2018 group1 60.2
02/01/2018 group2 200.2
02/01/2018 group3 20.2
03/01/2018 group1 90.3
03/01/2018 group2 300.3
03/01/2018 group3 30.3
04/01/2018 group1 120.4
04/01/2018 group2 400.4
04/01/2018 group3 40.4
Time Groups
Modeling Foundation
1 2 3 4 5 6 7 8 9 10 11 12
[Gap]
1 2 3 4 5 6 7 8 9 10 11 12
[Gap] [Gap]
testtrain
tvs train tvs valid test
time:
Gap | Forecast Horizon
invalid lag size
valid lag size
time:
• Single time split (most recent training data becomes validation)
Validation Schemas #1
0
10000
20000
30000
40000
50000
60000
70000
2/5/20103/5/20104/5/20105/5/20106/5/20107/5/20108/5/20109/5/201010/5/201011/5/201012/5/20101/5/20112/5/20113/5/20114/5/20115/5/20116/5/20117/5/20118/5/20119/5/201110/5/201111/5/201112/5/20111/5/20122/5/20123/5/20124/5/2012
Validation Schemas #2
Rolling window with adjusting training size Rolling window with constant training size
• Multi window validation
Date
1/1/2018
2/1/2018
3/1/2018
4/1/2018
5/1/2018
6/1/2018
7/1/2018
8/1/2018
9/1/2018
10/1/2018
Day Month Year Weekday Weeknum IsHoliday
1 1 2018 2 1 1
2 1 2018 3 1 0
3 1 2018 4 1 0
4 1 2018 5 1 0
5 1 2018 6 1 0
6 1 2018 7 1 0
7 1 2018 1 2 0
8 1 2018 2 2 0
9 1 2018 3 2 0
10 1 2018 4 2 0
Feature Engineering: Decomposing the date
Feature Engineering: Lags
date target
01/01/2019 10.40
02/01/2019 10.04
03/01/2019 12.22
04/01/2019 12.74
05/01/2019 14.87
06/01/2019 15.43
07/01/2019 16.13
08/01/2019 17.20
09/01/2019 18.96
10/01/2019 19.20
11/01/2019 19.92
12/01/2019 19.31
13/01/2019 20.30
14/01/2019 21.73
15/01/2019 24.64
lag1 lag2 lag3
10.40
10.04 10.40
12.22 10.04 10.40
12.74 12.22 10.04
14.87 12.74 12.22
15.43 14.87 12.74
16.13 15.43 14.87
17.20 16.13 15.43
18.96 17.20 16.13
19.20 18.96 17.20
19.92 19.20 18.96
19.31 19.92 19.20
20.30 19.31 19.92
21.73 20.30 19.31 0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
AxisTitle
target
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
AxisTitle
target
lag1
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
AxisTitle
target
lag1
lag2
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
AxisTitle
target
lag1
lag2
lag3
Feature Engineering: Windows
date target
01/01/2019 10.40
02/01/2019 10.04
03/01/2019 12.22
04/01/2019 12.74
05/01/2019 14.87
06/01/2019 15.43
07/01/2019 16.13
08/01/2019 17.20
09/01/2019 18.96
10/01/2019 19.20
11/01/2019 19.92
12/01/2019 19.31
13/01/2019 20.30
14/01/2019 21.73
15/01/2019 24.64
lag1 lag2 lag3
10.40
10.04 10.40
12.22 10.04 10.40
12.74 12.22 10.04
14.87 12.74 12.22
15.43 14.87 12.74
16.13 15.43 14.87
17.20 16.13 15.43
18.96 17.20 16.13
19.20 18.96 17.20
19.92 19.20 18.96
19.31 19.92 19.20
20.30 19.31 19.92
21.73 20.30 19.31
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
target
lag1
lag2
lag3
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
target
MA
12.22 +
10.04 +
10.40 /
(3) =
10.88
MA3
10.88
11.66
13.28
14.35
15.48
16.25
17.43
18.45
19.36
19.48
19.84
20.45
WMA
11.19
12.11
13.72
14.80
15.69
16.55
17.90
18.79
19.52
19.49
19.91
20.85
12.22 x 3 +
10.04 x 2 +
10.40 x 1 /
(3+2+1) =
11.19
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
target
MA
WMA
EXPSM
10.92
11.71
13.32
14.39
15.50
16.29
17.48
18.49
19.38
19.48
19.85
20.49 0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
target
MA
WMA
EXPSM
For hyper parameter a=0.95
(12.22 x (0.95**1) + 10.04 x (0.95**2) +10.40 x (0.95**3) )
/((0.95**1) + (0.95**2) + (0.95**3) ) = 10.92
STD MAX SKEW
1.17 12.22 1.55
1.43 12.74 -1.48
1.41 14.87 1.47
1.42 15.43 -1.44
0.63 16.13 0.34
0.89 17.20 0.61
1.43 18.96 0.70
1.09 19.20 -1.64
0.50 19.92 1.30
0.39 19.92 1.58
0.50 20.30 -0.68
1.22 21.73 0.52
Other descriptive statistics
Max, min, median, Std, kurtosis, skewness..
Feature Engineering: Interractions
date target lag1 lag2 lag3
01/01/2019 10.40
02/01/2019 10.04 10.40
03/01/2019 12.22 10.04 10.40
04/01/2019 12.74 12.22 10.04 10.40
05/01/2019 14.87 12.74 12.22 10.04
06/01/2019 15.43 14.87 12.74 12.22
07/01/2019 16.13 15.43 14.87 12.74
08/01/2019 17.20 16.13 15.43 14.87
09/01/2019 18.96 17.20 16.13 15.43
10/01/2019 19.20 18.96 17.20 16.13
11/01/2019 19.92 19.20 18.96 17.20
12/01/2019 19.31 19.92 19.20 18.96
13/01/2019 20.30 19.31 19.92 19.20
14/01/2019 21.73 20.30 19.31 19.92
15/01/2019 24.64 21.73 20.30 19.31
Diff1=lag1-lag2
diff1
2.18
0.52
2.13
0.56
0.70
1.07
1.75
0.24
0.72
-0.61
1.00
1.43
Diff2=lag1-lag3
diff2
2.70
2.65
2.69
1.26
1.77
2.83
1.99
0.96
0.11
0.38
2.42
4.33
MAdiff= (Diff1+ Diff2)/2
MAdif
2.44
1.59
2.41
0.91
1.24
1.95
1.87
0.60
0.42
-0.12
1.71
2.88
Div1=lag1/lag2
div1
1.22
1.04
1.17
1.04
1.05
1.07
1.10
1.01
1.04
0.97
1.05
1.07
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
-12.00
-10.00
-8.00
-6.00
-4.00
-2.00
0.00
2.00
4.00
6.00
8.00
10.00
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
-6.00
-4.00
-2.00
0.00
2.00
4.00
6.00
8.00
10.00
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
-8.00
-6.00
-4.00
-2.00
0.00
2.00
4.00
6.00
8.00
10.00
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
Feature Engineering: trends
date target lag1 lag2 lag3
01/01/2019 10.40
02/01/2019 10.04 10.40
03/01/2019 12.22 10.04 10.40
04/01/2019 12.74 12.22 10.04 10.40
05/01/2019 14.87 12.74 12.22 10.04
06/01/2019 15.43 14.87 12.74 12.22
07/01/2019 16.13 15.43 14.87 12.74
08/01/2019 17.20 16.13 15.43 14.87
09/01/2019 18.96 17.20 16.13 15.43
10/01/2019 19.20 18.96 17.20 16.13
11/01/2019 19.92 19.20 18.96 17.20
12/01/2019 19.31 19.92 19.20 18.96
13/01/2019 20.30 19.31 19.92 19.20
14/01/2019 21.73 20.30 19.31 19.92
15/01/2019 24.64 21.73 20.30 19.31
correl
0.78
0.94
0.94
0.95
1.00
0.99
0.99
0.92
0.96
0.14
0.38
0.99
0.00
2.00
4.00
6.00
8.00
10.00
12.00
14.00
1/1/2019 1/2/2019 1/3/2019
R² = 0.6059
0.00
2.00
4.00
6.00
8.00
10.00
12.00
14.00
1/1/2019 1/2/2019 1/3/2019
Feature Engineering: Target Transformations
date target
01/01/2019 10.40
02/01/2019 10.04
03/01/2019 12.22
04/01/2019 12.74
05/01/2019 14.87
06/01/2019 15.43
07/01/2019 16.13
08/01/2019 17.20
09/01/2019 18.96
10/01/2019 19.20
11/01/2019 19.92
12/01/2019 19.31
13/01/2019 20.30
14/01/2019 21.73
15/01/2019 24.64
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
sqrt
3.22
3.17
3.50
3.57
3.86
3.93
4.02
4.15
4.35
4.38
4.46
4.39
4.51
4.66
4.96
log
2.34
2.31
2.50
2.54
2.70
2.74
2.78
2.85
2.94
2.95
2.99
2.96
3.01
3.08
3.20
0.00
1.00
2.00
3.00
4.00
5.00
6.00
7.00
8.00
9.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
0.00
0.50
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
differ
-0.36
2.18
0.52
2.13
0.56
0.70
1.07
1.75
0.24
0.72
-0.61
1.00
1.43
2.91
-12.00
-10.00
-8.00
-6.00
-4.00
-2.00
0.00
2.00
4.00
6.00
8.00
10.00
12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
• Ranking based on autocorrelation
• Pre-defined intervals (based on estimated frequency)
Daily data
• [7, 14, 21, …]
• [14, 28, 32, …]
• …
Weekly data
• [2, 4, 6, 8, …]
• [4, 8, 12, 16, …]
• …
…
Candidates for Lag-Sizes
• Dropouts
• Random replacement of actual lag-values by „n.a.“
• Align frequency of available lag information between train and validation/test
Regularization of Lag-Features
Using top-performing Algorithms
Genetic algorithm approach
x1 x2 x3 x4 y
200 cust1 0.01 prod1 32
250 cust1 0.45 prod2 21
50 cust1 0.51 prod3 20
45 cust2 0.79 prod1 18
125 cust2 0.72 prod2 27
400 cust2 0.28 prod3 35
230 cust3 0.68 prod1 37
210 cust3 0.35 prod2 30
500 cust3 0.28 prod3 28
505 cust4 0.63 prod1 29
150 cust4 0.53 prod2 40
170 cust4 0.33 prod3 35
Iteration1/10
X% accuracy
Tuned
Iteration2/10
+ Random
Transformation
lag1(x3,x4)
Z% accuracy
Feature importances
lag1(y,x2) 1
lag1(y,x4) 0.5
lag1(x1,x2) 0.2
lag1(x1,x4) 0.001
Feature importances
lag1(y,x2,x4) 1
lag1(y,x2) 0.5
lag1(y,x4) 0.2
lag1(x1,x2) 0.15
lag1(x3,x4) 0.05
Date
01/01/2019
02/01/2019
03/01/2019
01/01/2019
02/01/2019
03/01/2019
01/01/2019
02/01/2019
03/01/2019
01/01/2019
02/01/2019
03/01/2019
MLI for Time Series
Bring Your own recipe!
• Bring in your domain knowledge and
achieve even better results.
• Add additional transformers from the
open source git repo :
https://github.com/h2oai/driverlessai-
recipes
• You can contribute too!
• They follow sklearn type of api
• Add models, scorers or transformers
The Prophet model
y(t)= g(t) + s(t) + h(t)
g(t) Piecewise linear or logistic
regressor to calculate trend
s(t) models periodic changes (e.g.
weekly/yearly seasonality)
h(t) holiday component

More Related Content

Similar to Time Series in Driverless AI by Marios Michailidis

柏瑞週報 20201008
柏瑞週報 20201008柏瑞週報 20201008
柏瑞週報 20201008Pinebridge
 
柏瑞週報 20200807
柏瑞週報 20200807柏瑞週報 20200807
柏瑞週報 20200807Pinebridge
 
柏瑞週報20190826
柏瑞週報20190826柏瑞週報20190826
柏瑞週報20190826Pinebridge
 
柏瑞週報 20200717
柏瑞週報 20200717柏瑞週報 20200717
柏瑞週報 20200717Pinebridge
 
柏瑞週報 20190308
柏瑞週報 20190308柏瑞週報 20190308
柏瑞週報 20190308Pinebridge
 
柏瑞週報 20191018
柏瑞週報 20191018柏瑞週報 20191018
柏瑞週報 20191018Pinebridge
 
Pinebridge 2019 Q1
Pinebridge 2019 Q1Pinebridge 2019 Q1
Pinebridge 2019 Q1Pinebridge
 
Customer Service Manager Introduction Chubb Sheffield
Customer Service Manager   Introduction Chubb SheffieldCustomer Service Manager   Introduction Chubb Sheffield
Customer Service Manager Introduction Chubb SheffieldSteve Birtles
 
柏瑞週報 20190215
柏瑞週報 20190215柏瑞週報 20190215
柏瑞週報 20190215Pinebridge
 
Machine Learning methods to estimate the performance of aquafarms
Machine Learning methods to estimate the performance of aquafarms Machine Learning methods to estimate the performance of aquafarms
Machine Learning methods to estimate the performance of aquafarms Blue BRIDGE
 
Hcad calculation guidelines 2016
Hcad calculation guidelines 2016Hcad calculation guidelines 2016
Hcad calculation guidelines 2016cutmytaxes
 
柏瑞週報 20200703
柏瑞週報 20200703柏瑞週報 20200703
柏瑞週報 20200703Pinebridge
 
Lean six sigma black belt project by iftakhar
Lean six sigma black belt project   by iftakharLean six sigma black belt project   by iftakhar
Lean six sigma black belt project by iftakharMohammad IFTAKHAR
 
Substation Project Case Study by Dragon Sourcing
Substation Project Case Study by Dragon SourcingSubstation Project Case Study by Dragon Sourcing
Substation Project Case Study by Dragon SourcingJohn William
 
柏瑞週報 20200814
柏瑞週報 20200814柏瑞週報 20200814
柏瑞週報 20200814Pinebridge
 
柏瑞週報20200710
柏瑞週報20200710柏瑞週報20200710
柏瑞週報20200710Pinebridge
 
柏瑞週報 20190816
柏瑞週報 20190816柏瑞週報 20190816
柏瑞週報 20190816Pinebridge
 
柏瑞週報20201016
柏瑞週報20201016柏瑞週報20201016
柏瑞週報20201016Pinebridge
 
柏瑞週報 20200306
柏瑞週報 20200306柏瑞週報 20200306
柏瑞週報 20200306Pinebridge
 
柏瑞週報 20200103
柏瑞週報 20200103柏瑞週報 20200103
柏瑞週報 20200103Pinebridge
 

Similar to Time Series in Driverless AI by Marios Michailidis (20)

柏瑞週報 20201008
柏瑞週報 20201008柏瑞週報 20201008
柏瑞週報 20201008
 
柏瑞週報 20200807
柏瑞週報 20200807柏瑞週報 20200807
柏瑞週報 20200807
 
柏瑞週報20190826
柏瑞週報20190826柏瑞週報20190826
柏瑞週報20190826
 
柏瑞週報 20200717
柏瑞週報 20200717柏瑞週報 20200717
柏瑞週報 20200717
 
柏瑞週報 20190308
柏瑞週報 20190308柏瑞週報 20190308
柏瑞週報 20190308
 
柏瑞週報 20191018
柏瑞週報 20191018柏瑞週報 20191018
柏瑞週報 20191018
 
Pinebridge 2019 Q1
Pinebridge 2019 Q1Pinebridge 2019 Q1
Pinebridge 2019 Q1
 
Customer Service Manager Introduction Chubb Sheffield
Customer Service Manager   Introduction Chubb SheffieldCustomer Service Manager   Introduction Chubb Sheffield
Customer Service Manager Introduction Chubb Sheffield
 
柏瑞週報 20190215
柏瑞週報 20190215柏瑞週報 20190215
柏瑞週報 20190215
 
Machine Learning methods to estimate the performance of aquafarms
Machine Learning methods to estimate the performance of aquafarms Machine Learning methods to estimate the performance of aquafarms
Machine Learning methods to estimate the performance of aquafarms
 
Hcad calculation guidelines 2016
Hcad calculation guidelines 2016Hcad calculation guidelines 2016
Hcad calculation guidelines 2016
 
柏瑞週報 20200703
柏瑞週報 20200703柏瑞週報 20200703
柏瑞週報 20200703
 
Lean six sigma black belt project by iftakhar
Lean six sigma black belt project   by iftakharLean six sigma black belt project   by iftakhar
Lean six sigma black belt project by iftakhar
 
Substation Project Case Study by Dragon Sourcing
Substation Project Case Study by Dragon SourcingSubstation Project Case Study by Dragon Sourcing
Substation Project Case Study by Dragon Sourcing
 
柏瑞週報 20200814
柏瑞週報 20200814柏瑞週報 20200814
柏瑞週報 20200814
 
柏瑞週報20200710
柏瑞週報20200710柏瑞週報20200710
柏瑞週報20200710
 
柏瑞週報 20190816
柏瑞週報 20190816柏瑞週報 20190816
柏瑞週報 20190816
 
柏瑞週報20201016
柏瑞週報20201016柏瑞週報20201016
柏瑞週報20201016
 
柏瑞週報 20200306
柏瑞週報 20200306柏瑞週報 20200306
柏瑞週報 20200306
 
柏瑞週報 20200103
柏瑞週報 20200103柏瑞週報 20200103
柏瑞週報 20200103
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 

Time Series in Driverless AI by Marios Michailidis

  • 1. Background • Competitive data scientist at H2O.ai • PhD in ensemble methods at UCL • Former kaggle #1 – over 150+ competitions
  • 2. H2O.ai Product Suite Automatic feature engineering, machine learning and interpretability • 100% open source – Apache V2 licensed • Built for data scientists – interface using R, Python or H2O Flow (interactive notebook interface) • Enterprise support subscriptions • Fully automated machine learning from ingest to deployment • User licenses on a per seat basis (annual subscription) H2O AI open source engine integration with Spark Lightning fast machine learning on GPUs In-memory, distributed machine learning algorithms with H2O Flow GUI
  • 3. Confidential3 Automate Features Target Data Quality and Transformation Modeling Table Model Building Model Deployment Data Integration + Driverless AI Workflow visualizations Machine Learning Interpretability FAST!! Objective Optimization
  • 4. 0 50 100 150 200 250 12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018 Sales over time 0 10 20 30 40 50 60 70 80 12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018 Sales over time 0 50 100 150 200 250 300 350 400 12/31/2017 1/2/2018 1/4/2018 1/6/2018 1/8/2018 1/10/2018 1/12/2018 1/14/2018 Sales over time Linear relationshipNonlinear (seasonal) relationship What is a Time Series Problem?
  • 5. 0 100 200 300 400 500 600 700 800 12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018 3/1/2018 3/11/2018 sales per per day (all groups) 0 100 200 300 400 500 600 700 800 12/21/2017 12/31/2017 1/10/2018 1/20/2018 1/30/2018 2/9/2018 2/19/2018 3/1/2018 3/11/2018 sales by group group 1 group 2 group 3 time groups sales 01/01/2018 group1 30 01/01/2018 group2 100 01/01/2018 group3 10 02/01/2018 group1 60.2 02/01/2018 group2 200.2 02/01/2018 group3 20.2 03/01/2018 group1 90.3 03/01/2018 group2 300.3 03/01/2018 group3 30.3 04/01/2018 group1 120.4 04/01/2018 group2 400.4 04/01/2018 group3 40.4 Time Groups
  • 6. Modeling Foundation 1 2 3 4 5 6 7 8 9 10 11 12 [Gap] 1 2 3 4 5 6 7 8 9 10 11 12 [Gap] [Gap] testtrain tvs train tvs valid test time: Gap | Forecast Horizon invalid lag size valid lag size time:
  • 7. • Single time split (most recent training data becomes validation) Validation Schemas #1 0 10000 20000 30000 40000 50000 60000 70000 2/5/20103/5/20104/5/20105/5/20106/5/20107/5/20108/5/20109/5/201010/5/201011/5/201012/5/20101/5/20112/5/20113/5/20114/5/20115/5/20116/5/20117/5/20118/5/20119/5/201110/5/201111/5/201112/5/20111/5/20122/5/20123/5/20124/5/2012
  • 8. Validation Schemas #2 Rolling window with adjusting training size Rolling window with constant training size • Multi window validation
  • 9. Date 1/1/2018 2/1/2018 3/1/2018 4/1/2018 5/1/2018 6/1/2018 7/1/2018 8/1/2018 9/1/2018 10/1/2018 Day Month Year Weekday Weeknum IsHoliday 1 1 2018 2 1 1 2 1 2018 3 1 0 3 1 2018 4 1 0 4 1 2018 5 1 0 5 1 2018 6 1 0 6 1 2018 7 1 0 7 1 2018 1 2 0 8 1 2018 2 2 0 9 1 2018 3 2 0 10 1 2018 4 2 0 Feature Engineering: Decomposing the date
  • 10. Feature Engineering: Lags date target 01/01/2019 10.40 02/01/2019 10.04 03/01/2019 12.22 04/01/2019 12.74 05/01/2019 14.87 06/01/2019 15.43 07/01/2019 16.13 08/01/2019 17.20 09/01/2019 18.96 10/01/2019 19.20 11/01/2019 19.92 12/01/2019 19.31 13/01/2019 20.30 14/01/2019 21.73 15/01/2019 24.64 lag1 lag2 lag3 10.40 10.04 10.40 12.22 10.04 10.40 12.74 12.22 10.04 14.87 12.74 12.22 15.43 14.87 12.74 16.13 15.43 14.87 17.20 16.13 15.43 18.96 17.20 16.13 19.20 18.96 17.20 19.92 19.20 18.96 19.31 19.92 19.20 20.30 19.31 19.92 21.73 20.30 19.31 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 AxisTitle target 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 AxisTitle target lag1 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 AxisTitle target lag1 lag2 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 AxisTitle target lag1 lag2 lag3
  • 11. Feature Engineering: Windows date target 01/01/2019 10.40 02/01/2019 10.04 03/01/2019 12.22 04/01/2019 12.74 05/01/2019 14.87 06/01/2019 15.43 07/01/2019 16.13 08/01/2019 17.20 09/01/2019 18.96 10/01/2019 19.20 11/01/2019 19.92 12/01/2019 19.31 13/01/2019 20.30 14/01/2019 21.73 15/01/2019 24.64 lag1 lag2 lag3 10.40 10.04 10.40 12.22 10.04 10.40 12.74 12.22 10.04 14.87 12.74 12.22 15.43 14.87 12.74 16.13 15.43 14.87 17.20 16.13 15.43 18.96 17.20 16.13 19.20 18.96 17.20 19.92 19.20 18.96 19.31 19.92 19.20 20.30 19.31 19.92 21.73 20.30 19.31 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 target lag1 lag2 lag3 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 target MA 12.22 + 10.04 + 10.40 / (3) = 10.88 MA3 10.88 11.66 13.28 14.35 15.48 16.25 17.43 18.45 19.36 19.48 19.84 20.45 WMA 11.19 12.11 13.72 14.80 15.69 16.55 17.90 18.79 19.52 19.49 19.91 20.85 12.22 x 3 + 10.04 x 2 + 10.40 x 1 / (3+2+1) = 11.19 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 target MA WMA EXPSM 10.92 11.71 13.32 14.39 15.50 16.29 17.48 18.49 19.38 19.48 19.85 20.49 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 target MA WMA EXPSM For hyper parameter a=0.95 (12.22 x (0.95**1) + 10.04 x (0.95**2) +10.40 x (0.95**3) ) /((0.95**1) + (0.95**2) + (0.95**3) ) = 10.92 STD MAX SKEW 1.17 12.22 1.55 1.43 12.74 -1.48 1.41 14.87 1.47 1.42 15.43 -1.44 0.63 16.13 0.34 0.89 17.20 0.61 1.43 18.96 0.70 1.09 19.20 -1.64 0.50 19.92 1.30 0.39 19.92 1.58 0.50 20.30 -0.68 1.22 21.73 0.52 Other descriptive statistics Max, min, median, Std, kurtosis, skewness..
  • 12. Feature Engineering: Interractions date target lag1 lag2 lag3 01/01/2019 10.40 02/01/2019 10.04 10.40 03/01/2019 12.22 10.04 10.40 04/01/2019 12.74 12.22 10.04 10.40 05/01/2019 14.87 12.74 12.22 10.04 06/01/2019 15.43 14.87 12.74 12.22 07/01/2019 16.13 15.43 14.87 12.74 08/01/2019 17.20 16.13 15.43 14.87 09/01/2019 18.96 17.20 16.13 15.43 10/01/2019 19.20 18.96 17.20 16.13 11/01/2019 19.92 19.20 18.96 17.20 12/01/2019 19.31 19.92 19.20 18.96 13/01/2019 20.30 19.31 19.92 19.20 14/01/2019 21.73 20.30 19.31 19.92 15/01/2019 24.64 21.73 20.30 19.31 Diff1=lag1-lag2 diff1 2.18 0.52 2.13 0.56 0.70 1.07 1.75 0.24 0.72 -0.61 1.00 1.43 Diff2=lag1-lag3 diff2 2.70 2.65 2.69 1.26 1.77 2.83 1.99 0.96 0.11 0.38 2.42 4.33 MAdiff= (Diff1+ Diff2)/2 MAdif 2.44 1.59 2.41 0.91 1.24 1.95 1.87 0.60 0.42 -0.12 1.71 2.88 Div1=lag1/lag2 div1 1.22 1.04 1.17 1.04 1.05 1.07 1.10 1.01 1.04 0.97 1.05 1.07 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 -12.00 -10.00 -8.00 -6.00 -4.00 -2.00 0.00 2.00 4.00 6.00 8.00 10.00 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 -6.00 -4.00 -2.00 0.00 2.00 4.00 6.00 8.00 10.00 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 -8.00 -6.00 -4.00 -2.00 0.00 2.00 4.00 6.00 8.00 10.00 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
  • 13. Feature Engineering: trends date target lag1 lag2 lag3 01/01/2019 10.40 02/01/2019 10.04 10.40 03/01/2019 12.22 10.04 10.40 04/01/2019 12.74 12.22 10.04 10.40 05/01/2019 14.87 12.74 12.22 10.04 06/01/2019 15.43 14.87 12.74 12.22 07/01/2019 16.13 15.43 14.87 12.74 08/01/2019 17.20 16.13 15.43 14.87 09/01/2019 18.96 17.20 16.13 15.43 10/01/2019 19.20 18.96 17.20 16.13 11/01/2019 19.92 19.20 18.96 17.20 12/01/2019 19.31 19.92 19.20 18.96 13/01/2019 20.30 19.31 19.92 19.20 14/01/2019 21.73 20.30 19.31 19.92 15/01/2019 24.64 21.73 20.30 19.31 correl 0.78 0.94 0.94 0.95 1.00 0.99 0.99 0.92 0.96 0.14 0.38 0.99 0.00 2.00 4.00 6.00 8.00 10.00 12.00 14.00 1/1/2019 1/2/2019 1/3/2019 R² = 0.6059 0.00 2.00 4.00 6.00 8.00 10.00 12.00 14.00 1/1/2019 1/2/2019 1/3/2019
  • 14. Feature Engineering: Target Transformations date target 01/01/2019 10.40 02/01/2019 10.04 03/01/2019 12.22 04/01/2019 12.74 05/01/2019 14.87 06/01/2019 15.43 07/01/2019 16.13 08/01/2019 17.20 09/01/2019 18.96 10/01/2019 19.20 11/01/2019 19.92 12/01/2019 19.31 13/01/2019 20.30 14/01/2019 21.73 15/01/2019 24.64 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 sqrt 3.22 3.17 3.50 3.57 3.86 3.93 4.02 4.15 4.35 4.38 4.46 4.39 4.51 4.66 4.96 log 2.34 2.31 2.50 2.54 2.70 2.74 2.78 2.85 2.94 2.95 2.99 2.96 3.01 3.08 3.20 0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 0.00 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019 differ -0.36 2.18 0.52 2.13 0.56 0.70 1.07 1.75 0.24 0.72 -0.61 1.00 1.43 2.91 -12.00 -10.00 -8.00 -6.00 -4.00 -2.00 0.00 2.00 4.00 6.00 8.00 10.00 12/26/2018 1/5/2019 1/15/2019 1/25/2019 2/4/2019 2/14/2019 2/24/2019 3/6/2019
  • 15. • Ranking based on autocorrelation • Pre-defined intervals (based on estimated frequency) Daily data • [7, 14, 21, …] • [14, 28, 32, …] • … Weekly data • [2, 4, 6, 8, …] • [4, 8, 12, 16, …] • … … Candidates for Lag-Sizes
  • 16. • Dropouts • Random replacement of actual lag-values by „n.a.“ • Align frequency of available lag information between train and validation/test Regularization of Lag-Features
  • 18. Genetic algorithm approach x1 x2 x3 x4 y 200 cust1 0.01 prod1 32 250 cust1 0.45 prod2 21 50 cust1 0.51 prod3 20 45 cust2 0.79 prod1 18 125 cust2 0.72 prod2 27 400 cust2 0.28 prod3 35 230 cust3 0.68 prod1 37 210 cust3 0.35 prod2 30 500 cust3 0.28 prod3 28 505 cust4 0.63 prod1 29 150 cust4 0.53 prod2 40 170 cust4 0.33 prod3 35 Iteration1/10 X% accuracy Tuned Iteration2/10 + Random Transformation lag1(x3,x4) Z% accuracy Feature importances lag1(y,x2) 1 lag1(y,x4) 0.5 lag1(x1,x2) 0.2 lag1(x1,x4) 0.001 Feature importances lag1(y,x2,x4) 1 lag1(y,x2) 0.5 lag1(y,x4) 0.2 lag1(x1,x2) 0.15 lag1(x3,x4) 0.05 Date 01/01/2019 02/01/2019 03/01/2019 01/01/2019 02/01/2019 03/01/2019 01/01/2019 02/01/2019 03/01/2019 01/01/2019 02/01/2019 03/01/2019
  • 19. MLI for Time Series
  • 20. Bring Your own recipe! • Bring in your domain knowledge and achieve even better results. • Add additional transformers from the open source git repo : https://github.com/h2oai/driverlessai- recipes • You can contribute too! • They follow sklearn type of api • Add models, scorers or transformers
  • 21. The Prophet model y(t)= g(t) + s(t) + h(t) g(t) Piecewise linear or logistic regressor to calculate trend s(t) models periodic changes (e.g. weekly/yearly seasonality) h(t) holiday component