Explainable models for time series with random forests
1. Explainable models for time series with random forests
Nathalie Vialaneix(1) & Rémi Servien
(1) nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
First PhenoDyn meeting
November 29-30, 2021
2. Scientific question
Purpose: prediction of a target quantity (e.g., yield) from functional data (e.g.,
weather time series)
Statistics & ML for high throughput data integration
Nov 29-30 2021 / PhenoDyn / Nathalie Vialaneix & Rémi Servien
p. 2
4. Scientific question & difficulty at stake
Purpose: Improve interpretability by selecting the most predictive intervals.
Challenge: Selection of intervals is not too hard (e.g., group Lasso) but creating the
relevant intervals (starting point, length) is hard.
Existing solutions: [Picheny et al., 2019, Grollemund et al., 2019]
5. Scientific question & framework of the presentation
Here: random forest
Why?
- versatile method for prediction
- easy to use and relatively fast
- good prediction ability in general
- natural framework for interpretability (importance through OOB samples)
6. What is needed to achieve that goal?
Three/four key ingredients
1. random forest for time series
2. (maybe optional) ... based on summary descriptors of intervals
3. building intervals
4. selecting intervals
7. A short reminder on random forest [Breiman, 2001]
[Diagram: from the learning set Ln, bootstrap samples Ln^Θ1, …, Ln^Θq are drawn; a randomized-input tree ĥ(·, Θℓ, Θ′ℓ) is grown on each; the trees are aggregated into the forest predictor ĥRF−RI(·). Stages: bootstrap, RI tree, aggregation.]
Courtesy of Robin Genuer and Jean-Michel Poggi.
8. CART [Breiman et al., 1984]
[Diagram: a CART tree with internal splits X1 ≤ d3, X2 ≤ d2 and X1 ≤ d1, and the induced partition of the (X1, X2) plane into rectangular leaf cells C3, C4, C8 and C9.]
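As a hedged illustration (not the authors' code), the axis-aligned splits of such a tree can be reproduced with scikit-learn's CART implementation; the data, thresholds, and feature names below are invented for the sketch:

```python
# Toy CART example: nested axis-aligned splits on X1 and X2, as in the slide.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
# piecewise-constant target, constant on three rectangles of the (X1, X2) plane
y = np.where(X[:, 0] <= 0.5, np.where(X[:, 1] <= 0.3, 0.0, 1.0), 2.0)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["X1", "X2"]))  # the learned splits
```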
10. Technical details
- splits: randomly choose mtry < d variables and find the “best split” among this selection
- aggregation: average (numeric target) or majority vote (class target)
- OOB error: average error (over trees) on the samples not included in the tree's bootstrap sample
- variable importance: based on random permutation of a variable's values; the larger the resulting increase in error, the more important the variable
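A minimal sketch of these two ingredients (OOB error and permutation importance) with scikit-learn, on synthetic data; the model settings and the data-generating process are assumptions made for illustration only:

```python
# OOB error and permutation importance with a random forest (toy data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=200)

# oob_score=True evaluates each tree on the samples left out of its bootstrap
rf = RandomForestRegressor(n_estimators=300, max_features="sqrt",  # mtry < d
                           oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)  # OOB R^2

# importance: increase in error after randomly permuting one variable at a time
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean.argsort()[::-1][:2])  # the two informative variables
```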
Image by courtesy of Charlotte Pelletier
13. Extensions of random forest for time series
- Similarity based techniques
  - Fréchet forest [Capitaine et al., 2020]
  - Proximity forest [Lucas et al., 2019] (restricted to classification)
- Interval based techniques
  - Time Series Forest [Deng et al., 2013] and its extension [Middlehurst et al., 2020]
  - RISE [Lines et al., 2018] (a tree = a randomly selected interval)
- Dictionary or symbolic-representation based techniques
  - TS-CHIEF [Shifaz et al., 2020] (combines all types of splits, including dictionary-based splits building on [Schäfer, 2015])
  - (multivariate time series) symbolic representation of time series [Baydogan and Runger, 2015]
15. Time Series Forest
Basic principles:
1. for a given tree: random sampling of intervals
2. for a given tree: compute summaries (mean, sd, slope for [Deng et al., 2013] and
catch22 for [Middlehurst et al., 2020])
3. define splits as usual based on these summaries
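The three steps above can be sketched as follows (an illustrative toy version, not the TSF implementation; interval sampling is shared across trees here for simplicity, and the data are synthetic):

```python
# Time Series Forest idea: summarize random intervals by (mean, sd, slope)
# and train an ordinary random forest on these summary features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, T = 100, 60
X = rng.normal(size=(n, T)).cumsum(axis=1)   # toy time series
y = X[:, 20:30].mean(axis=1)                 # target depends on one interval

def interval_summaries(X, intervals):
    """Stack (mean, sd, slope) of each series over each interval."""
    feats = []
    for (a, b) in intervals:
        seg, t = X[:, a:b], np.arange(b - a)
        slope = np.polyfit(t, seg.T, deg=1)[0]   # per-series linear trend
        feats += [seg.mean(axis=1), seg.std(axis=1), slope]
    return np.column_stack(feats)

# 1. random sampling of intervals (start and length)
starts = rng.integers(0, T - 5, size=10)
lengths = rng.integers(5, 20, size=10)
intervals = [(int(s), min(int(s + l), T)) for s, l in zip(starts, lengths)]
# 2.-3. compute summaries, then the usual RF splits on the summary features
Z = interval_summaries(X, intervals)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Z, y)
```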
What is useful for our question?
- combined with variable selection, could help identify important intervals (still to be tested)
- ideas to summarize the information of an entire interval (already partially tested)
18. Extensions on summaries
- supervised: accounting for Y (is that useful?): PLS, linear models (including ridge)... similar to [Poterie et al., 2019] (grouped variables summarized by LDA) and to [Rainforth and Wood, 2017] (CCA based splits)
- unsupervised: first PC of a PCA, as in the ClustOfVar method [Chavent et al., 2012] and in [Chavent et al., 2021] (can also be useful to build groups)
- could a step further be taken with oblique splits [Bertsimas and Dunn, 2017] (let the forest decide how to combine variables to find the best split)?

See also: [Hornung and Boulesteix, 2021]
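As a toy illustration of the unsupervised option (not the authors' code; data and interval boundaries are invented), an interval can be summarized by its first principal component and that single score can stand in for the interval's columns:

```python
# Summarize an interval of time points by the first PC of a PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50)).cumsum(axis=1)   # toy time series
segment = X[:, 10:25]                           # one candidate interval

# one PCA score per series: a single summary predictor for the whole interval
pc1 = PCA(n_components=1).fit_transform(segment).ravel()
```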
21. Strategies for building intervals
- Precomputing intervals independently of Y, based on correlation between time points: constrained extension of ClustOfVar (PCA-like criterion), adjclust (constrained clustering based on correlation between variables) ⇒ hierarchy of intervals
- Precomputing intervals independently of Y, based on greedy agglomeration: alternating between
  - a regression step (LM) between any two consecutive variables to select the best merge (minimum loss or maximum gain in accuracy)
  - a summary step (depends on the regression type)
  ⇒ hierarchy of intervals
- Using random forests to compute a hierarchy of intervals in a greedy manner, based on the loss in grouped importance [Gregorutti et al., 2015]
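The first strategy can be sketched with an adjacency-constrained agglomerative clustering of the time points — a rough stand-in for adjclust, not its actual algorithm; the data, number of clusters, and use of Ward linkage are assumptions:

```python
# Hierarchy of contiguous intervals: agglomerative clustering of time points
# in which only adjacent time points are allowed to merge.
import numpy as np
from scipy.sparse import diags
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40)).cumsum(axis=1)   # toy series, 40 time points
T = X.shape[1]

# connectivity: time point t may only merge with t-1 and t+1 (a chain graph)
connectivity = diags([1, 1, 1], [-1, 0, 1], shape=(T, T))

# cluster the *columns* (time points), hence the transpose
clust = AgglomerativeClustering(n_clusters=5, connectivity=connectivity,
                                linkage="ward").fit(X.T)
labels = clust.labels_
# with the adjacency constraint, every cluster is a contiguous interval
```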
24. Strategies for selecting variables (here, intervals) in RF
Huge literature... a few reviews: [Degenhardt et al., 2019, Speiser et al., 2019]
- based on importance:
  - just use the importance...
  - ranking by importance, then selection of the best model among those built with the first k variables (k = 1, …, K): VSURF [Genuer et al., 2010]
  - [Altmann et al., 2010] or [Szymczak et al., 2016], based on a data-driven importance threshold (untested)
- based on external variable selection methods:
  - knockoffs [Barber and Candès, 2015], as in Boruta [Kursa and Rudnicki, 2010]
  - Relief [Robnik-Šikonja and Kononenko, 2003]
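The ranking-then-nested-models idea can be sketched as follows (an illustration in the spirit of VSURF, not the VSURF algorithm itself; data and forest settings are invented):

```python
# Rank variables by importance, then pick the top-k model with smallest OOB error.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=300, oob_score=True,
                           random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]   # importance ranking

oob_err = []
for k in range(1, X.shape[1] + 1):                  # nested models on top-k vars
    rf_k = RandomForestRegressor(n_estimators=300, oob_score=True,
                                 random_state=0).fit(X[:, order[:k]], y)
    oob_err.append(1 - rf_k.oob_score_)             # OOB "error" (1 - R^2)

best_k = int(np.argmin(oob_err)) + 1
selected = order[:best_k]                           # retained variables
```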
25. Simulation setting
- predictors: 1,000 EMS time series (length: 444)
- important intervals
- target: yᵢ = log(1 + |⟨xᵢ, β⟩|) + εᵢ, with
β(t) = 4 × 1_{t∈[320,410]} + 2 × 1_{t∈[500,550]} − 1_{t∈[680,730]}
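A sketch of this simulation (not the actual design: the time grid, the predictor distribution, and the noise level are assumptions made so the β intervals are covered):

```python
# Simulated target from the step-function coefficient β(t) of the slide.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(750)                                  # assumed grid covering β's support
beta = (4.0 * ((t >= 320) & (t <= 410))
        + 2.0 * ((t >= 500) & (t <= 550))
        - 1.0 * ((t >= 680) & (t <= 730)))

X = rng.normal(size=(100, t.size)).cumsum(axis=1)   # stand-in functional predictors
eps = rng.normal(scale=0.1, size=100)               # assumed noise level
y = np.log(1 + np.abs(X @ beta)) + eps              # y_i = log(1 + |<x_i, beta>|) + eps_i
```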
26. Evaluated scenarios
Scenario 1: pre-computed groups, summary, RF and importance based evaluation
Courtesy of Louisa Villa.
27. Evaluated scenarios
Scenario 2: pre-computed groups, summary, selection and RF
Courtesy of Louisa Villa.
28. Evaluated scenarios
Scenario 3: groups computed in interaction with importance or variable selection (not
explained), summary, selection (or not) and RF
Courtesy of Louisa Villa.
29. Evaluation criteria
- resemblance of important/selected intervals with the ground truth
- accuracy
30. A few take home messages (to be confirmed)
- pre-computed groups based on correlation (especially adjclust) perform better
- PLS is the best summary strategy
- combining a selection strategy with RF is computationally expensive and inefficient
- overall, the recovery of groups is a bit disappointing
31. adjclust + PLS + Boruta (scenario 2)
⇒ Model selection (or model aggregation?) seems critical...
32. To be continued...
33. References
Altmann, A., Tolosi, L., Sander, O., and Lengauer, T. (2010).
Permutation importance: a corrected feature importance measure.
Bioinformatics, 26(10):1340–1347.
Barber, R. F. and Candès, E. (2015).
Controlling the false discovery rate via knockoffs.
Annals of Statistics, 43(5):2055–2085.
Baydogan, M. G. and Runger, G. (2015).
Learning a symbolic representation for multivariate time series classification.
Data Mining and Knowledge Discovery, 29:400–422.
Bertsimas, D. and Dunn, J. (2017).
Optimal classification trees.
Machine Learning, 106(7):1039–1082.
Breiman, L. (2001).
Random forests.
Machine Learning, 45(1):5–32.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984).
Classification and Regression Trees.
Chapman and Hall, Boca Raton, Florida, USA.
Capitaine, L., Bigot, J., Thiébaut, R., and Genuer, R. (2020).
Fréchet random forests for metric space valued regression with non Euclidean predictors.
Preprint arXiv:1906.01741v2.
Chavent, M., Genuer, R., and Saracco, J. (2021).
Combining clustering of variables and feature selection using random forests.
Communications in Statistics - Simulation and Computation, 50(2):426–445.
Chavent, M., Liquet, B., Kuentz-Simonet, V., and Saracco, J. (2012).
ClustOfVar: an R package for the clustering of variables.
Journal of Statistical Software, 50(13):1–16.
Degenhardt, F., Seifert, S., and Szymczak, S. (2019).
Evaluation of variable selection methods for random forests and omics data sets.
Briefings in Bioinformatics, 20(2):492–503.
Deng, H., Runger, G., Tuv, E., and Martyanov, V. (2013).
A time series forest for classification and feature extraction.
Information Science, 239:142–153.
Genuer, R., Poggi, J.-M., and Tuleau-Malot, C. (2010).
Variable selection using random forests.
Pattern Recognition Letters, 31(14):2225–2236.
Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015).
Grouped variable importance with random forests and application to multiple functional data
analysis.
Computational Statistics and Data Analysis, 90:15–35.
Grollemund, P.-M., Abraham, C., Baragatti, M., and Pudlo, P. (2019).
Bayesian functional linear regression with sparse step functions.
Bayesian Analysis, 14(1):111–135.
Hornung, R. and Boulesteix, A.-L. (2021).
Interaction forests: identifying and exploiting interpretable quantitative and qualitative interaction
effects.
Technical Report Number 237, Department of Statistics, University of Munich, Germany.
Kursa, M. and Rudnicki, W. (2010).
Feature selection with the Boruta package.
Journal of Statistical Software, 36(11):1–13.
Lines, J., Taylor, S., and Bagnall, A. (2018).
Time series classification with HIVE-COTE: the hierarchical vote collective of
transformation-based ensembles.
ACM Transactions on Knowledge Discovery from Data, 12(5):1–35.
Lucas, B., Shifaz, A., Pelletier, C., O’Neill, L., Zaidi, N., Goethals, B., Petitjean, F., and Webb,
G. I. (2019).
Proximity forest: an effective and scalable distance based classifier for time series.
Data Mining and Knowledge Discovery, 33:607–635.
Middlehurst, M., Large, J., and Bagnall, A. (2020).
The canonical interval forest (CIF) classifier for time series classification.
In Wu, X., Jermaine, C., Hu, X., Kotevskia, O., Lu, S., Xu, W., Aluru, S., Zhai, C., Al-Masri, E.,
Chen, Z., and Saltz, J., editors, Proceedings of IEEE International Conference on Big Data,
Atlanta, GA, USA. IEEE.
Picheny, V., Servien, R., and Villa-Vialaneix, N. (2019).
Interpretable sparse sliced inverse regression for functional data.
Statistics and Computing, 29(2):255–267.
Poterie, A., Dupuy, J.-F., Monbet, V., and Rouvière, L. (2019).
Classification tree algorithm for grouped variables.
Computational Statistics, 34:1613–1648.
Rainforth, T. and Wood, F. (2017).
Canonical correlation forests.
Preprint arXiv:1507.05444.
Robnik-Šikonja, M. and Kononenko, I. (2003).
Theoretical and empirical analysis of ReliefF and RReliefF.
Machine Learning, 53(1-2):23–69.
Schäfer, P. (2015).
The BOSS is concerned with time series classification in the presence of noise.
Data Mining and Knowledge Discovery, 29(6):1505–1530.
Shifaz, A., Pelletier, C., Petitjean, F., and Webb, G. I. (2020).
TS-CHIEF: a scalable and accurate forest algorithm for time series classification.
Data Mining and Knowledge Discovery, 34:742–775.
Speiser, J. L., Miller, M. E., Tooze, J., and Ip, E. (2019).
A comparison of random forest variable selection methods for classification prediction modeling.
Expert Systems with Applications, 134:93–101.
Szymczak, S., Holzinger, E., Dasgupta, A., Malley, J., Molloy, A., Mills, J., Brody, L.,
Stambolian, D., and Bailey-Wilson, J. (2016).
r2VIM: a new variable selection method for random forests in genome-wide association studies.
BioData Mining, 9:7.
39. Dictionary/symbolic representation based
BOSS [Schäfer, 2015] and [Baydogan and Runger, 2015]
Based on: a Fourier transform, then a symbolic representation.
[Baydogan and Runger, 2015] is similar, except that its representation loses the interval information (based on a tree at the time-step level)
40. Dictionary/symbolic representation based
BOSS [Schäfer, 2015] and [Baydogan and Runger, 2015]
What is useful for our question? Uncertain... can the symbolic representation itself be used to represent/select (windowed) intervals? (untested)