Explainable models for time series with random forest
Nathalie Vialaneix(1) & Rémi Servien
(1) nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
First PhenoDyn meeting
November 29-30, 2021
Scientific question
Purpose: prediction of a target quantity (e.g., yield) from functional data (e.g.,
weather time series)
Statistics & ML for high throughput data integration
Nov 29-30 2021 / PhenoDyn / Nathalie Vialaneix & Rémi Servien
Scientific question & difficulty at stake
Purpose: Improve interpretability by selecting the most predictive intervals.
Challenge: Selection of intervals is not too hard (e.g., group Lasso) but creating the
relevant intervals (starting point, length) is hard.
Existing solutions: [Picheny et al., 2019, Grollemund et al., 2019]
Scientific question & framework of the presentation
Here: random forest
Why?
• versatile method for prediction
• easy to use and relatively fast
• good prediction ability in general
• natural framework for interpretability (importance through OOB samples)
What is needed to achieve that goal?
Three/four key ingredients
1. random forest for time series
2. (maybe optional) ... based on summary descriptors of intervals
3. building intervals
4. selecting intervals
A short reminder on random forest [Breiman, 2001]
[Schematic: from the learning set Ln, bootstrap samples LnΘ1, . . . , LnΘq are drawn; a randomized tree ĥ(·, Θℓ, Θ′ℓ) is grown on each; the trees are aggregated into the forest predictor ĥRF−RI(·). Steps: bootstrap → RI tree → aggregation.]
Courtesy of Robin Genuer and Jean-Michel Poggi.
CART [Breiman et al., 1984]
[Diagram: a CART tree with root split X1 ≤ d3 vs. X1 > d3, then X2 ≤ d2 vs. X2 > d2 and X1 ≤ d1 vs. X1 > d1, leading from the root C1 through C2, C3, C4, C5 down to the leaves C8 and C9; equivalently, the thresholds d1, d2, d3 partition the (X1, X2) space into the cells C3, C4, C8, C9.]
Technical details
• splits are made by randomly choosing mtry < d variables and finding the “best split” among this selection
• aggregation: average (numeric target) or majority vote rule (class target)
• OOB error: average error (over trees) on samples not included in the bootstrap sample of the tree
• variable importance (based on random permutation): the larger the increase in error, the more important the variable
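The permutation-importance mechanics can be sketched in a few lines (a minimal illustration, not the authors' code; a hand-made predictor stands in for a fitted forest, and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the target depends on feature 0 only.
X = rng.normal(size=(200, 2))
y = X[:, 0]

def predict(X):
    """Stand-in for a fitted model: uses feature 0 only."""
    return X[:, 0]

def permutation_importance(predict, X, y, rng):
    """Increase in MSE when each column is permuted; larger = more important."""
    base = np.mean((predict(X) - y) ** 2)
    imps = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        imps.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.array(imps)

imp = permutation_importance(predict, X, y, rng)
print(imp[0] > imp[1])  # → True: permuting the informative feature hurts most
```

In a forest, the same computation is done tree by tree on the OOB samples and averaged.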
Image by courtesy of Charlotte Pelletier
Extensions of random forest for time series
• Similarity-based techniques
  - Fréchet forest [Capitaine et al., 2020]
  - Proximity forest [Lucas et al., 2019] (restricted to classification)
• Interval-based techniques
  - Time Series Forest [Deng et al., 2013] and its extension [Middlehurst et al., 2020]
  - RISE [Lines et al., 2018] (a tree = a randomly selected interval)
• Dictionary or symbolic-representation-based techniques
  - TS-CHIEF [Shifaz et al., 2020] (combines all types of splits, including dictionary-based splits built on the work of [Schäfer, 2015])
  - (multivariate time series) symbolic representation of time series [Baydogan and Runger, 2015] (more on that below)
Time Series Forest
Basic principles:
1. for a given tree: random sampling of intervals
2. for a given tree: compute summaries (mean, sd, slope for [Deng et al., 2013]; catch22 for [Middlehurst et al., 2020])
3. define splits as usual, based on these summaries
What is useful for our question?
• combined with variable selection, could help identify important intervals (still to be tested)
• ideas to summarize the information of an entire interval (already partially tested)
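Steps 1-2 above can be sketched as follows (our own minimal reconstruction of TSF-style interval summaries, not the original implementation; names such as `interval_features` are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 50, 100
series = rng.normal(size=(n, T))          # n time series of length T

def interval_features(series, n_intervals, rng):
    """Mean/sd/slope summaries over randomly drawn intervals (TSF-style)."""
    T = series.shape[1]
    feats = []
    for _ in range(n_intervals):
        a = int(rng.integers(0, T - 2))
        b = int(rng.integers(a + 2, T + 1))   # interval [a, b), length >= 2
        seg = series[:, a:b]
        t = np.arange(b - a)
        slope = np.polyfit(t, seg.T, 1)[0]    # least-squares slope, per series
        feats += [seg.mean(axis=1), seg.std(axis=1), slope]
    return np.column_stack(feats)

X = interval_features(series, n_intervals=10, rng=rng)
print(X.shape)  # → (50, 30): three summaries per interval
```

In TSF, each tree draws its own intervals and then splits on these summary columns as usual.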
Extensions on summaries
• supervised: accounting for Y (is that useful?): PLS, linear models (including ridge)... similar to [Poterie et al., 2019] (grouped variables summarized by LDA) and to [Rainforth and Wood, 2017] (CCA-based splits)
• unsupervised: first PC of a PCA, as in the ClustOfVar method [Chavent et al., 2012] and in [Chavent et al., 2021] (can also be useful to build groups)
• could a step further be taken with oblique splits [Bertsimas and Dunn, 2017] (let the forest decide how to combine variables to find the best split)?
See also: [Hornung and Boulesteix, 2021]
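The unsupervised option can be sketched directly: summarize one interval by the scores on its first principal component (a minimal sketch with random data; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(2)
seg = rng.normal(size=(60, 8))   # one interval: 60 series, 8 time points

def first_pc_score(seg):
    """Unsupervised interval summary: scores on the first principal component."""
    centred = seg - seg.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[0]        # one scalar summary per series

z = first_pc_score(seg)
print(z.shape)  # → (60,)
```

A supervised variant would replace the SVD by a PLS or ridge fit of the interval's columns against Y.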
Strategies for building intervals
• Precomputing intervals independently of Y (based on correlation between time points): constrained extension of ClustOfVar (PCA-like criterion), adjclust (constrained clustering based on correlation between variables) ⇒ hierarchy of intervals
• Precomputing intervals independently of Y (based on greedy agglomeration), alternating between
  - a regression-based step (LM) between any two consecutive variables to select the best merge (minimum loss or maximum gain in accuracy)
  - a summary step (depends on the regression type)
⇒ hierarchy of intervals
• Using random forest to compute a hierarchy of intervals based on the loss in grouped importance, in a greedy manner [Gregorutti et al., 2015]
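The first strategy can be illustrated by a toy adjacency-constrained agglomeration: only *consecutive* intervals may merge, the most correlated adjacent pair merging first (a simplified adjclust-flavoured sketch, not the adjclust algorithm itself; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 12
# Toy series: two blocks of 6 highly correlated time points each.
base = rng.normal(size=(100, 2))
X = np.repeat(base, [6, 6], axis=1) + 0.1 * rng.normal(size=(100, T))

def adjacency_constrained_merges(X):
    """Greedily merge adjacent time points/intervals by correlation,
    yielding a hierarchy of contiguous intervals."""
    intervals = [[j] for j in range(X.shape[1])]
    history = []
    while len(intervals) > 1:
        reps = [X[:, iv].mean(axis=1) for iv in intervals]   # interval representatives
        cors = [np.corrcoef(reps[i], reps[i + 1])[0, 1]
                for i in range(len(reps) - 1)]
        i = int(np.argmax(cors))               # most correlated adjacent pair
        intervals[i:i + 2] = [intervals[i] + intervals[i + 1]]
        history.append([iv.copy() for iv in intervals])
    return history

hist = adjacency_constrained_merges(X)
print(len(hist[-2]))  # → 2: the level just below the root
```

Cutting the resulting hierarchy at a given level yields the candidate intervals fed to the forest.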
Strategies for selecting variables (here, intervals) in RF
Huge literature... a few reviews: [Degenhardt et al., 2019, Speiser et al., 2019]
• based on importance:
  - just use the importance...
  - ranking by importance, then selection of the best model with the first k variables (k = 1, . . . , K): VSURF [Genuer et al., 2010]
  - [Altmann et al., 2010] or [Szymczak et al., 2016], based on a data-driven importance threshold (untested)
• based on external variable selection methods:
  - knockoffs [Barber and Candès, 2015], as in Boruta [Kursa and Rudnicki, 2010]
  - Relief [Robnik-Šikonja and Kononenko, 2003]
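The Boruta-style idea — compare real features against permuted "shadow" copies that carry no signal — can be sketched with a crude correlation score standing in for RF importance (an illustration of the principle only, not the Boruta package):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 300, 6
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Shadow features: independently permuted copies of each column (no signal).
shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(d)])
Z = np.hstack([X, shadows])

# Crude importance score: |correlation| with y (stand-in for RF importance).
imp = np.abs([np.corrcoef(Z[:, j], y)[0, 1] for j in range(2 * d)])
threshold = imp[d:].max()              # the best shadow sets the bar
selected = [j for j in range(d) if imp[j] > threshold]
print(0 in selected, 1 in selected)  # → True True
```

Boruta iterates this comparison with RF importance instead of correlation, confirming or rejecting features over repeated runs.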
Simulation setting
• predictors: 1,000 EMS time series (length: 444)
• important intervals
• target: yᵢ = log(1 + |⟨xᵢ, β⟩|) + ε with
β(t) = 4 · 1_{t∈[320,410]} + 2 · 1_{t∈[500,550]} − 1_{t∈[680,730]}
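A generator for this kind of simulated data can be sketched as follows. Sizes and the noise level are our own illustrative choices (we use a longer grid than the slide's stated length, since β's support extends to t = 730, and Gaussian noise stands in for the EMS predictors):

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 100, 800                  # illustrative sizes, not the slide's exact setting
t = np.arange(T)

# Coefficient function beta(t) from the slide: three "important" intervals.
beta = np.where((t >= 320) & (t <= 410), 4.0, 0.0) \
     + np.where((t >= 500) & (t <= 550), 2.0, 0.0) \
     - np.where((t >= 680) & (t <= 730), 1.0, 0.0)

X = rng.normal(size=(n, T))      # stand-in predictors (the slides use EMS series)
eps = 0.1 * rng.normal(size=n)   # noise level is our choice
y = np.log(1 + np.abs(X @ beta)) + eps

print(y.shape)  # → (100,)
```

The ground-truth intervals are exactly the supports of β, which is what the interval-recovery criteria below are compared against.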
Evaluated scenarios
Scenario 1: pre-computed groups, summary, RF and importance based evaluation
Courtesy of Louisa Villa.
Evaluated scenarios
Scenario 2: pre-computed groups, summary, selection and RF
Courtesy of Louisa Villa.
Evaluated scenarios
Scenario 3: groups computed in interaction with importance or variable selection (not
explained), summary, selection (or not) and RF
Courtesy of Louisa Villa.
Evaluation criteria
Resemblance of important/selected intervals with ground truth
Accuracy
A few take home messages (to be confirmed)
• pre-computed groups based on correlation (especially adjclust) are better
• PLS is the best summary strategy
• combining a selection strategy with RF is computationally expensive and inefficient
• overall, the recovery of groups is a bit disappointing
adjclust + PLS + Boruta (scenario 2)
⇒ Model selection (or model aggregation?) seems critical...
To be continued...
References
Altmann, A., Tolosi, L., Sander, O., and Lengauer, T. (2010).
Permutation importance: a corrected feature importance measure.
Bioinformatics, 26(10):1340–1347.
Barber, R. F. and Candès, E. (2015).
Controlling the false discovery rate via knockoffs.
Annals of Statistics, 43(5):2055–2085.
Baydogan, M. G. and Runger, G. (2015).
Learning a symbolic representation for multivariate time series classification.
Data Mining and Knowledge Discovery, 29:400–422.
Bertsimas, D. and Dunn, J. (2017).
Optimal classification trees.
Machine Learning, 106(7):1039–1082.
Breiman, L. (2001).
Random forests.
Machine Learning, 45(1):5–32.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984).
Classification and Regression Trees.
Chapman and Hall, Boca Raton, Florida, USA.
Capitaine, L., Bigot, J., Thiébaut, R., and Genuer, R. (2020).
Fréchet random forests for metric space valued regression with non Euclidean predictors.
Preprint arXiv:1906.01741v2.
Chavent, M., Genuer, R., and Saracco, J. (2021).
Combining clustering of variables and feature selection using random forests.
Communications in Statistics - Simulation and Computation, 50(2):426–445.
Chavent, M., Liquet, B., Kuentz-Simonet, V., and Saracco, J. (2012).
ClustOfVar: an R package for the clustering of variables.
Journal of Statistical Software, 50(13):1–16.
Degenhardt, F., Seifert, S., and Szymczak, S. (2019).
Evaluation of variable selection methods for random forests and omics data sets.
Briefings in Bioinformatics, 20(2):492–503.
Deng, H., Runger, G., Tuv, E., and Martyanov, V. (2013).
A time series forest for classification and feature extraction.
Information Science, 239:142–153.
Genuer, R., Poggi, J.-M., and Tuleau-Malot, C. (2010).
Variable selection using random forests.
Pattern Recognition Letters, 31(14):2225–2236.
Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015).
Grouped variable importance with random forests and application to multiple functional data
analysis.
Computational Statistics and Data Analysis, 90:15–35.
Grollemund, P.-M., Abraham, C., Baragatti, M., and Pudlo, P. (2019).
Bayesian functional linear regression with sparse step functions.
Bayesian Analysis, 14(1):111–135.
Hornung, R. and Boulesteix, A.-L. (2021).
Interaction forests: identifying and exploiting interpretable quantitative and qualitative interaction
effects.
Technical Report Number 237, Department of Statistics, University of Munich, Germany.
Kursa, M. and Rudnicki, W. (2010).
Feature selection with the Boruta package.
Journal of Statistical Software, 36(11):1–13.
Lines, J., Taylor, S., and Bagnall, A. (2018).
Time series classification with HIVE-COTE: the hierarchical vote collective of
transformation-based ensembles.
ACM Transactions on Knowledge Discovery from Data, 12(5):1–35.
Lucas, B., Shifaz, A., Pelletier, C., O’Neill, L., Zaidi, N., Goethals, B., Petitjean, F., and Webb,
G. I. (2019).
Proximity forest: an effective and scalable distance based classifier for time series.
Data Mining and Knowledge Discovery, 33:607–635.
Middlehurst, M., Large, J., and Bagnall, A. (2020).
The canonical interval forest (CIF) classifier for time series classification.
In Wu, X., Jermaine, C., Hu, X., Kotevskia, O., Lu, S., Xu, W., Aluru, S., Zhai, C., Al-Masri, E.,
Chen, Z., and Saltz, J., editors, Proceedings of IEEE International Conference on Big Data,
Atlanta, GA, USA. IEEE.
Picheny, V., Servien, R., and Villa-Vialaneix, N. (2019).
Interpretable sparse sliced inverse regression for functional data.
Statistics and Computing, 29(2):255–267.
Poterie, A., Dupuy, J.-F., Monbet, V., and Rouvière, L. (2019).
Classification tree algorithm for grouped variables.
Computational Statistics, 34:1613–1648.
Rainforth, T. and Wood, F. (2017).
Canonical correlation forests.
arXiv: 1507.05444.
Robnik-Šikonja, M. and Kononenko, I. (2003).
Theoretical and empirical analysis of ReliefF and RReliefF.
Machine Learning, 53(1-2):23–69.
Schäfer, P. (2015).
The BOSS is concerned with time series classification in the presence of noise.
Data Mining and Knowledge Discovery, 29(6):1505–1530.
Shifaz, A., Pelletier, C., Petitjean, F., and Webb, G. I. (2020).
TS-CHIEF: a scalable and accurate forest algorithm for time series classification.
Data Mining and Knowledge Discovery, 34:742–775.
Speiser, J. L., Miller, M. E., Tooze, J., and Ip, E. (2019).
A comparison of random forest variable selection methods for classification prediction modeling.
Expert Systems with Applications, 134:93–101.
Szymczak, S., Holzinger, E., Dasgupta, A., Malley, J., Molloy, A., Mills, J., Brody, L.,
Stambolian, D., and Bailey-Wilson, J. (2016).
r2VIM: a new variable selection method for random forests in genome-wide association studies.
BioData Mining, 9:7.
Dictionary/symbolic-representation based
BOSS [Schäfer, 2015] and [Baydogan and Runger, 2015]
Based on: Fourier transform, then a symbolic representation.
[Baydogan and Runger, 2015] is similar, except that the representation loses interval
information (based on a tree at the time-step level)
What is useful for our question? Uncertain... can the symbolic representation itself be
used to represent/select (windowed) intervals? (untested)
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Accounting for variance in machine learning benchmarks
Accounting for variance in machine learning benchmarksAccounting for variance in machine learning benchmarks
Accounting for variance in machine learning benchmarks
 
Lesson 6 measures of central tendency
Lesson 6 measures of central tendencyLesson 6 measures of central tendency
Lesson 6 measures of central tendency
 
MSL 5080, Methods of Analysis for Business Operations 1 .docx
MSL 5080, Methods of Analysis for Business Operations 1 .docxMSL 5080, Methods of Analysis for Business Operations 1 .docx
MSL 5080, Methods of Analysis for Business Operations 1 .docx
 
Using HDDT to avoid instances propagation in unbalanced and evolving data str...
Using HDDT to avoid instances propagation in unbalanced and evolving data str...Using HDDT to avoid instances propagation in unbalanced and evolving data str...
Using HDDT to avoid instances propagation in unbalanced and evolving data str...
 
An approximate possibilistic
An approximate possibilisticAn approximate possibilistic
An approximate possibilistic
 
MAC411(A) Analysis in Communication Researc.ppt
MAC411(A) Analysis in Communication Researc.pptMAC411(A) Analysis in Communication Researc.ppt
MAC411(A) Analysis in Communication Researc.ppt
 
slides of ABC talk at i-like workshop, Warwick, May 16
slides of ABC talk at i-like workshop, Warwick, May 16slides of ABC talk at i-like workshop, Warwick, May 16
slides of ABC talk at i-like workshop, Warwick, May 16
 
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
 
Bayesian/Fiducial/Frequentist Uncertainty Quantification by Artificial Samples
Bayesian/Fiducial/Frequentist Uncertainty Quantification by Artificial SamplesBayesian/Fiducial/Frequentist Uncertainty Quantification by Artificial Samples
Bayesian/Fiducial/Frequentist Uncertainty Quantification by Artificial Samples
 
[A]BCel : a presentation at ABC in Roma
[A]BCel : a presentation at ABC in Roma[A]BCel : a presentation at ABC in Roma
[A]BCel : a presentation at ABC in Roma
 
Data Analysis
Data Analysis Data Analysis
Data Analysis
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
Regression, Bayesian Learning and Support vector machine
Regression, Bayesian Learning and Support vector machineRegression, Bayesian Learning and Support vector machine
Regression, Bayesian Learning and Support vector machine
 
7 qc tools
7 qc tools7 qc tools
7 qc tools
 

More from tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
tuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
tuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
tuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
tuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
tuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
tuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
tuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
tuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
tuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
tuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
tuxette
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
tuxette
 
La famille *down
La famille *downLa famille *down
La famille *down
tuxette
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphs
tuxette
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
tuxette
 

More from tuxette (18)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
La famille *down
La famille *downLa famille *down
La famille *down
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphs
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
 

Recently uploaded

insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
anitaento25
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
binhminhvu04
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
rakeshsharma20142015
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
Cherry
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
Areesha Ahmad
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
azzyixes
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 

Recently uploaded (20)

insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 

Explainable models for time series with random forest
• 1. Explainable models for time series with random forest
Nathalie Vialaneix(1) & Rémi Servien
(1) nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
First PhenoDyn meeting, November 29-30, 2021

• 2. Scientific question
Purpose: prediction of a target quantity (e.g., yield) from functional data (e.g., weather time series)
Statistics & ML for high throughput data integration / Nov 29-30 2021 / PhenoDyn / Nathalie Vialaneix & Rémi Servien
• 3-4. Scientific question & difficulty at stake
Purpose: Improve interpretability by selecting the most predictive intervals.
Challenge: Selecting intervals is not too hard (e.g., group Lasso), but building the relevant intervals (starting point, length) is hard.
Existing solutions: [Picheny et al., 2019, Grollemund et al., 2019]
• 5. Scientific question & framework of the presentation
Here: random forest. Why?
- versatile method for prediction
- easy to use and relatively fast
- good prediction ability in general
- natural framework for interpretability (importance through OOB samples)
• 6. What is needed to achieve that goal?
Three or four key ingredients:
1. random forest for time series
2. (maybe optional) ... based on summary descriptors of intervals
3. building intervals
4. selecting intervals
• 7. A short reminder on random forest [Breiman, 2001]
[Diagram: bootstrap samples L_n^{Θ_1}, ..., L_n^{Θ_ℓ}, ..., L_n^{Θ_q} are drawn from the learning set L_n; a randomized tree ĥ(·, Θ_ℓ, Θ'_ℓ) is grown on each; the trees are aggregated into ĥ_{RF-RI}(·). Steps: bootstrap, RI tree, aggregation.]
Courtesy of Robin Genuer and Jean-Michel Poggi.
• 8. CART [Breiman et al., 1984]
[Diagram: a binary tree with splits X1 ≤ d1 / X1 > d1, X2 ≤ d2 / X2 > d2 and X1 ≤ d3 / X1 > d3, leaves C3, C4, C8, C9, together with the corresponding rectangular partition of the (X1, X2) space.]
• 9-10. Technical details
- splits are made by randomly choosing mtry < d variables and finding the "best split" among this selection
- aggregation: average (numeric target) or majority vote rule (class target)
- OOB error: average error (over trees) on the samples not included in the tree's bootstrap sample
- variable importance (based on random permutation): the larger the increase in error, the more important the variable
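The OOB error and permutation importance described above can be sketched with scikit-learn (the estimator and helper names are scikit-learn's, and the toy data is invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

# oob_score=True reports the OOB estimate of R^2, computed on the
# samples left out of each tree's bootstrap sample
rf = RandomForestRegressor(n_estimators=300, max_features="sqrt",  # mtry < d
                           oob_score=True, random_state=0).fit(X, y)
print("OOB R^2:", rf.oob_score_)

# permutation importance: the larger the increase in error after
# shuffling a variable, the more important that variable
imp = permutation_importance(rf, X, y, n_repeats=20, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]
print("most important variables:", ranking[:3])
```

Here the two truly predictive variables (columns 0 and 3) come out on top of the permutation ranking.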
• 11-13. Extensions of random forest for time series
- Similarity-based techniques:
  - Fréchet forest [Capitaine et al., 2020]
  - Proximity forest [Lucas et al., 2019] (restricted to classification)
- Interval-based techniques:
  - Time Series Forest [Deng et al., 2013] and its extension [Middlehurst et al., 2020]
  - RISE [Lines et al., 2018] (one tree = one randomly selected interval)
- Dictionary or symbolic-representation-based techniques:
  - TS-CHIEF [Shifaz et al., 2020] (combines all types of splits, including dictionary-based splits built on the work of [Schäfer, 2015])
  - (multivariate time series) symbolic representation of time series [Baydogan and Runger, 2015]
Image courtesy of Charlotte Pelletier.
• 14-15. Time Series Forest
Basic principles:
1. for a given tree: randomly sample intervals
2. for a given tree: compute summaries (mean, sd, slope for [Deng et al., 2013]; catch22 for [Middlehurst et al., 2020])
3. define splits as usual based on these summaries
What is useful for our question?
- combined with variable selection, it could help identify important intervals (still to be tested)
- ideas to summarize the information of an entire interval (already partially tested)
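Step 2 above (the mean/sd/slope summaries of [Deng et al., 2013]) can be sketched as follows; for simplicity the interval draw is fixed and shared by all trees, unlike the per-tree sampling of the actual Time Series Forest:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def interval_summaries(X, intervals):
    """Mean, sd and slope of each series on each interval (TSF-style)."""
    feats = []
    for (a, b) in intervals:
        seg = X[:, a:b]
        t = np.arange(b - a)
        # slope = degree-1 least-squares fit of the segment against time
        slope = np.polyfit(t, seg.T, deg=1)[0]
        feats += [seg.mean(axis=1), seg.std(axis=1), slope]
    return np.column_stack(feats)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 60)).cumsum(axis=1)      # toy time series
y = X[:, 20:30].mean(axis=1) + rng.normal(scale=0.1, size=100)

intervals = [(0, 10), (20, 30), (40, 55)]          # a (fixed) random draw
Z = interval_summaries(X, intervals)               # 3 summaries per interval
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Z, y)
print(Z.shape)  # (100, 9)
```

The forest then splits on the 9 summary features instead of the 60 raw time points; here the mean of interval (20, 30) unsurprisingly dominates the importance.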
• 16-18. Extensions on summaries
- supervised, accounting for Y (is that useful?): PLS, linear models (including ridge)... similar to [Poterie et al., 2019] (grouped variables summarized by LDA) and to [Rainforth and Wood, 2017] (CCA-based splits)
- unsupervised: first principal component of a PCA, as in the ClustOfVar method [Chavent et al., 2012] and in [Chavent et al., 2021] (can also be useful to build groups)
- Could a step further be taken with oblique splits [Bertsimas and Dunn, 2017] (let the forest decide how to combine variables to find the best split)? See also [Hornung and Boulesteix, 2021]
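The unsupervised summary above (replacing a group of time points by its first principal component, as in ClustOfVar) can be sketched as:

```python
import numpy as np
from sklearn.decomposition import PCA

def first_pc_summary(X, groups):
    """Replace each group of contiguous time points by its first PC score."""
    cols = []
    for (a, b) in groups:
        pc = PCA(n_components=1).fit_transform(X[:, a:b])
        cols.append(pc[:, 0])
    return np.column_stack(cols)

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 40))
Z = first_pc_summary(X, [(0, 10), (10, 25), (25, 40)])
print(Z.shape)  # (150, 3)
```

Each interval is thus compressed into a single variable that a standard forest can split on.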
• 19-21. Strategies for building intervals
- Precomputing intervals independently of Y (based on correlation between time points): a constrained extension of ClustOfVar (PCA-like criterion), or adjclust (constrained clustering based on correlation between variables) ⇒ hierarchy of intervals
- Precomputing intervals independently of Y (based on greedy agglomeration): alternating between
  - a regression step (LM) between any two consecutive variables, to select the best merge (minimum loss or maximum gain in accuracy)
  - a summary step (depends on the regression type)
  ⇒ hierarchy of intervals
- Using random forest to compute a hierarchy of intervals based on the loss in grouped importance, in a greedy manner [Gregorutti et al., 2015]
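The first strategy can be sketched with a greedy adjacency-constrained agglomeration; this is a simplified stand-in for adjclust (which uses a different criterion), merging at each step the pair of adjacent intervals with the highest mean between-point correlation:

```python
import numpy as np

def greedy_contiguous_clustering(X, n_intervals):
    """Adjacency-constrained agglomeration of time points by correlation."""
    corr = np.corrcoef(X.T)
    intervals = [[t] for t in range(X.shape[1])]
    while len(intervals) > n_intervals:
        # score each adjacent pair by the mean correlation between their points
        scores = [corr[np.ix_(intervals[i], intervals[i + 1])].mean()
                  for i in range(len(intervals) - 1)]
        i = int(np.argmax(scores))              # best adjacent merge
        intervals[i:i + 2] = [intervals[i] + intervals[i + 1]]
    return [(iv[0], iv[-1] + 1) for iv in intervals]

rng = np.random.default_rng(3)
base = rng.normal(size=(80, 3))
# three blocks of strongly correlated time points (widths 5, 7, 8)
X = np.repeat(base, [5, 7, 8], axis=1) + 0.1 * rng.normal(size=(80, 20))
print(greedy_contiguous_clustering(X, 3))
```

Cutting the resulting hierarchy at different levels yields nested sets of candidate intervals; here the three planted blocks are recovered as (0, 5), (5, 12), (12, 20).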
  • 24. Strategies for selecting variables (here, intervals) in RF
Huge literature... a few reviews: [Degenhardt et al., 2019, Speiser et al., 2019]
I based on importance:
  I just use the importance...
  I ranking with importance, then selection of the best model among those built with the first k variables (k = 1, . . . , K): VSURF [Genuer et al., 2010]
  I [Altmann et al., 2010] or [Szymczak et al., 2016], based on a data-driven importance threshold (untested)
I based on external variable selection methods:
  I knockoffs [Barber and Candès, 2015], as in Boruta [Kursa and Rudnicki, 2010]
  I Relief [Robnik-Šikonja and Kononenko, 2003]
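The VSURF-flavored scheme above (rank by importance, then pick the best nested model) can be sketched with cheap stand-ins: squared correlation with y in place of RF permutation importance, and OLS in place of the RF models of [Genuer et al., 2010]. Everything here (`rank_and_select`, the toy data) is ours, for illustration only:

```python
import numpy as np

def rank_and_select(X, y, X_val, y_val):
    """Rank variables by a stand-in importance (squared correlation with y),
    fit nested models on the top-k variables, keep the k minimizing the
    validation error."""
    imp = np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2
                    for j in range(X.shape[1])])
    order = np.argsort(imp)[::-1]                # most important first
    best_k, best_err = 1, np.inf
    for k in range(1, X.shape[1] + 1):
        cols = order[:k]
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        err = np.mean((y_val - X_val[:, cols] @ beta) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return sorted(order[:best_k].tolist())

rng = np.random.default_rng(2)
X, Xv = rng.normal(size=(300, 8)), rng.normal(size=(300, 8))
y = 2 * X[:, 1] - X[:, 5] + 0.1 * rng.normal(size=300)
yv = 2 * Xv[:, 1] - Xv[:, 5] + 0.1 * rng.normal(size=300)
selected = rank_and_select(X, y, Xv, yv)         # should contain 1 and 5
```

With intervals instead of raw variables, X would hold one summary column per interval.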
  • 25. Simulation setting
I predictors: 1,000 EMS time series (length: 444)
I important intervals
I target: yi = log(1 + |⟨xi, β⟩|) + ε with β(t) = 4 × 1_{t∈[320,410]} + 2 × 1_{t∈[500,550]} − 1_{t∈[680,730]}
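A minimal numpy sketch of this simulation setting follows. The slide gives series of length 444 while β is supported up to t = 730, so we assume a grid of 800 time points here purely for illustration; the Gaussian predictors, noise level, and sample size are our assumptions, not the deck's:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed time grid (the slide's length 444 does not cover t = 730)
t = np.arange(800)
beta = (4.0 * ((t >= 320) & (t <= 410))
        + 2.0 * ((t >= 500) & (t <= 550))
        - 1.0 * ((t >= 680) & (t <= 730)))

n = 50
X = rng.normal(size=(n, t.size))                 # stand-in predictor curves
eps = 0.1 * rng.normal(size=n)                   # noise term epsilon
y = np.log(1.0 + np.abs(X @ beta)) + eps         # y_i = log(1 + |<x_i, beta>|) + eps
```

Only the three intervals where β is nonzero carry signal, which is what the interval-recovery criteria below measure.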
  • 26. Evaluated scenarios
Scenario 1: pre-computed groups, summary, RF and importance-based evaluation
Courtesy of Louisa Villa.
  • 27. Evaluated scenarios
Scenario 2: pre-computed groups, summary, selection and RF
Courtesy of Louisa Villa.
  • 28. Evaluated scenarios
Scenario 3: groups computed in interaction with importance or variable selection (not detailed here), summary, selection (or not) and RF
Courtesy of Louisa Villa.
  • 29. Evaluation criteria
I Resemblance of important/selected intervals with the ground truth
I Accuracy
  • 30. A few take-home messages (to be confirmed)
I pre-computed groups based on correlation (especially adjclust) perform better
I PLS is the best summary strategy
I combining a selection strategy with RF is computationally expensive and inefficient
I overall, the recovery of groups is a bit disappointing
  • 31. adjclust + PLS + Boruta (scenario 2)
⇒ Model selection (or model aggregation?) seems critical...
  • 32. To be continued...
  • 33. References
Altmann, A., Tolosi, L., Sander, O., and Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340–1347.
Barber, R. F. and Candès, E. (2015). Controlling the false discovery rate via knockoffs. Annals of Statistics, 43(5):2055–2085.
Baydogan, M. G. and Runger, G. (2015). Learning a symbolic representation for multivariate time series classification. Data Mining and Knowledge Discovery, 29:400–422.
Bertsimas, D. and Dunn, J. (2017). Optimal classification trees. Machine Learning, 106(7):1039–1082.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Breiman, L., Friedman, J., Olsen, R., and Stone, C. (1984). Classification and Regression Trees. Chapman and Hall, Boca Raton, Florida, USA.
Capitaine, L., Bigot, J., Thiébaut, R., and Genuer, R. (2020). Fréchet random forests for metric space valued regression with non Euclidean predictors. Preprint arXiv:1906.01741v2.
Chavent, M., Genuer, R., and Saracco, J. (2021). Combining clustering of variables and feature selection using random forests. Communications in Statistics - Simulation and Computation, 50(2):426–445.
Chavent, M., Liquet, B., Kuentz-Simonet, V., and Saracco, J. (2012). ClustOfVar: an R package for the clustering of variables. Journal of Statistical Software, 50(13):1–16.
Degenhardt, F., Seifert, S., and Szymczak, S. (2019). Evaluation of variable selection methods for random forests and omics data sets. Briefings in Bioinformatics, 20(2):492–503.
Deng, H., Runger, G., Tuv, E., and Martyanov, V. (2013). A time series forest for classification and feature extraction. Information Sciences, 239:142–153.
Genuer, R., Poggi, J.-M., and Tuleau-Malot, C. (2010). Variable selection using random forests. Pattern Recognition Letters, 31(14):2225–2236.
Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015). Grouped variable importance with random forests and application to multiple functional data analysis. Computational Statistics and Data Analysis, 90:15–35.
Grollemund, P.-M., Abraham, C., Baragatti, M., and Pudlo, P. (2019). Bayesian functional linear regression with sparse step functions. Bayesian Analysis, 14(1):111–135.
Hornung, R. and Boulesteix, A.-L. (2021). Interaction forests: identifying and exploiting interpretable quantitative and qualitative interaction effects. Technical Report 237, Department of Statistics, University of Munich, Germany.
Kursa, M. and Rudnicki, W. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11):1–13.
Lines, J., Taylor, S., and Bagnall, A. (2018). Time series classification with HIVE-COTE: the hierarchical vote collective of transformation-based ensembles. ACM Transactions on Knowledge Discovery from Data, 12(5):1–35.
Lucas, B., Shifaz, A., Pelletier, C., O'Neill, L., Zaidi, N., Goethals, B., Petitjean, F., and Webb, G. I. (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery, 33:607–635.
Middlehurst, M., Large, J., and Bagnall, A. (2020). The canonical interval forest (CIF) classifier for time series classification. In Proceedings of the IEEE International Conference on Big Data, Atlanta, GA, USA. IEEE.
Picheny, V., Servien, R., and Villa-Vialaneix, N. (2019). Interpretable sparse sliced inverse regression for functional data. Statistics and Computing, 29(2):255–267.
Poterie, A., Dupuy, J.-F., Monbet, V., and Rouvière, L. (2019). Classification tree algorithm for grouped variables. Computational Statistics, 34:1613–1648.
Rainforth, T. and Wood, F. (2017). Canonical correlation forests. Preprint arXiv:1507.05444.
Robnik-Šikonja, M. and Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53(1-2):23–69.
Schäfer, P. (2015). The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery, 29(6):1505–1530.
Shifaz, A., Pelletier, C., Petitjean, F., and Webb, G. I. (2020). TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Mining and Knowledge Discovery, 34:742–775.
Speiser, J. L., Miller, M. E., Tooze, J., and Ip, E. (2019). A comparison of random forest variable selection methods for classification prediction modeling. Expert Systems with Applications, 134:93–101.
Szymczak, S., Holzinger, E., Dasgupta, A., Malley, J., Molloy, A., Mills, J., Brody, L., Stambolian, D., and Bailey-Wilson, J. (2016). r2VIM: a new variable selection method for random forests in genome-wide association studies. BioData Mining, 9:7.
  • 39. Dictionary/symbolic representation based
BOSS [Schäfer, 2015] and [Baydogan and Runger, 2015]
Based on: Fourier transform, then symbolic representation. [Baydogan and Runger, 2015] is similar, except that the representation loses interval information (based on a tree at the time-step level).
  • 40. Dictionary/symbolic representation based
BOSS [Schäfer, 2015] and [Baydogan and Runger, 2015]
What is useful for our question? Uncertain... can the symbolic representation itself be used to represent/select (windowed) intervals? (untested)