Random subspace with trees for feature selection
under memory constraints
Antonio Sutera
Dept. of EECS, University of Liège, Belgium
Benelearn 2016,
Kortrijk, Belgium
September 12, 2016
Pierre Geurts, Louis Wehenkel (ULg),
Gilles Louppe (CERN & NYU)
Célia Châtel (Luminy)
1 / 15
Background: Ensemble of randomized trees
 Good classification method
2 / 15
Background: Ensemble of randomized trees for feature
selection
 Good classification method useful for feature selection
[Figure: ensemble of randomized trees 𝜑1, 𝜑2, ..., 𝜑M]
The importance of variable Xm for an ensemble of NT trees is given by:

Imp(X_m) = \frac{1}{N_T} \sum_{T} \sum_{t \in T : v(t) = X_m} p(t)\,\Delta i(t)

where p(t) = N_t / N and ∆i(t) is the impurity reduction at node t:

\Delta i(t) = i(t) - \frac{N_{t_L}}{N_t}\, i(t_L) - \frac{N_{t_R}}{N_t}\, i(t_R)
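For illustration (not part of the original slides), such MDI importances can be obtained directly from scikit-learn tree ensembles, whose feature_importances_ attribute implements this mean decrease of impurity; the dataset below is a synthetic stand-in:

```python
# Minimal sketch (assumed setup, not from the slides): MDI-style importance
# ranking with scikit-learn. X, y are a synthetic stand-in for a real dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

# feature_importances_ is scikit-learn's mean-decrease-of-impurity score,
# i.e. the Imp(X_m) quantity above, normalized to sum to 1.
forest = ExtraTreesClassifier(n_estimators=500, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
for m in ranking[:10]:
    print(f"feature {m}: {forest.feature_importances_[m]:.4f}")
```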
Variable ranking by tree-based methods

feat1     feat2      ...   featm      Class
191.63    -128.29    ...   -107.59    0
241.07     44.47     ...     96.56    ...
179.17     -3.69     ...     56.67    0
...        ...       ...     ...      1
120.26    -30.47     ...     42.81    1

⇓

[Bar chart: %info importance scores for features ranked f15, f4, f10, f8, f9, f20, f11, f1, f13, f2, f12, f14, f3, f16, f17, f6, f18, f19, f7, f5]
3 / 15
Background: Feature relevance (Kohavi and John, 1997)
[Venn diagram: the feature set V split into irrelevant, weakly relevant, and strongly relevant features, with Markov boundary M]
Given an output Y and a set of input variables V, X ∈ V is
relevant iff ∃B ⊆ V such that Y ⊥̸⊥ X | B (i.e., Y and X are dependent given B),
irrelevant iff ∀B ⊆ V: Y ⊥⊥ X | B,
strongly relevant iff Y ⊥̸⊥ X | V \ {X},
weakly relevant iff X is relevant but not strongly relevant.
A Markov boundary is a minimal-size subset M ⊆ V such that Y ⊥⊥ V \ M | M.
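A small worked example (added for illustration, not from the slides): let Y = X1 ⊕ X2 with X1, X2 independent fair binary variables, X3 = X1 an exact copy, and X4 independent noise. Then X2 is strongly relevant (it stays informative given all other variables), X1 and X3 are only weakly relevant (each becomes redundant once the other is known), and X4 is irrelevant. Both {X1, X2} and {X2, X3} are Markov boundaries; because this distribution is not strictly positive, the Markov boundary is not unique.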
4 / 15
Background: Feature selection (Nilsson et al., 2007)
[Venn diagram: the feature set V split into irrelevant, weakly relevant, and strongly relevant features, with Markov boundary M]
Two different feature selection problems:
Minimal-optimal: find a Markov boundary for the output Y .
All-relevant: find all relevant features.
5 / 15
Random forests, variable importance and feature selection
Main results
In asymptotic conditions (infinite sample size and number of trees):
K = 1: Unpruned totally randomized trees solve the all-relevant feature selection problem.
K > 1: In the case of strictly positive distributions, non-random trees always find a superset F of the minimal-optimal solution whose size decreases with K.
[Venn diagram: V with irrelevant and strongly relevant features, and found subsets F1, F2, ..., Fp]
6 / 15
Motivation
Our objective: Design more efficient feature selection procedures
based on random forests
We address large-scale feature selection problems where one cannot assume that all variables can be stored in memory
We study and improve ensembles of trees grown from random
subsets of features
7 / 15
Random subspace for feature selection
Simplistic memory-constrained setting: we cannot grow trees with more than q features
Straightforward ensemble solution: Random Subspace (RS)
Train each ensemble tree from a random subset of q features
1. Repeat T times:
1.1 Let Q be a subset of q features randomly selected in V
1.2 Grow a tree only using features in Q (with randomization K)
2. Compute importance Impq,T (X) for all X
Proposed e.g. by (Ho, 1998) for accuracy improvement, by (Louppe and
Geurts, 2012) for handling large datasets and by (Draminski et al., 2010,
Konukoglu and Ganz, 2014) for feature selection
Let us study the population version of this algorithm.
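As an illustration (not the authors' code), a minimal Python sketch of this RS procedure, using scikit-learn trees as the base learner and MDI importances; the function and parameter names are assumptions:

```python
# Illustrative sketch of the Random Subspace (RS) importance procedure.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def random_subspace_importances(X, y, q=50, T=100, K=1, random_state=0):
    rng = np.random.RandomState(random_state)
    p = X.shape[1]
    imp = np.zeros(p)
    for _ in range(T):
        Q = rng.choice(p, size=q, replace=False)      # 1.1 random subset of q features
        tree = ExtraTreesClassifier(n_estimators=1, max_features=K,
                                    random_state=rng)
        tree.fit(X[:, Q], y)                          # 1.2 grow one tree on Q only
        imp[Q] += tree.feature_importances_           # accumulate MDI importances
    return imp / T                                    # 2. Imp_{q,T}(X) for all X
```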
8 / 15
RS for feature selection: study
Asymptotic guarantees:
Def. deg(X), with X relevant, is the size of the smallest B ⊆ V such that Y ⊥̸⊥ X | B
K = 1: If deg(X) < q for all relevant variables X: X is relevant iff Impq(X) > 0
K ≥ 1: If there are q or fewer relevant variables: X strongly relevant ⇒ Impq(X) > 0
Drawback: RS requires many trees to find high degree variables
E.g.: p = 10000, q = 50, k = 1 ⇒ \binom{p-k-1}{q-k-1} / \binom{p}{q} = 2.5 · 10^{-5}. On average, at least T = 40812 trees are required to find X.
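These numbers can be checked with a short, self-contained computation (a sketch added here for illustration; the probability is that of drawing X together with its k conditioning variables in one random subset of size q):

```python
# Sketch: probability that a variable X of degree k lands in the same
# random subset of q features as its k conditioning variables, and the
# expected number of trees needed to see this at least once.
from math import comb

p, q, k = 10000, 50, 1
prob = comb(p - k - 1, q - k - 1) / comb(p, q)
print(prob)             # ~2.45e-05
print(round(1 / prob))  # ~40812 trees on average
```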
9 / 15
Sequential Random Subspace (SRS)
Proposed algorithm:
1. Let F = ∅
2. Repeat T times:
2.1 Let Q = R ∪ C, where:
R is a subset of min{αq, |F|} features randomly taken from F
C is a subset of q − |R| features randomly selected in V \ R
2.2 Grow a tree only using features in Q
2.3 Add to F all features that get non-zero importance
3. Return F
[Diagram: memory Q of size q split into R (up to αq features drawn from F) and C (features drawn from outside F)]
Compared to RS: fill a fraction α of the memory with previously found relevant variables and a fraction 1 − α with randomly selected variables.
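In the same spirit as the RS sketch above, a minimal illustrative Python sketch of SRS; the function name, the scikit-learn base learner, and the handling of the initially empty F are my own choices, not the authors' implementation:

```python
# Illustrative sketch of Sequential Random Subspace (SRS).
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def sequential_random_subspace(X, y, q=50, T=100, alpha=0.5, K=1, random_state=0):
    rng = np.random.RandomState(random_state)
    p = X.shape[1]
    F = np.array([], dtype=int)                          # 1. F = empty set
    for _ in range(T):                                   # 2. repeat T times
        n_R = min(int(alpha * q), len(F))
        R = (rng.choice(F, size=n_R, replace=False)      # 2.1 R: features kept from F
             if n_R > 0 else np.array([], dtype=int))
        rest = np.setdiff1d(np.arange(p), R)
        C = rng.choice(rest, size=q - n_R, replace=False)  # C: fresh features from V \ R
        Q = np.concatenate([R, C])
        tree = ExtraTreesClassifier(n_estimators=1, max_features=K,
                                    random_state=rng).fit(X[:, Q], y)  # 2.2 grow a tree on Q
        found = Q[tree.feature_importances_ > 0]         # 2.3 non-zero importance
        F = np.union1d(F, found)
    return F                                             # 3. return F
```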
10 / 15
SRS for feature selection: study
Asymptotic guarantees: similar to RS if all relevant variables can fit into memory.
Convergence: SRS requires far fewer trees than RS in most cases.
For example,
[Illustration: example with variables X1, ..., X5 and a numerical simulation of the number of trees needed]
11 / 15
Experiments: results in feature selection
Dataset: Madelon (Guyon et al., 2007)
1500 samples (|LS|=1000, |TS|=500)
500 features, of which 20 are relevant (5 features that define Y, 5 random linear combinations of the first 5, and 10 noisy copies of the first 10)
[Plot: F-measure vs. number of iterations (0 to 4000) for RS (alpha=0) and SRS (alpha=1.0)]
Parameter: q = 50
12 / 15
Experiments: results in prediction
[Plot: accuracy vs. number of iterations (0 to 10000) for RS (alpha=0) and SRS (alpha=1.0)]
Parameter: q = 50
After 10000
trees/iterations:
RF (K = max): 0.81
RF (K = q): 0.70
RS : 0.68
SRS: 0.84
13 / 15
Conclusions
Future work on SRS:
The good performance of SRS is confirmed on other datasets, but more experiments are needed.
How to dynamically adapt K and α to improve correctness and
convergence?
Parallelization of each step or of the global procedure
Conclusion:
In most cases, accumulating relevant features speeds up the
discovery of new relevant features while improving the accuracy.
14 / 15
References
Célia Châtel, Sélection de variables à grande échelle à partir de forêtes aléatoires,
Master’s thesis, École Centrale de Marseille/Université de Liège, 2015.
Gilles Louppe and Pierre Geurts, Ensembles on random patches., ECML/PKDD (1)
(Peter A. Flach, Tijl De Bie, and Nello Cristianini, eds.), Lecture Notes in Computer
Science, vol. 7523, Springer, 2012, pp. 346–361.
Gilles Louppe, Understanding random forests: From theory to practice, Ph.D. thesis,
University of Liège, 2014.
G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts, Understanding variable
importances in forests of randomized trees, Advances in Neural Information Processing Systems, 2013.
15 / 15
Variable importance scores
Some interpretability can be retrieved through variable importance
scores
[Bar chart: %info importance scores for features ranked f15, f4, f10, f8, f9, f20, f11, f1, f13, f2, f12, f14, f3, f16, f17, f6, f18, f19, f7, f5]
e.g. the sum of entropy reductions at each node where the variable appears.
Ensemble of randomized trees
Improve standard classification and regression trees by reducing their variance
Many examples: Bagging (Breiman, 1996), Random Forests (Breiman, 2001), Extremely randomized trees (Geurts et al., 2006)
Standard Random Forests: bootstrap sampling + random selection of K features at each node
3 / 37
Two main importance measures:
The mean decrease of impurity (MDI): summing total impurity
reductions at all tree nodes where the variable appears (Breiman et
al., 1984)
The mean decrease of accuracy (MDA): measuring accuracy
reduction on out-of-bag samples when the values of the variable
are randomly permuted (Breiman, 2001)
These measures have found many successful applications such as:
Biomarker discovery
Gene regulatory network inference
(Huynh-Thu et al, Plos ONE, 2010 and Marbach et al., Nature Methods, 2012)
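As a hedged illustration of how the two measures can be computed in practice with scikit-learn (permutation_importance permutes on held-out data rather than on out-of-bag samples, so it only approximates Breiman's MDA; the dataset is a synthetic stand-in):

```python
# Sketch: MDI vs. permutation-based (MDA-like) importances in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
mdi = rf.feature_importances_                        # mean decrease of impurity
mda = permutation_importance(rf, X_te, y_te,
                             n_repeats=10, random_state=0).importances_mean
print(mdi[:5], mda[:5])
```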
1 / 8
Mean decrease of impurity (MDI): definition
[Figure: ensemble of randomized trees 𝜑1, 𝜑2, ..., 𝜑M]
The importance of variable Xm for an ensemble of NT trees is given by:

Imp(X_m) = \frac{1}{N_T} \sum_{T} \sum_{t \in T : v(t) = X_m} p(t)\,\Delta i(t)

where p(t) = N_t / N and ∆i(t) is the impurity reduction at node t:

\Delta i(t) = i(t) - \frac{N_{t_L}}{N_t}\, i(t_L) - \frac{N_{t_R}}{N_t}\, i(t_R)
2 / 8
Link with common definitions of variable relevance
In asymptotic setting (N = NT = ∞)
K = 1: Variable importances depend only on the relevant variables
A variable Xm is relevant iff Imp(Xm) > 0
The importance of a relevant variable is insensitive to the addition
or the removal of irrelevant variables in V .
⇒ Asymptotically, unpruned totally randomized trees thus solve the
all-relevant feature selection problem.
3 / 8
Link with common definitions of variable relevance
In asymptotic setting (N = NT = ∞)
K > 1: Variable importances can be influenced by the number of irrelevant variables, and there can be relevant variables with zero importance (due to the masking effect)
But:
Xm irrelevant ⇒ Imp(Xm) = 0
Xm strongly relevant ⇒ Imp(Xm) > 0
Strongly relevant features cannot be masked
[Venn diagram: V with irrelevant and strongly relevant features, and found subsets F1, F2, ..., Fp]
⇒ In the case of strictly positive distributions, non-random trees always find a superset of the minimal-optimal solution whose size decreases with K.
4 / 8
Experiments: protocol
Madelon data (Guyon et al., 2007)
1500 samples (|LS|=1000, |TS|=500)
20 relevant features: 5 features that define Y , 5
random linear combinations of the first 5, and 10
noisy copies of the first 10
Increasing number of irrelevant features: 480, 1480,
2980, 5480
Parameters: q = 50, K = q, no bootstrap, threshold randomization
(Geurts et al., 2006)
Evaluation:
Average over 50 random LS/TS splits
Evolution of TS accuracy with number of iterations
Evolution of the area under the precision-recall curve (auprc) with
number of iterations, when features are ranked according to
importances
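For reference, a small sketch of how such an auprc evaluation could be computed (the importance scores and relevance mask below are placeholders, not the actual experimental data):

```python
# Sketch: auprc of an importance-based ranking against known relevance labels.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.RandomState(0)
importances = rng.rand(500)              # stand-in for Imp_{q,T}(X)
is_relevant = np.zeros(500, dtype=int)
is_relevant[:20] = 1                     # toy mask: first 20 features relevant

auprc = average_precision_score(is_relevant, importances)
print(f"auprc = {auprc:.3f}")
```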
5 / 8
Experiments: results
Substantial improvement of both auprc and accuracy with SRS
The lower q/p, the larger the improvement
Only SRS always eventually ranks the features perfectly
[Plots: auprc and accuracy on Madelon vs. number of iterations (0 to 10000), for RS and SRS with 480, 1480, 2980, and 5480 irrelevant features]
6 / 8
Experiments: results in feature selection
Dataset: TIS
13375 samples
927 features
[Plot: F-measure vs. number of iterations (0 to 4000) for RS (alpha=0) and SRS (alpha=1.0)]
Parameter: q = 92
7 / 8
Experiments: results in prediction
[Plot: accuracy vs. number of iterations (0 to 10000) for RS (alpha=0) and SRS (alpha=1.0)]
Parameter: q = 92
After 10000
trees/iterations:
RF (K = max): 0.91
RF (K = q): 0.9
RS : 0.84
SRS: 0.91
8 / 8