Traditional feature selection algorithms are often motivated by the objective of obtaining an optimal list of variables for a classification or regression problem. An alternative is the exploratory approach, in which all features that may contribute to explaining the decision vector are reported, together with an explicit list of discovered feature interactions. Thanks to their simplicity, versatility and ability to handle mixed categorical and unnormalized numerical input, decision tree based ensemble methods are a powerful feature selection tool. Furthermore, the very structure of the trained decision trees can provide hints about feature interdependencies. In this paper, the capability of detecting strong synthetic benchmark feature interactions in a set of mixed categorical and continuous variables is demonstrated using C4.5 Decision Trees, RandomForests and Extremely Randomized Decision Trees, following the feature selection methodology of the Monte Carlo Feature Selection (MCFS) algorithm. MCFS's original way of detecting feature interactions, which relies on the structure of the trees, is compared with our modified approach, which relies on a series of variable permutations. The new approach is slightly more robust and more flexible, as it allows different types of classifiers, or even regressors, to be plugged into MCFS. Our recommendation for researchers applying decision tree based methods to mixed categorical and continuous datasets is to use heuristics that rely purely on the features' impact on the performance of the classifier rather than on its structure.
Multidimensional Feature Selection and Interaction Mining with Decision Tree based ensemble methods
1. Multidimensional Feature Selection and Interaction Mining
with Decision Tree based ensemble methods
Łukasz Król, Joanna Polańska
Data Mining Group
Faculty of Automatic Control,
Electronics and Computer Science
Silesian University of Technology
2. Feature Selection – supervised or unsupervised?
MACHINE
LEARNING
SUPERVISED
AUTOMATION
+feature selection
UNDERSTANDING
THE PROCESS
+feature selection
UNSUPERVISED
+feature selection
3. Explorative Supervised Feature Selection
MACHINE
LEARNING
SUPERVISED
AUTOMATION
+feature selection
UNDERSTANDING
THE PROCESS
+feature selection
UNSUPERVISED
+feature selection
5. Explorative Supervised Feature Selection
platform observations features
PCR 102-103 101-102
RNA microarrays 102-103 104
RNA sequencing 102-103 105-106
SNP microarrays 102-103 105-106
CNV microarrays 102-103 106
methylation sites 102-103 108-109
full genome 102-103 109
mixed data 102-103 101-109
6. Explorative Supervised Feature Selection
Common requirements:
• Handles high-dimensional mixed-input data.
• Considers feature interactions.
• Not bound to a greedy search path.
• Agnostic to variable types and the number of categories.
• Does not transform the feature space.
• Supports a broad range of problems (types of decision vectors):
• categorical
• continuous
• censored survival time
7. Monte Carlo Feature Selection
Bioinformatics (2008) 24: 110-117
Advances in Machine Learning II (2010) 263: 371-385
Big Data Analysis: New Algorithms for a New Society (2015) 16: 285-304
8.–12. MCFS - short description
[Diagram, built incrementally across slides 8–12: the FULL DATA is sampled into random FEATURE SUBSETs (× s), each split into TRAIN/TEST sets (× t); every split trains a D. TREE whose black-box SCORE feeds the Relative Importance ranking, while the D. TREE's structure feeds the Inter-Dependency measure]
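The sampling scheme in the diagram can be sketched roughly as follows (an illustration of the idea only, not the authors' implementation; scikit-learn trees stand in for the black box, and the parameters s, t, m mirror the × s and × t annotations, with everything else our assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def mcfs_relative_importance(X, y, s=100, t=5, m=10, seed=0):
    """MCFS-style sampling: draw s random feature subsets of size m,
    make t train/test splits per subset, fit one decision tree per
    split, and credit the subset's used features with the tree's
    black-box test score."""
    rng = np.random.default_rng(seed)
    ri = np.zeros(X.shape[1])
    for _ in range(s):
        subset = rng.choice(X.shape[1], size=m, replace=False)
        for _ in range(t):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X[:, subset], y, test_size=0.3,
                random_state=int(rng.integers(1 << 30)))
            tree = DecisionTreeClassifier().fit(X_tr, y_tr)
            score = tree.score(X_te, y_te)        # black-box SCORE
            used = tree.feature_importances_ > 0  # features actually split on
            ri[subset] += score * used
    return ri / (s * t)
```

Features that end up in well-performing trees accumulate credit, so informative variables rise above noise after enough Monte Carlo draws.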
13. MCFS - fields for improvement
• distributing computations
• allowing a wider range of models and decision vectors
• introducing universal and robust feature importance metrics
14. Broadside - Architecture
• Can be run on an arbitrary number of physical machines.
• Allows nodes to be dynamically attached and detached while computations are running.
• Scales almost linearly with the number of available processors.
• Platform-independent.
• Has no dependencies other than Java 1.8.
• Open for extension with new types of feature selectors.
15. Broadside – Feature Importance Metrics
[Diagram: the TEST SET and a PERMUTED TEST SET are each scored by the trained MODEL (BLACK BOX); the difference between the two SCOREs is the DELTA]
base: the standard RandomForests feature importance metric
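The scheme in the diagram can be sketched as follows (a minimal illustration; `model`, `score_fn` and the single-feature interface are our assumptions, not the Broadside API):

```python
import numpy as np

def permutation_delta(model, X_test, y_test, feature, score_fn, seed=0):
    """SCORE the trained black-box model on the TEST SET, then on a
    PERMUTED TEST SET in which one feature's column has been shuffled;
    the score drop (DELTA) estimates that feature's importance."""
    rng = np.random.default_rng(seed)
    base = score_fn(y_test, model.predict(X_test))
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, feature])   # break the feature-target link
    return base - score_fn(y_test, model.predict(X_perm))
```

With accuracy as `score_fn`, an informative feature yields a clearly positive delta, while a feature the model ignores yields a delta near zero.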
16. Broadside – Feature Importance Metrics
[Same diagram as slide 15: TEST SET and PERMUTED TEST SET each scored by the MODEL (BLACK BOX), their score difference giving the DELTA]
base: the standard RandomForests feature importance metric
enhancement: total effect decomposition into main effects and interaction effects
[Diagram: interaction graph over four features A, B, C, D]
17.–24. Broadside – Feature Importance Metrics
[Slides 17–24 build the decomposition table row by row for four features A, B, C, D: each row marks which main effects and interaction effects the corresponding single or joint permutation covers]
Final table (slide 24):

total effect   main effects        interaction effects
               A  B  C  D    A-B  A-C  A-D  B-C  B-D  C-D
A              x             x    x    x
B                 x          x              x    x
C                    x            x         x         x
D                       x              x         x    x
AB             x  x          x    x    x    x    x
AC             x     x       x    x    x    x         x
AD             x        x    x    x    x         x    x
BC                x  x       x    x         x    x    x
BD                x     x    x         x    x    x    x
CD                   x  x         x    x    x    x    x
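Per the table above, a single-feature delta covers that feature's main effect plus every interaction it takes part in, while a pair delta counts the shared interaction only once; so the A-B interaction can be estimated as delta(A) + delta(B) - delta(A,B jointly). This sketch is our reading of the decomposition, not Broadside's code; `score_fn` takes (y_true, y_pred):

```python
import numpy as np

def delta(model, X, y, cols, score_fn, rng):
    """Score drop after jointly permuting the listed feature columns:
    the same row permutation is applied to all of them, preserving
    their joint distribution while breaking their link to the target."""
    base = score_fn(y, model.predict(X))
    X_perm = X.copy()
    perm = rng.permutation(len(X_perm))
    X_perm[:, cols] = X[perm][:, cols]
    return base - score_fn(y, model.predict(X_perm))

def interaction_effect(model, X, y, a, b, score_fn, seed=0):
    """Each single delta = main effect + all interactions of that
    feature; the pair delta counts the shared interaction once.
    Hence I(a, b) = delta(a) + delta(b) - delta({a, b})."""
    rng = np.random.default_rng(seed)
    return (delta(model, X, y, [a], score_fn, rng)
            + delta(model, X, y, [b], score_fn, rng)
            - delta(model, X, y, [a, b], score_fn, rng))
```

On a model that learns an XOR of two features, I(a, b) for the interacting pair comes out large, while pairs involving an irrelevant feature stay near zero.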
25. Broadside – Flexibility
Different types of models can be plugged into Broadside by using different model assessment metrics, e.g.:
• categorical – Weighted Accuracy
• continuous – Mean Absolute Error
• survival – Concordance Index
The supported types of input variables depend on the choice of model.
Currently implemented models:
• C4.5 classification trees
• RandomForests
• Extremely Randomized Trees
• Regression Trees
• Survival Trees (Ishwaran et al.)
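One of the listed metrics, the Concordance Index for censored survival data, can be sketched as follows (a simplified illustration, not the Broadside implementation; tied event times are not handled):

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs where the higher predicted risk
    corresponds to the earlier observed event.  A pair (i, j) is
    comparable when the earlier time belongs to an actual event
    (events[i] == 1), not to a censored observation."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # i must experience the event strictly before j's time
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable
```

A value of 1.0 means the model ranks every comparable pair correctly, 0.5 is random ordering, and 0.0 is perfectly inverted ranking.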
26. Broadside – decision tree based ensemble methods
[Same content as slide 25]
39. Broadside - summary
• A new feature selection and interaction mining software tool.
• Follows some of the original MCFS ideas (Draminski et al.).
• Distributed: tested on ~350 cores.
• Scales up to millions of features.
• Three types of decision vectors:
• categorical
• numeric
• survival time
• Two types of input features:
• categorical
• numeric
• Interactive feature importance graphs.