
Random forests with random projections of the output space for high dimensional multi-label classification

We adapt the idea of random projections, applied here to the output space, to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting the computational complexity or the accuracy of predictions. We also show that random output space projections can be used to reach different bias-variance tradeoffs over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.

Link to the paper http://orbi.ulg.ac.be/handle/2268/172146
Source code available at https://github.com/arjoly/random-output-trees
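The core idea from the abstract can be sketched in a few lines of NumPy: project the label vectors into a lower-dimensional space with a Gaussian random matrix, predict in that projected space (here the ensemble's predictions are simulated by the exact projections plus noise), and decode a prediction by picking the candidate label vector whose projection is closest. This is a minimal illustration, not the paper's implementation; the toy data, the noise level, and the nearest-candidate decoding step are assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 200, 50, 12  # samples, original labels, projected dimensions

# Toy label matrix: each sample carries one of three fixed label sets,
# so Y has low effective rank (typical of multi-label data).
prototypes = rng.integers(0, 2, size=(3, d)).astype(float)
assignment = rng.integers(0, 3, size=n)
Y = prototypes[assignment]

# Gaussian random projection of the *output* space: z_i = G @ y_i.
G = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))
Z = Y @ G.T

# Stand-in for an ensemble's predictions in the projected space:
# the exact projections plus a little noise.
Z_hat = Z + rng.normal(0.0, 1e-3, size=Z.shape)

# Decode: choose the candidate label vector whose projection is closest
# (random projections approximately preserve pairwise distances, so
# closeness in the projected space reflects closeness in label space).
proj_candidates = prototypes @ G.T  # shape (3, m)
dists = np.linalg.norm(Z_hat[:, None, :] - proj_candidates[None, :, :], axis=2)
Y_hat = prototypes[dists.argmin(axis=1)]
```

Note that learning happens in an m-dimensional space instead of a d-dimensional one, which is where the training-time savings in the abstract come from.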



  1. Random forests with random projections of the output space for high dimensional multi-label classification. Arnaud Joly, Pierre Geurts, Louis Wehenkel
  2. Multi-label classification tasks. In many supervised learning applications in text, biology or image processing, samples are associated with sets of labels. Input X: an 800 × 600 pixel image. Output Y: a set of labels such as driver, mountain, road, car, tree, rock, line, human, ... If each label corresponds to a Wikipedia article, then we have around 4 million labels.
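A label set like the one on this slide is usually encoded as a row of a binary indicator matrix Y, which is the representation the tree-growing criterion below operates on. A minimal sketch (the label vocabulary and the three sample label sets are illustrative, taken from the slide's example):

```python
import numpy as np

# Hypothetical label vocabulary and per-sample label sets.
labels = ["driver", "mountain", "road", "car", "tree", "rock", "line", "human"]
index = {name: j for j, name in enumerate(labels)}

samples = [
    {"road", "car", "tree"},
    {"mountain", "rock"},
    {"driver", "car", "road", "human"},
]

# Binary indicator matrix Y: Y[i, j] = 1 iff sample i carries label j.
Y = np.zeros((len(samples), len(labels)), dtype=int)
for i, label_set in enumerate(samples):
    for name in label_set:
        Y[i, index[name]] = 1
```

With millions of possible labels, each row of Y becomes a very long, very sparse binary vector, which is what makes the output space "high dimensional".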
  3. Random forest. Randomized trees are built on a bootstrap copy of the input-output pairs ((x_i, y_i) ∈ X × Y), i = 1, ..., n, by recursively maximizing the reduction of impurity, here the variance Var. At each node, the best split is selected among k randomly selected features: a node S is split into S_L (samples with X_k ≤ t_k) and S_R (samples with X_k > t_k) so as to maximize ΔVar(S) = Var(S) − (|S_L|/|S|) Var(S_L) − (|S_R|/|S|) Var(S_R). For example, Var(S) = 0.24, Var(S_L) = 0.014, Var(S_R) = 0.1875.
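The split selection described on this slide can be sketched as follows, assuming multi-output variance is measured as the mean squared distance to the centroid of the output vectors (a common choice for multi-output trees; the helper names and the toy data are illustrative, not from the paper's code):

```python
import numpy as np

def variance(Y):
    """Multi-output variance: mean squared distance to the centroid."""
    if len(Y) == 0:
        return 0.0
    return float(np.mean(np.sum((Y - Y.mean(axis=0)) ** 2, axis=1)))

def best_split(X, Y, k, rng):
    """Among k randomly drawn features, return the (feature, threshold)
    maximizing Var(S) - |S_L|/|S| Var(S_L) - |S_R|/|S| Var(S_R)."""
    n = len(X)
    base = variance(Y)
    best = (None, None, -np.inf)
    for j in rng.choice(X.shape[1], size=k, replace=False):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left = X[:, j] <= t
            reduction = (base
                         - (left.sum() / n) * variance(Y[left])
                         - ((~left).sum() / n) * variance(Y[~left]))
            if reduction > best[2]:
                best = (int(j), float(t), reduction)
    return best

rng = np.random.default_rng(0)
# Toy data: feature 0 perfectly separates the two label patterns,
# feature 1 is pure noise.
X = np.column_stack([np.repeat([0.0, 1.0], 50), rng.normal(size=100)])
Y = np.repeat([[1, 0, 0], [0, 1, 1]], 50, axis=0).astype(float)
feature, threshold, reduction = best_split(X, Y, k=2, rng=rng)
```

On this toy data the criterion picks feature 0: splitting on it makes both children label-pure, so the variance reduction equals Var(S) itself. Note that computing `variance` costs time proportional to the number of output dimensions, which is why projecting the output space first reduces training time.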
