L1-based compression of random forest model

Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.

Link to the paper http://orbi.ulg.ac.be/handle/2268/124834

  1. L1-based compression of random forest model. Arnaud Joly, François Schnitzler, Pierre Geurts, Louis Wehenkel. a.joly@ulg.ac.be - www.ajoly.org. Department of EE and CS & GIGA-Research.
  2. High-dimensional supervised learning applications: 3D image segmentation (x = original image, y = segmented image), genomics (x = DNA sequence, y = phenotype), electrical grid (x = system state, y = stability). From 10^5 to 10^9 input dimensions.
  3. Tree-based ensemble methods. From a dataset of input-output pairs {(x_i, y_i)}_{i=1}^n ⊂ X × Y, we approximate f : X → Y by learning an ensemble of M decision trees. [figure: a decision tree whose yes/no splits lead to leaves Label 1, Label 2 and Label 3] The estimator f̂ of f is obtained by averaging the predictions of the ensemble of trees (see the first sketch after the slides).
  4. High model complexity ⇒ large memory requirements. The complexity of tree-based methods is measured by the number of internal nodes and increases with (i) the ensemble size M and (ii) the number of samples n in the dataset. The variance of individual trees increases with the dimension p of the original feature space, so M(p) should increase with p to yield near-optimal accuracy. Complexity thus grows as n · M(p), which may require a huge amount of storage (counted in the first sketch after the slides): memory limitation will be an issue in high-dimensional problems.
  5. L1-based compression of random forest model (I). We first learn an ensemble of M extremely randomized trees (Geurts et al., 2006) [figure: example trees with each node labelled (m, l), i.e. node l of tree m] and associate to each node an indicator function 1_{m,l}(x), equal to 1 if the sample (x, y) reaches the l-th node of the m-th tree, and 0 otherwise.
  6. L1-based compression of random forest model (II). The node indicator functions 1_{m,l}(x) may be used to lift the input space X towards its induced feature space Z: z(x) = (1_{1,1}(x), …, 1_{1,N_1}(x), …, 1_{M,1}(x), …, 1_{M,N_M}(x)), where N_m is the number of nodes of the m-th tree. [figure: the path followed by one sample x_s through each tree, marking the nodes whose indicators equal 1] (See the second sketch after the slides.)
  7. L1-based compression of random forest model (III). A variable selection method (regularization with the L1-norm) is applied on the induced space Z to compress the tree ensemble, using the solution of the Lasso problem min_β Σ_{i=1}^n (y_i − β^T z(x_i))² + λ ||β||_1, which drives the weights of most node indicators to zero. (See the third sketch after the slides.)
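The three sketches below are minimal illustrations of the pipeline in slides 3-7, not the authors' code. They use scikit-learn, whose ExtraTreesRegressor implements the extremely randomized trees of Geurts et al. (2006); the synthetic dataset, the ensemble size M = 100 and all other parameter values are assumptions chosen for illustration. Sketch 1 trains the ensemble and measures its space complexity as a total node count (slides 3-4):

```python
# Sketch 1 (illustrative, not the authors' code): train an ensemble of
# extremely randomized trees and measure its space complexity, i.e. the
# total number of nodes across all M trees.
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in for a high-dimensional dataset {(x_i, y_i)}.
X, y = make_regression(n_samples=1000, n_features=100, random_state=0)

# M = 100 trees; the estimator averages the predictions of the trees.
forest = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)

# Space complexity: total node count over all trees of the ensemble.
n_nodes = sum(tree.tree_.node_count for tree in forest.estimators_)
print("total number of nodes:", n_nodes)
```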
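Sketch 2 (continuing the script above) builds the induced representation z(x) of slides 5-6. It uses scikit-learn's decision_path, which returns for each sample a sparse binary vector over all nodes of all trees, with a 1 wherever the sample traverses a node; this is one convenient way to realize the lifting, not necessarily the authors' implementation:

```python
# Sketch 2 (continuing from sketch 1): lift the input space X to the
# induced feature space Z via the node indicator functions 1_{m,l}(x).
# decision_path returns a sparse (n_samples, total_nodes) binary matrix:
# entry (i, j) is 1 iff sample x_i traverses node j of the forest.
Z, node_ptr = forest.decision_path(X)
print("induced feature matrix Z:", Z.shape)  # one column per node 1_{m,l}
```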
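Sketch 3 (again continuing the same script) is the compression step of slide 7: an L1-regularized linear model (Lasso) fitted on Z. Nodes whose weight is driven exactly to zero can be pruned from the ensemble; the regularization strength alpha is an assumed value that would be tuned, e.g. by cross-validation, in practice:

```python
# Sketch 3 (continuing from sketch 2): L1-based compression. Fit a Lasso
# on the induced space Z; the L1 penalty zeroes out the weights of most
# node indicators, and only nodes with non-zero weight need to be kept.
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.01, max_iter=10000).fit(Z, y)  # Z can stay sparse
n_kept = int((lasso.coef_ != 0).sum())
print(f"nodes kept after compression: {n_kept} of {Z.shape[1]}")
```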
