L1-based compression of random forest model 
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.

Link to the paper http://orbi.ulg.ac.be/handle/2268/124834

Published in: Data & Analytics

  1. L1-based compression of random forest model. Arnaud Joly, François Schnitzler, Pierre Geurts, Louis Wehenkel. a.joly@ulg.ac.be - www.ajoly.org. Department of EE and CS & GIGA-Research.
  2. High dimensional supervised learning applications: 3D image segmentation (x = original image, y = segmented image); genomics (x = DNA sequence, y = phenotype); electrical grid (x = system state, y = stability). From 10^5 to 10^9 dimensions.
  3. Tree based ensemble methods. From a dataset of input-output pairs {(x_i, y_i)}_{i=1..n} ⊆ X × Y, we approximate f : X → Y by learning an ensemble of M decision trees. (Figure: a decision tree with yes/no splits leading to leaf labels.) The estimator f̂ of f is obtained by averaging the predictions of the ensemble of trees.
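A minimal sketch of the averaging step described on this slide, using scikit-learn (the dataset, ensemble size, and randomization settings are illustrative assumptions, not taken from the slides):

```python
# Illustrative sketch: approximate f by averaging the predictions of
# M randomized decision trees (assumed setup, not the authors' code).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

M = 5  # ensemble size
trees = [
    DecisionTreeRegressor(splitter="random", random_state=m).fit(X, y)
    for m in range(M)
]

# The ensemble estimator f̂(x) averages the per-tree predictions f_m(x).
f_hat = np.mean([t.predict(X) for t in trees], axis=0)
```

The `splitter="random"` option is one way to randomize individual trees; the extremely randomized trees used in the slides add further randomization of the cut points.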
  4. High model complexity ⇒ large memory requirement. The complexity of tree based methods is measured by the number of internal nodes and increases with: (i) the ensemble size M; (ii) the number of samples n in the dataset. The variance of individual trees increases with the dimension p of the original feature space, so M(p) should increase with p to yield near optimal accuracy. Complexity grows as n·M(p), which may require a huge amount of storage. Memory limitation will be an issue in high dimensional problems.
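The node-count complexity measure from this slide can be computed directly on a fitted forest; a hedged sketch (scikit-learn's `tree_.node_count` attribute and the chosen ensemble sizes are assumptions for illustration):

```python
# Illustrative sketch: model complexity as the total number of nodes in
# the forest; it grows with the ensemble size M (and with n).
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=500, n_features=20, random_state=0)

def n_nodes(forest):
    # Sum the node counts of all trees in the ensemble.
    return sum(est.tree_.node_count for est in forest.estimators_)

small = ExtraTreesRegressor(n_estimators=5, random_state=0).fit(X, y)
large = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

print(n_nodes(small), n_nodes(large))  # 10x more trees -> roughly 10x more nodes
```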
  5. L1-based compression of random forest model (I). We first learn an ensemble of M extremely randomized trees (Geurts et al., 2006). (Figure: example trees with nodes numbered (m, l).) We associate to each node an indicator function 1_{m,l}(x), which is equal to 1 if the sample (x, y) reaches the l-th node of the m-th tree, and 0 otherwise.
  6. L1-based compression of random forest model (II). The node indicator functions 1_{m,l}(x) may be used to lift the input space X towards its induced feature space Z: z(x) = (1_{1,1}(x), ..., 1_{1,N_1}(x), ..., 1_{M,1}(x), ..., 1_{M,N_M}(x)), where N_m is the number of nodes of the m-th tree. (Figure: the nodes reached by one sample x_s highlighted in each tree.)
  7. L1-based compression of random forest model (III). A variable selection method (regularization with the L1-norm) is applied on the induced space Z to compress the tree ensemble, using the solution of an L1-regularized least-squares (Lasso-type) problem.
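The whole pipeline of slides 5-7 can be sketched with scikit-learn; this is an assumed reconstruction, not the authors' code, and the regularization strength `alpha` and the dataset are arbitrary illustrative choices:

```python
# Sketch of the pipeline: (1) fit extremely randomized trees, (2) lift the
# samples into the node-indicator space Z, (3) run an L1-penalized (Lasso)
# regression on Z so that nonzero coefficients mark the nodes to keep.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=50, random_state=0)

# 1. Learn an ensemble of M extremely randomized trees.
forest = ExtraTreesRegressor(n_estimators=10, random_state=0).fit(X, y)
total_nodes = sum(est.tree_.node_count for est in forest.estimators_)

# 2. Lift X into Z: Z[s, (m, l)] = 1 iff sample s reaches node l of tree m.
Z, _ = forest.decision_path(X)  # sparse, shape (n_samples, total_nodes)

# 3. L1-based selection on Z compresses the ensemble to a few nodes.
lasso = Lasso(alpha=0.1, max_iter=50000).fit(Z, y)
kept_nodes = np.count_nonzero(lasso.coef_)
print(f"kept {kept_nodes} of {total_nodes} nodes")
```

Nodes whose coefficient is exactly zero can be pruned from the stored model, which is the source of the compression; increasing `alpha` trades accuracy for a smaller node count.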