# 6 Tips for Optimizing TreeNet Gradient Boosting Models

1. Dan Steinberg, January 2013, Salford Systems, www.salford-systems.com
2. While TreeNet (Stochastic Gradient Boosting) can work phenomenally well out of the box, it almost always pays to tune your control parameters. Devoting time to optimizing a TreeNet model can noticeably improve its out-of-sample performance. Here are several things recommended for all TreeNet users. © Copyright Salford Systems 2013
3. TreeNet grows 200 trees by default, although you can reset the default. In real-world modeling we often find that 1,000 or more trees perform better.
4. The learn rate goes hand in hand with growing enough trees: the slower your learn rate, the more trees you will need. There is nothing wrong with using a learn rate of 0.001 if you are willing to let your machine run through all the trees you will need.
5. The default value of 0.10 means that 10% of the data could be ignored in each training cycle. Experiment with a value of 0.0 to see whether it helps or hurts. You can also try values such as 0.02 or 0.05. Note: if the data are very clean, 0.0 should work best.
6. If 500 trees are needed when you generate 6-node trees, you might need 1,500 or more when generating just 2-node trees. Sometimes moderately large trees work best: 12-node, 15-node, or even 25-node trees could do the trick. Since large trees learn more than smaller trees, you might also need to dial down the learn rate to prevent overfitting.
7. Try Battery LOVO (leave one variable out), as this might allow you to remove a variable from the middle of the pack in terms of importance. Try Battery SHAVING to remove the least important variables (shaving from the bottom of the list). This tests the viability of dropping the "best" variables.
8. First, run some completely additive models. These differ from models built with 2-node trees, which can actually allow interactions because of the manner in which TreeNet handles missing values. With the ICL ADDITIVE command you guarantee no interactions of any kind, including interactions between the missing value indicators created by TreeNet and other variables.
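
The link between learn rate and tree count in the tips above can be seen in a toy example. This is a minimal sketch in plain Python, not TreeNet: it runs least-squares boosting with 2-node trees on made-up step-function data and counts how many trees each learn rate needs to reach a fixed training error. The names `fit_stump` and `trees_needed`, the data, and the error target are all assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a 2-node regression tree: one split, one mean per side."""
    best = None
    # Try every split that leaves both sides non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def trees_needed(xs, ys, learn_rate, target_mse, max_trees=10000):
    """Count the trees grown before training MSE falls to target_mse."""
    preds = [0.0] * len(xs)
    for n_trees in range(1, max_trees + 1):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        # Each tree's contribution is shrunk by the learn rate.
        preds = [p + learn_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
        mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
        if mse <= target_mse:
            return n_trees
    return None  # target not reached within max_trees


# Toy data (assumed): a single step from 0 to 10.
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5

counts = {rate: trees_needed(xs, ys, rate, target_mse=0.01)
          for rate in (0.1, 0.01, 0.001)}
for rate, n in counts.items():
    print(f"learn rate {rate}: {n} trees")
```

On this toy data each tenfold cut in the learn rate needs roughly ten times as many trees to reach the same training error. Real data will not scale this neatly, and out-of-sample error, not training error, is what should drive the choice.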
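
One way to organize the experiments suggested above is to enumerate the candidate settings before running anything. The sketch below is plain Python and does not use TreeNet command syntax: the dictionary keys are hypothetical labels, the tree counts and tree sizes come from the slides, and two of the three learn rates are assumed values added for illustration.

```python
from itertools import product

# Keys are hypothetical labels, not TreeNet commands.
settings = {
    "trees": [200, 1000, 1500],
    # 0.001 appears in the slides; 0.1 and 0.01 are assumed for illustration.
    "learn_rate": [0.1, 0.01, 0.001],
    "nodes_per_tree": [2, 6, 12, 15, 25],
}

# One dict per combination of settings.
runs = [dict(zip(settings, combo)) for combo in product(*settings.values())]

# Rough cost proxy (an assumption): total terminal nodes grown.
runs.sort(key=lambda run: run["trees"] * run["nodes_per_tree"])

print(len(runs), "candidate runs")
print("cheapest:", runs[0])
print("most expensive:", runs[-1])
```

Sorting by a cost proxy lets the cheap runs finish first, so obviously poor regions of the grid can be dropped before the expensive runs start.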
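
The leave-one-variable-out and shaving ideas above can be sketched generically. This is not how the Battery LOVO and Battery SHAVING commands are implemented; it is a plain-Python illustration in which `lovo`, `shave_from_bottom`, the variable names, and the `toy_score` function are all hypothetical. In practice the score would come from refitting the model on held-out data.

```python
def lovo(variables, score):
    """Score the model once per variable, with that variable left out."""
    return {v: score([u for u in variables if u != v]) for v in variables}


def shave_from_bottom(ranked, score, tolerance=0):
    """Drop the least important variable repeatedly until the score suffers.

    ranked: variable names ordered from most to least important.
    """
    kept = list(ranked)
    best = score(kept)
    while len(kept) > 1:
        trial = kept[:-1]  # remove the current least important variable
        trial_score = score(trial)
        if trial_score < best - tolerance:
            break  # dropping this variable hurts, so stop shaving
        kept, best = trial, max(best, trial_score)
    return kept


# Toy stand-in for "fit a model and measure holdout performance":
# each variable adds some signal, and every variable carries a small cost.
signal = {"income": 30, "age": 25, "tenure": 10, "zip": 0, "id_hash": 0}


def toy_score(variables):
    return sum(signal[v] for v in variables) - len(variables)


ranked = ["income", "age", "tenure", "zip", "id_hash"]
lovo_scores = lovo(ranked, toy_score)
shaved = shave_from_bottom(ranked, toy_score)

print("full model:", toy_score(ranked))
print("LOVO scores:", lovo_scores)
print("kept after shaving:", shaved)
```

In the toy scores, leaving out `zip` or `id_hash` improves on the full model, which is the kind of signal a leave-one-out pass is meant to surface, while shaving stops as soon as removing a variable costs performance.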
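
What "additive" means in the last tip can be checked numerically. The sketch below is plain Python with hypothetical models, and it says nothing about TreeNet's missing-value handling or the ICL ADDITIVE command. It uses the standard second-order difference f(a1, b1) - f(a1, b0) - f(a0, b1) + f(a0, b0), which is zero everywhere for a model with no interaction between the two variables.

```python
from itertools import product


def interaction_gap(f, a_values, b_values):
    """Largest absolute second-order difference of f over the grid.

    A gap of 0 means f shows no a-by-b interaction on these points.
    """
    gap = 0.0
    a_pairs = product(a_values, repeat=2)
    b_pairs = product(b_values, repeat=2)
    for (a0, a1), (b0, b1) in product(a_pairs, b_pairs):
        diff = f(a1, b1) - f(a1, b0) - f(a0, b1) + f(a0, b0)
        gap = max(gap, abs(diff))
    return gap


def stump(value, threshold, low, high):
    """A 2-node tree on one variable (no missing values considered)."""
    return low if value <= threshold else high


def additive_model(a, b):
    # Sum of 2-node trees, each splitting on a single variable.
    return stump(a, 3, -1, 2) + stump(b, 10, 0, 5) + stump(a, 7, 1, 4)


def interacting_model(a, b):
    # A 3-node tree: split on a, then on b inside one branch.
    if a <= 3:
        return 0
    return 5 if b > 10 else 1


a_grid, b_grid = [1, 5, 9], [5, 15]
additive_gap = interaction_gap(additive_model, a_grid, b_grid)
interacting_gap = interaction_gap(interacting_model, a_grid, b_grid)
print("additive model gap:", additive_gap)
print("interacting model gap:", interacting_gap)
```

A sum of single-variable stumps has a gap of zero, while the 3-node tree, whose effect of `b` depends on `a`, does not.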