7. Tuning learning parameters
• Validation: 10x stratified shuffle split on learning (90%) and validation (10%)
• Parameters to tune
– tree depth
– learning rate
– number of trees in ensemble
– scheme for filling in missing values
– number of unimportant features to exclude
• Decision
– Marginal improvement in validation score (about 0.005, with large variance)
– Biased validation scheme (because of year-to-year changes)
– Final submission: XGBoost model with default learning parameters (Occam's Razor principle): 100 trees, max depth = 3, learning rate = 0.1
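The validation scheme above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses a synthetic dataset in place of the (non-public) competition features, and scikit-learn's GradientBoostingClassifier as a portable stand-in for XGBoost — its defaults happen to match the final submission's parameters (100 trees, max depth 3, learning rate 0.1).

```python
from statistics import mean, stdev

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic placeholder for the competition data (real features are not public).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)

# 10x stratified shuffle split: 90% learning / 10% validation, as on the slide.
splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=0)

scores = []
for train_idx, val_idx in splitter.split(X, y):
    # Parameters matching the final submission: 100 trees, depth 3, lr 0.1
    # (these are also GradientBoostingClassifier's defaults).
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                       learning_rate=0.1, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(roc_auc_score(y[val_idx],
                                model.predict_proba(X[val_idx])[:, 1]))

print(f"validation AUC: {mean(scores):.4f} +/- {stdev(scores):.4f}")
```

The spread across the 10 splits is what makes a 0.005 improvement hard to trust: if the standard deviation of the fold scores is of the same order, the gain is within noise.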
8. Feature evaluation
Feature vector:
• Personal (15)
• Cards & Wealth (8)
• Activeness (8)
• Event counters (77)
• Geo (28)
Feature group     AUC change after removing group   AUC, only features from the group
Personal          -0.0322                           0.6615
Cards & Wealth    -0.0137                           0.5653
Event counters    -0.0019                           0.6738
Activeness        -0.0012                           0.6419
Geo location      -0.0004                           0.6318
Cross-validation AUC score: 0.7213 (stratified shuffle train/test split)
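The two ablation measurements in the table (AUC change after removing a group, and AUC using only that group) can be sketched as below. This is a hypothetical reconstruction, not the authors' code: the data is synthetic, the group-to-column mapping is invented for illustration, and GradientBoostingClassifier stands in for XGBoost with the same parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real feature matrix is not public.
X, y = make_classification(n_samples=1200, n_features=12, n_informative=6,
                           random_state=0)

# Hypothetical column ranges standing in for the real feature groups
# (Personal, Cards & Wealth, Activeness, Event counters, Geo).
groups = {
    "personal": set(range(0, 4)),
    "cards_wealth": set(range(4, 8)),
    "geo": set(range(8, 12)),
}

def auc_with(cols):
    """Train on the given feature columns and return held-out AUC."""
    Xs = X[:, sorted(cols)]
    Xtr, Xte, ytr, yte = train_test_split(Xs, y, test_size=0.3,
                                          stratify=y, random_state=0)
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                       learning_rate=0.1, random_state=0)
    model.fit(Xtr, ytr)
    return roc_auc_score(yte, model.predict_proba(Xte)[:, 1])

all_cols = set(range(X.shape[1]))
base = auc_with(all_cols)  # AUC of the full model

results = {}
for name, cols in groups.items():
    results[name] = {
        # Column 2 of the table: drop the group, measure the AUC delta.
        "auc_change_after_removal": auc_with(all_cols - cols) - base,
        # Column 3 of the table: train on the group alone.
        "auc_only_group": auc_with(cols),
    }
    print(name, results[name])
```

Note the asymmetry the table exposes: a group can score well on its own (Event counters, 0.6738) yet cost almost nothing when removed (-0.0019), because its signal is largely duplicated by the remaining features.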