DMC2013 Task 2
1. Data Mining Cup 2013
Task 2
The Uni_Budapest_Te_1 solution
Gábor Simon
2. Data Preparation
Several numeric variables had missing values
Whether a value was missing was itself very informative
Solution:
• Discretize numeric variables into categories
• Missing values will be a separate category
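A minimal sketch of this recoding step in plain Python. The cut points in `edges` and the label names (`bin_0`, `missing`) are hypothetical; the slides do not say how the bins were chosen.

```python
import math
from typing import Optional, Sequence


def discretize(value: Optional[float], edges: Sequence[float]) -> str:
    """Map a numeric value to a bin label; missing values get their own label."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    # edges must be sorted ascending; a value falls into the first bin whose
    # upper edge is >= the value.
    for i, edge in enumerate(edges):
        if value <= edge:
            return f"bin_{i}"
    return f"bin_{len(edges)}"


edges = [10.0, 50.0]  # hypothetical cut points, not taken from the slides
values = [3.5, None, 42.0, float("nan"), 99.0]
labels = [discretize(v, edges) for v in values]
print(labels)
```

Because "missing" is just another category, the model can learn a weight for it like for any other bin, which is how the information carried by missing values is kept.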
3. Modeling
Machine learning library: Weka 3.7
Algorithm: Stochastic Gradient Descent
• Offline learning: pre-train on Task 1 data
• Online learning: keep learning during evaluation
5 pre-trained models packaged in our solution
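The offline/online pattern can be illustrated with a small hand-written learner. This is a sketch in plain Python, not the Weka implementation used in the solution; the binary target, the log loss, the learning rate, and the toy data are all assumptions made for illustration.

```python
import math


class OnlineLogisticSGD:
    """Logistic regression over sparse binary features, trained one example at a time."""

    def __init__(self, learning_rate: float = 0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # feature name -> weight
        self.bias = 0.0

    def predict_proba(self, features):
        z = self.bias + sum(self.weights.get(f, 0.0) for f in features)
        z = max(-35.0, min(35.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # For the log loss, the gradient with respect to the score is (p - y).
        error = self.predict_proba(features) - label
        self.bias -= self.learning_rate * error
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) - self.learning_rate * error


# Offline phase: pre-train on historical data (toy stand-in for the Task 1 data).
offline_data = [(["age=missing"], 1), (["age=bin_0"], 0)] * 50
model = OnlineLogisticSGD()
for features, label in offline_data:
    model.update(features, label)

# Online phase: predict first, then keep learning once the true label is known.
before = model.predict_proba(["age=bin_0"])
model.update(["age=bin_0"], 1)
after = model.predict_proba(["age=bin_0"])
```

The same `update` method serves both phases, so a model shipped in its pre-trained state can continue to adapt during evaluation.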
4. Modeling
Transactions at different points in the session are very different
To exploit this, use separate models for different parts of sessions
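One way to route each transaction to the model for its part of the session, as a sketch. The grouping 1, 2, 3, 4-6, 7+ is an assumption read from the x-axis labels of the evaluation charts, and `RunningRate` is a hypothetical stand-in for a real classifier.

```python
class RunningRate:
    """Stand-in model: predicts the smoothed positive rate it has seen so far."""

    def __init__(self):
        self.positives = 0
        self.total = 0

    def predict_proba(self) -> float:
        # Laplace smoothing, so an untrained model predicts 0.5.
        return (self.positives + 1) / (self.total + 2)

    def update(self, label: int) -> None:
        self.positives += label
        self.total += 1


def step_bucket(step: int) -> str:
    """Map a 1-based position within the session to a model key."""
    if step < 1:
        raise ValueError("step must be >= 1")
    if step <= 3:
        return str(step)
    if step <= 6:
        return "4-6"
    return "7+"


BUCKETS = ["1", "2", "3", "4-6", "7+"]
models = {bucket: RunningRate() for bucket in BUCKETS}  # one model per bucket


def observe(step: int, label: int) -> None:
    models[step_bucket(step)].update(label)


def predict(step: int) -> float:
    return models[step_bucket(step)].predict_proba()


observe(1, 1)
observe(1, 1)
observe(9, 0)
```

Each model only ever sees transactions from its own part of the session, so early and late steps cannot blur each other's patterns.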
6. Evaluation
Performance test on Task 1 data
Train on 70% of sessions, evaluate on the remaining 30%
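A sketch of a session-level split, assuming each transaction row carries a session identifier. The function name, the seed, and the toy rows are hypothetical. Splitting by session rather than by row keeps all transactions of one session on the same side.

```python
import random


def split_sessions(session_ids, train_fraction=0.7, seed=0):
    """Randomly assign whole sessions to the training or the evaluation set."""
    unique_ids = sorted(set(session_ids))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(unique_ids)
    cut = int(round(train_fraction * len(unique_ids)))
    return set(unique_ids[:cut]), set(unique_ids[cut:])


# Toy data: 10 sessions with 3 transactions each, as (session_id, step) rows.
rows = [(sid, step) for sid in range(10) for step in range(1, 4)]
train_ids, test_ids = split_sessions(sid for sid, _ in rows)

train_rows = [r for r in rows if r[0] in train_ids]
test_rows = [r for r in rows if r[0] in test_ids]
```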
[Chart: number of instances (vertical axis, 0 to 70000) by group on the horizontal axis: 1; 2; 3; 4, 5, 6; 7+]
8. Evaluation
Good predictive power for the first few steps
After that, only slightly better than the benchmark
[Chart: Benchmark vs. Model (vertical axis, 50% to 80%) by group on the horizontal axis: 1; 2; 3; 4, 5, 6; 7+]
9. Summary
The key points of our solution:
• Recode numerical variables with missing values into categories
• Use pre-trained but updateable models
• Use different models for different parts of sessions