Predicting the Contractual Full-Time Equivalent Percentage using XGBoost, Stine Bakke and Knut Håkon Grini, Statistics Norway
1. Predicting the Contractual Full-Time Equivalent Percentage using XGBoost
10/18/2019
Stine Bakke and Knut Håkon Grini
2. Agenda
• A-ordningen
• Agreed working hours – and the problem
• Preparing the data
• XGBoost (Extreme Gradient Boosted Decision Trees)
• Results
• Passing judgement/final thoughts
3. A-ordningen – 24/7 reporting – monthly data
• Coordinated digital collection of information from employers about jobs, earnings and taxes to three public agencies
4. Agreed working hours – and the problem
• Contractual full-time equivalent
• Paid hours (only hourly paid)
• Hours per week for full time in the position
• Substandard reporting: 7.5 %
5. Preparing the data
• Check 1: Hourly paid
  Ratio model, used only to identify outliers
  1.5 % extremes
• Check 2: Boundaries on earnings
  A lower wage threshold is established for the FTE wage, and lower and upper limits are set for hourly paid employees
  2.7 % disqualified
• Check 3: Relationship between earnings and FTE
  Iterative linear regression model that checks for outliers
  4.7 % disqualified
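The idea behind Check 3 can be illustrated with a minimal sketch: fit a straight line of earnings on FTE, flag observations whose residual is large, refit without them, and repeat until nothing new is flagged. The slides do not give the actual model or cut-off, so the function names, the toy data and the threshold of two residual standard deviations are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def iterative_outliers(fte, earnings, k=2.0, max_rounds=10):
    """Return indices flagged as outliers (k is an assumed cut-off)."""
    keep = list(range(len(fte)))
    for _ in range(max_rounds):
        slope, intercept = fit_line([fte[i] for i in keep],
                                    [earnings[i] for i in keep])
        residuals = {i: earnings[i] - (intercept + slope * fte[i])
                     for i in keep}
        # Residual standard deviation with two fitted parameters.
        sd = (sum(r ** 2 for r in residuals.values()) / (len(keep) - 2)) ** 0.5
        still_ok = [i for i in keep if abs(residuals[i]) <= k * sd]
        if len(still_ok) == len(keep):
            break  # nothing new flagged, so stop iterating
        keep = still_ok
    kept = set(keep)
    return [i for i in range(len(fte)) if i not in kept]


# Toy data (invented): earnings are roughly 400 per FTE point,
# except record 7, which reports 50 % FTE with very high earnings.
fte = [20, 40, 50, 60, 80, 100, 100, 50, 30, 70]
earnings = [8100, 15900, 20200, 23800, 32100, 40300, 39700, 90000, 12100, 27900]
flagged = iterative_outliers(fte, earnings)
```

Refitting after each round matters because a single extreme record pulls the first fitted line towards itself and inflates the residual spread; once it is dropped, the line and the spread reflect the remaining records.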
6. eXtreme Gradient Boosting
• Uses «gradient boosted decision trees»
• Every tree provides a set of predicted values
• Trees are «grown» based on modified versions of the data
• Observations that are predicted poorly leave larger errors and so influence the next tree more
• Observations that are predicted well leave smaller errors and influence the next tree less
• Each new tree improves the combined prediction
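The boosting idea in these bullets can be sketched in a few lines of plain Python. This is not XGBoost itself, which adds regularisation, second-order gradient information and far more efficient tree construction; it is a minimal illustration with one-split trees ("stumps") and squared-error loss, where each new tree is fitted to the errors left by the trees before it. All names, the toy data and the learning rate are assumptions.

```python
def fit_stump(x, residuals):
    """Find the single split on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep the right side non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_trees=20, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(y) / len(y)  # start from the mean prediction
    predictions = [base] * len(y)
    stumps, mse_history = [], []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for xi, pi in zip(x, predictions)]
        mse_history.append(
            sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y))
    return base, stumps, mse_history


# Toy data (invented): one input variable and a target percentage.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [20, 20, 40, 50, 50, 80, 100, 100]
base, stumps, mse_history = boost(x, y)
```

On this toy data the training error in `mse_history` does not increase from one tree to the next, which is the "improved prediction for each new tree" behaviour described above. Training error alone says nothing about performance on unseen data.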
7. Input variables
• Fixed earnings (log)
• Reported or calculated FTE %
• Age and age squared
• Number of employees in local unit (ten groups)
• Education (first digit of ISCED 2011)
• Industry (NACE 2007)
• Occupation (two-digit ISCO 2008)
• Apprenticeship
• Earnings category (fixed monthly, hourly paid, other)
• Gender
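As a rough illustration of how some of these inputs might be derived from a single job record, the sketch below takes the logarithm of fixed earnings, adds age squared, and one-hot encodes a categorical code. The record keys, the helper name and the example industry codes are hypothetical; the slides do not describe the actual data layout or encoding.

```python
import math


def build_features(record, industry_codes):
    """Turn one job record (a dict with hypothetical keys) into model inputs."""
    age = record["age"]
    features = {
        "log_fixed_earnings": math.log(record["fixed_earnings"]),
        "fte_pct": record["fte_pct"],
        "age": age,
        "age_squared": age ** 2,
        "apprentice": int(record["apprentice"]),
    }
    # One-hot encode the categorical industry code.
    for code in industry_codes:
        features["industry_" + code] = int(record["industry"] == code)
    return features


# Invented example record.
example = build_features(
    {"fixed_earnings": 40000.0, "fte_pct": 80.0, "age": 35,
     "apprentice": False, "industry": "C"},
    industry_codes=["A", "C", "G"],
)
```

The other categorical inputs (education, occupation, size group, earnings category, gender) could be encoded the same way.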
9. Lessons learned, passing judgement and final thoughts
• Good training data
• Careful variable specification
• Distribution of results: is it realistic or biased?
• We are very optimistic but not quite finished