This talk was part of a joint KLM-BigData Republic data science meetup to share results and learnings of a full-cycle data science project on passenger forecasting. I presented the data science part of the project, including how to frame the modeling problem, performance metrics, validation strategy, machine-learning algorithms and challenges from a data science perspective.
Passenger forecasting at KLM

1. Data Science Meetup
Passenger forecasting at KLM: from idea to meals on board
The science: forecasting passengers
by Alexander Backus
2. The data science product life cycle
[Cycle diagram: PRODUCT IDEA → IDEATE → EXPERIMENT → INDUSTRIALIZE]
3. Understanding the data-value chain
[Diagram: data → PREDICT → insight → DECIDE → action → MEASURE → value (€)]
In this project: passenger forecasts are the insight, supplying meals is the action, and optimal catering is the measured value, tied to the business objectives and the value proposition for the user.
4. Predicting the number of passengers that will board a flight
[Figure: planning timeline toward departure, with forecasts needed at multiple horizons]
7. System output
Full conditional probability density?
[Figure: predictive probability density over forecasted passengers (low to high), with the mean E[Y] and the Q10 and Q90 quantiles marked]
MVP: a single point forecast, the expected value E[Y]
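The three output types on the slide can be sketched with numpy on a made-up predictive distribution (the numbers are assumptions for illustration, not KLM figures):

```python
import numpy as np

# Hypothetical predictive distribution for one flight's boarded passengers,
# e.g. obtained from quantile models or bootstrapped forecasts.
rng = np.random.default_rng(0)
predictive_samples = rng.normal(loc=180.0, scale=15.0, size=10_000)

point_forecast = predictive_samples.mean()              # E[Y]: the MVP output
q10, q90 = np.percentile(predictive_samples, [10, 90])  # uncertainty band
```

The point forecast is the simplest output to consume downstream; the quantile band (or the full density) is what a later iteration could expose for risk-aware catering decisions.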
8. Current process is based on the number of expected passengers
Regression: f(x) = y
[Diagram: data → PREDICT → passenger forecasts (insight) → DECIDE → action, feeding the user's supply chain process]
Minimizing the need for change management
10. Hours to departure
a.k.a. the query moment
[Mock figure for illustration purposes: bookings (0 to max) as a function of hours to departure]
Multi-timescale forecasting: fit one model with temporal indicators
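The one-model-with-temporal-indicators idea can be sketched with pandas (flight number and values are hypothetical): the same flight appears once per query moment, and hours-to-departure becomes an ordinary feature, so a single model covers all horizons.

```python
import pandas as pd

# Hypothetical training snapshots: one flight observed at several
# query moments, each with its own hours-to-departure indicator.
snapshots = pd.DataFrame({
    "flight_id":          ["KL601", "KL601", "KL601"],
    "hours_to_departure": [336, 72, 24],    # temporal indicator feature
    "booked":             [140, 185, 210],  # bookings known at the query moment
    "boarded":            [205, 205, 205],  # target: final boarded count
})

# One model sees every horizon; hours_to_departure lets it learn
# horizon-specific behaviour instead of training one model per horizon.
features = snapshots[["hours_to_departure", "booked"]]
target = snapshots["boarded"]
```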
11. Defining the target
Facilitate learning: offset with the booking number
y′ = y − x_booked
Focus on learning interactions with booking numbers
[Figure: boarded passengers versus booked passengers]
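A pandas sketch of the offset target (numbers made up): the model learns the delta between boarded and booked passengers, which is small and centred near zero, and the booking number is added back at prediction time.

```python
import pandas as pd

df = pd.DataFrame({
    "booked":  [100, 200, 300],   # booked passengers at the query moment
    "boarded": [ 98, 205, 290],   # hypothetical boarded passengers
})

# Offset target y' = y - x_booked: predict the delta from the booking
# number instead of the raw passenger count.
df["y_prime"] = df["boarded"] - df["booked"]

# At prediction time, add the booking number back to recover the count.
df["reconstructed"] = df["y_prime"] + df["booked"]
```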
15. Performance visualization
[Mock figures for illustration purposes: mean absolute error (and its cost in €) against hours to departure, from long-term to the day of departure; and the density of residuals per horizon slice, with undersupply below zero and oversupply above]
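The per-horizon evaluation can be sketched with pandas on made-up forecasts: slice by hours to departure, compute MAE per slice, and keep signed residuals to separate undersupply from oversupply.

```python
import pandas as pd

# Hypothetical forecasts at two horizons, for evaluation.
results = pd.DataFrame({
    "hours_to_departure": [336, 336, 24, 24],
    "boarded":            [200, 150, 200, 150],
    "forecast":           [220, 140, 205, 148],
})

results["abs_error"] = (results["forecast"] - results["boarded"]).abs()
# MAE per horizon slice, as in the performance plot.
mae_by_horizon = results.groupby("hours_to_departure")["abs_error"].mean()

# Signed residuals: negative means undersupply, positive means oversupply.
results["residual"] = results["forecast"] - results["boarded"]
```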
18. Gradient boosting decision trees
Cuts through mixed-type, high-dimensional tabular data with few assumptions
19. Decision trees
Objective: predict boarded passengers (minimize MAE loss)
[Example tree over training samples: split on booked > 120; if true, split on holiday == True, giving leaf predictions 10 and 6; if false, leaf prediction −2]
20. Tree ensembles
Averaging multiple instances
[Tree 1: booked > 120, then holiday == True, with leaves 10, 6 and −2; tree 2: destination == NYC, with leaves −1 and 1]
ŷ = f(x) = 6 − 1 = 5
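The slide's worked example (ŷ = f(x) = 6 − 1 = 5) can be sketched as two hand-written trees whose outputs are summed; tree 2's leaf values are partly illegible in the export, so −1 and 1 are assumed here:

```python
def tree_1(x):
    # Leaf values taken from the slide's first tree.
    if x["booked"] > 120:
        return 10 if x["holiday"] else 6
    return -2

def tree_2(x):
    # Second tree; leaf values assumed from the garbled slide.
    return -1 if x["destination"] == "NYC" else 1

# The ensemble prediction is the sum of the trees' leaf values.
x = {"booked": 150, "holiday": False, "destination": "NYC"}
y_hat = tree_1(x) + tree_2(x)   # 6 + (-1) = 5
```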
21. Gradient boosting
Homing in on mistakes
[Figure: tree 1 is fitted to the training samples (x, y); tree 2 is then fitted sequentially to tree 1's residuals, e.g. splitting on destination == NYC]
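The sequential-fitting idea can be sketched with scikit-learn on synthetic data (an assumption for illustration; this toy uses squared-error trees rather than the slide's MAE objective): each new tree is fitted to the residuals of the ensemble so far.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: boarded roughly follows booked, plus noise (values made up).
rng = np.random.default_rng(42)
X = rng.uniform(0, 300, size=(200, 1))            # e.g. booked passengers
y = 0.95 * X[:, 0] + rng.normal(0, 5, size=200)   # hypothetical boarded counts

prediction = np.zeros_like(y)
for _ in range(10):
    residual = y - prediction                 # the ensemble's current mistakes
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                     # each new tree homes in on them
    prediction += tree.predict(X)

mae = np.mean(np.abs(y - prediction))
baseline_mae = np.mean(np.abs(y - y.mean()))  # constant-prediction baseline
```

After a few rounds the ensemble's error is far below the constant baseline, because every tree corrects what the previous trees got wrong.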
23. Tuning the algorithm
What happens if we keep boosting? Overfitting!
Regularization with the learning rate: ŷ_t = ŷ_{t−1} + η · f_t(x)
Early stopping based on a validation set:
[Figure: train loss keeps decreasing over boosting iterations while validation loss starts to rise; stop at the minimum of the validation loss]
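Both regularizers can be sketched in a manual boosting loop (scikit-learn trees and made-up data, as an illustration): each tree's contribution is shrunk by the learning rate η, and boosting stops once the validation loss stops improving.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(300, 1))
y = 0.95 * X[:, 0] + rng.normal(0, 10, size=300)
X_train, X_val, y_train, y_val = X[:200], X[200:], y[:200], y[200:]

eta = 0.3                                # learning rate shrinks each tree
pred_train = np.zeros_like(y_train)
pred_val = np.zeros_like(y_val)
best_val_loss, patience, bad_rounds = np.inf, 20, 0

for t in range(500):
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X_train, y_train - pred_train)
    pred_train += eta * tree.predict(X_train)   # ŷ_t = ŷ_{t-1} + η·f_t(x)
    pred_val += eta * tree.predict(X_val)
    val_loss = float(np.mean(np.abs(y_val - pred_val)))
    if val_loss < best_val_loss:
        best_val_loss, bad_rounds = val_loss, 0
    else:
        bad_rounds += 1
    if bad_rounds >= patience:                  # early stopping
        break
```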
24. Tuning the algorithm: key hyperparameters
[Example tree: split on booked > 120, then holiday == True]
Splitting: max_bin, bagging_fraction, feature_fraction
Pruning: num_leaves, max_depth, min_data_in_leaf
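The names on the slide are LightGBM parameters; a hypothetical configuration dict (values are illustrative defaults, not the talk's actual settings):

```python
# Hypothetical LightGBM-style parameter dict for the hyperparameters
# named on the slide; values are for illustration only.
params = {
    # Splitting: how finely features are binned and how rows/columns
    # are subsampled when searching for splits.
    "max_bin": 255,
    "bagging_fraction": 0.8,   # fraction of rows sampled per iteration
    "feature_fraction": 0.8,   # fraction of features sampled per tree
    # Pruning / tree-size limits: cap complexity to avoid overfitting.
    "num_leaves": 31,
    "max_depth": 8,
    "min_data_in_leaf": 50,    # stop splits that see too little data
}
```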
25. Finding optimal hyperparameter settings
Sequential model-based optimization: model the expected improvement
Balance between exploitation and exploration
Tree-structured Parzen Estimator (TPE), e.g. via hyperopt
[Figure: sampled points over hyperparameter 1 and hyperparameter 2, with more sampling in high-score regions]
31. Challenge: training-serving skew
[Mock figure for illustration purposes: the distribution of variable x1 in the historical data (OLD) versus the real-time production environment (NEW), showing differing values]
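One simple guard against this skew (a sketch, not the project's actual check) is to compare summary statistics of the same feature in the historical training data and the live serving data:

```python
import pandas as pd

# Hypothetical values of the same feature x1 in the historical training
# data versus the real-time production environment.
train_x1 = pd.Series([0.10, 0.20, 0.15, 0.18, 0.12])
serving_x1 = pd.Series([0.90, 1.10, 0.95, 1.05, 1.00])  # differing values!

# Flag skew when the serving mean drifts far from the training mean,
# relative to the training spread (3-sigma rule, chosen arbitrarily here).
shift = abs(serving_x1.mean() - train_x1.mean())
skew_detected = shift > 3 * train_x1.std()
```

In practice such checks would run continuously against production inputs, alerting before a silently shifted feature degrades forecasts.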
33. Shadow deployment successful!
Proven superior performance to the current system: benchmark beaten
[Mock figure for illustration purposes: mean absolute error against hours to departure, from long-term to the day of departure, for the current system versus MOBS]
34. The data science product life cycle
[Cycle diagram: PRODUCT IDEA → IDEATE → EXPERIMENT → INDUSTRIALIZE]
35. Key take-aways
Understanding the data-value chain is key to defining the machine-learning problem
Get business stakeholders committed by demonstrating value in a live test
Simplicity over complexity: think minimum viable to get to production