This talk was part of a joint KLM-BigData Republic data science meetup to share results and learnings of a full-cycle data science project on passenger forecasting. I presented the data science part of the project, including how to frame the modeling problem, performance metrics, validation strategy, machine-learning algorithms and challenges from a data science perspective.
Passenger forecasting at KLM

1. Data Science Meetup
Passenger forecasting at KLM: from idea to meals on board
The science: forecasting passengers
by Alexander Backus
2. The data science product life cycle
[Cycle diagram: PRODUCT IDEA → IDEATE → EXPERIMENT → INDUSTRIALIZE]
3. Understanding the data-value chain
[Diagram: data → PREDICT → insight → DECIDE → action → MEASURE → value (€)]
In this project: passenger forecasts are the insight, supplying meals is the action, and optimal catering is the measured value, tied to the business objectives and the value proposition for the user.
4. Predicting the number of passengers that will board a flight
[Figure: planning timeline toward departure, with forecasts needed at multiple horizons]
7. System output
Full conditional probability density?
[Figure: predictive probability density over forecasted passengers (low to high), with the mean E[Y] and the Q10 and Q90 quantiles marked]
MVP: a single point forecast, the expected value E[Y]
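The three output types on the slide can be sketched with numpy on a made-up predictive distribution (the numbers are assumptions for illustration, not KLM figures):

```python
import numpy as np

# Hypothetical predictive distribution for one flight's boarded passengers,
# e.g. obtained from quantile models or bootstrapped forecasts.
rng = np.random.default_rng(0)
predictive_samples = rng.normal(loc=180.0, scale=15.0, size=10_000)

point_forecast = predictive_samples.mean()              # E[Y]: the MVP output
q10, q90 = np.percentile(predictive_samples, [10, 90])  # uncertainty band
```

The point forecast is the simplest output to consume downstream; the quantile band (or the full density) is what a later iteration could expose for risk-aware catering decisions.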
8. Current process is based on the number of expected passengers
Regression: f(x) = y
[Diagram: data → PREDICT → passenger forecasts (insight) → DECIDE → action, feeding the user's supply chain process]
Minimizing the need for change management
10. Hours to departure
a.k.a. the query moment
[Mock figure for illustration purposes: bookings (0 to max) as a function of hours to departure]
Multi-timescale forecasting: fit one model with temporal indicators
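The one-model-with-temporal-indicators idea can be sketched with pandas (flight number and values are hypothetical): the same flight appears once per query moment, and hours-to-departure becomes an ordinary feature, so a single model covers all horizons.

```python
import pandas as pd

# Hypothetical training snapshots: one flight observed at several
# query moments, each with its own hours-to-departure indicator.
snapshots = pd.DataFrame({
    "flight_id":          ["KL601", "KL601", "KL601"],
    "hours_to_departure": [336, 72, 24],    # temporal indicator feature
    "booked":             [140, 185, 210],  # bookings known at the query moment
    "boarded":            [205, 205, 205],  # target: final boarded count
})

# One model sees every horizon; hours_to_departure lets it learn
# horizon-specific behaviour instead of training one model per horizon.
features = snapshots[["hours_to_departure", "booked"]]
target = snapshots["boarded"]
```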
11. Defining the target
Facilitate learning: offset with the booking number
y′ = y − x_booked
Focus on learning interactions with booking numbers
[Figure: boarded passengers versus booked passengers]
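A pandas sketch of the offset target (numbers made up): the model learns the delta between boarded and booked passengers, which is small and centred near zero, and the booking number is added back at prediction time.

```python
import pandas as pd

df = pd.DataFrame({
    "booked":  [100, 200, 300],   # booked passengers at the query moment
    "boarded": [ 98, 205, 290],   # hypothetical boarded passengers
})

# Offset target y' = y - x_booked: predict the delta from the booking
# number instead of the raw passenger count.
df["y_prime"] = df["boarded"] - df["booked"]

# At prediction time, add the booking number back to recover the count.
df["reconstructed"] = df["y_prime"] + df["booked"]
```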
15. Performance visualization
[Mock figures for illustration purposes: mean absolute error (and its cost in €) against hours to departure, from long-term to the day of departure; and the density of residuals per horizon slice, with undersupply below zero and oversupply above]
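The per-horizon evaluation can be sketched with pandas on made-up forecasts: slice by hours to departure, compute MAE per slice, and keep signed residuals to separate undersupply from oversupply.

```python
import pandas as pd

# Hypothetical forecasts at two horizons, for evaluation.
results = pd.DataFrame({
    "hours_to_departure": [336, 336, 24, 24],
    "boarded":            [200, 150, 200, 150],
    "forecast":           [220, 140, 205, 148],
})

results["abs_error"] = (results["forecast"] - results["boarded"]).abs()
# MAE per horizon slice, as in the performance plot.
mae_by_horizon = results.groupby("hours_to_departure")["abs_error"].mean()

# Signed residuals: negative means undersupply, positive means oversupply.
results["residual"] = results["forecast"] - results["boarded"]
```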
18. Gradient boosting decision trees
Cuts through mixed-type, high-dimensional tabular data with few assumptions
19. Decision trees
Objective: predict boarded passengers (minimize MAE loss)
[Example tree over training samples: split on booked > 120; if true, split on holiday == True, giving leaf predictions 10 and 6; if false, leaf prediction −2]
20. Tree ensembles
Averaging multiple instances
[Tree 1: booked > 120, then holiday == True, with leaves 10, 6 and −2; tree 2: destination == NYC, with leaves −1 and 1]
ŷ = f(x) = 6 − 1 = 5
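The slide's worked example (ŷ = f(x) = 6 − 1 = 5) can be sketched as two hand-written trees whose outputs are summed; tree 2's leaf values are partly illegible in the export, so −1 and 1 are assumed here:

```python
def tree_1(x):
    # Leaf values taken from the slide's first tree.
    if x["booked"] > 120:
        return 10 if x["holiday"] else 6
    return -2

def tree_2(x):
    # Second tree; leaf values assumed from the garbled slide.
    return -1 if x["destination"] == "NYC" else 1

# The ensemble prediction is the sum of the trees' leaf values.
x = {"booked": 150, "holiday": False, "destination": "NYC"}
y_hat = tree_1(x) + tree_2(x)   # 6 + (-1) = 5
```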
21. Gradient boosting
Homing in on mistakes
[Figure: tree 1 is fitted to the training samples (x, y); tree 2 is then fitted sequentially to tree 1's residuals, e.g. splitting on destination == NYC]
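The sequential-fitting idea can be sketched with scikit-learn on synthetic data (an assumption for illustration; this toy uses squared-error trees rather than the slide's MAE objective): each new tree is fitted to the residuals of the ensemble so far.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: boarded roughly follows booked, plus noise (values made up).
rng = np.random.default_rng(42)
X = rng.uniform(0, 300, size=(200, 1))            # e.g. booked passengers
y = 0.95 * X[:, 0] + rng.normal(0, 5, size=200)   # hypothetical boarded counts

prediction = np.zeros_like(y)
for _ in range(10):
    residual = y - prediction                 # the ensemble's current mistakes
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                     # each new tree homes in on them
    prediction += tree.predict(X)

mae = np.mean(np.abs(y - prediction))
baseline_mae = np.mean(np.abs(y - y.mean()))  # constant-prediction baseline
```

After a few rounds the ensemble's error is far below the constant baseline, because every tree corrects what the previous trees got wrong.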
23. Tuning the algorithm
What happens if we keep boosting? Overfitting!
Regularization with the learning rate: ŷ_t = ŷ_{t−1} + η · f_t(x)
Early stopping based on a validation set:
[Figure: train loss keeps decreasing over boosting iterations while validation loss starts to rise; stop at the minimum of the validation loss]
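Both regularizers can be sketched in a manual boosting loop (scikit-learn trees and made-up data, as an illustration): each tree's contribution is shrunk by the learning rate η, and boosting stops once the validation loss stops improving.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(300, 1))
y = 0.95 * X[:, 0] + rng.normal(0, 10, size=300)
X_train, X_val, y_train, y_val = X[:200], X[200:], y[:200], y[200:]

eta = 0.3                                # learning rate shrinks each tree
pred_train = np.zeros_like(y_train)
pred_val = np.zeros_like(y_val)
best_val_loss, patience, bad_rounds = np.inf, 20, 0

for t in range(500):
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X_train, y_train - pred_train)
    pred_train += eta * tree.predict(X_train)   # ŷ_t = ŷ_{t-1} + η·f_t(x)
    pred_val += eta * tree.predict(X_val)
    val_loss = float(np.mean(np.abs(y_val - pred_val)))
    if val_loss < best_val_loss:
        best_val_loss, bad_rounds = val_loss, 0
    else:
        bad_rounds += 1
    if bad_rounds >= patience:                  # early stopping
        break
```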
24. Tuning the algorithm: key hyperparameters
[Example tree: split on booked > 120, then holiday == True]
Splitting: max_bin, bagging_fraction, feature_fraction
Pruning: num_leaves, max_depth, min_data_in_leaf
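The names on the slide are LightGBM parameters; a hypothetical configuration dict (values are illustrative defaults, not the talk's actual settings):

```python
# Hypothetical LightGBM-style parameter dict for the hyperparameters
# named on the slide; values are for illustration only.
params = {
    # Splitting: how finely features are binned and how rows/columns
    # are subsampled when searching for splits.
    "max_bin": 255,
    "bagging_fraction": 0.8,   # fraction of rows sampled per iteration
    "feature_fraction": 0.8,   # fraction of features sampled per tree
    # Pruning / tree-size limits: cap complexity to avoid overfitting.
    "num_leaves": 31,
    "max_depth": 8,
    "min_data_in_leaf": 50,    # stop splits that see too little data
}
```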
25. Finding optimal hyperparameter settings
Sequential model-based optimization: model the expected improvement
Balance between exploitation and exploration
Tree-structured Parzen Estimator (TPE), e.g. via hyperopt
[Figure: sampled points over hyperparameter 1 and hyperparameter 2, with more sampling in high-score regions]
31. Challenge: training-serving skew
[Mock figure for illustration purposes: the distribution of variable x1 in the historical data (OLD) versus the real-time production environment (NEW), showing differing values]
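One simple guard against this skew (a sketch, not the project's actual check) is to compare summary statistics of the same feature in the historical training data and the live serving data:

```python
import pandas as pd

# Hypothetical values of the same feature x1 in the historical training
# data versus the real-time production environment.
train_x1 = pd.Series([0.10, 0.20, 0.15, 0.18, 0.12])
serving_x1 = pd.Series([0.90, 1.10, 0.95, 1.05, 1.00])  # differing values!

# Flag skew when the serving mean drifts far from the training mean,
# relative to the training spread (3-sigma rule, chosen arbitrarily here).
shift = abs(serving_x1.mean() - train_x1.mean())
skew_detected = shift > 3 * train_x1.std()
```

In practice such checks would run continuously against production inputs, alerting before a silently shifted feature degrades forecasts.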
33. Shadow deployment successful!
Proven superior performance to the current system: benchmark beaten
[Mock figure for illustration purposes: mean absolute error against hours to departure, from long-term to the day of departure, for the current system versus MOBS]
34. The data science product life cycle
[Cycle diagram: PRODUCT IDEA → IDEATE → EXPERIMENT → INDUSTRIALIZE]
35. Key take-aways
Understanding the data-value chain is key to defining the machine-learning problem
Get business stakeholders committed by demonstrating value in a live test
Simplicity over complexity: think minimum viable to get to production