Consolidating MLOps at One of Europe’s Biggest Airports

Consolidating MLOps at Schiphol Airport
Floris Hoogenboom – Lead Data Scientist (Floris.Hoogenboom@schiphol.nl)
Sebastiaan Grasdijk - Senior Data Scientist (Sebastiaan.Grasdijk@Schiphol.nl)

Introduction
• Amsterdam Airport
• Before Covid: Europe’s third-busiest airport
• Approx. 500,000 ATMs (air transport movements) per year
• 72 million PAX (passengers) in 2019
• Royal Schiphol Group
• Schiphol Digital

Introduction
• Block time predictions
• Gate Planning
• Passenger Flow
• Covid-19 Safety
• Environmental noise
• Smart Buildings
• Security Lane planning
• TAF forecasts
• Turnaround Insights

“Show how we implemented MLOps and how that enables us
to keep applying ML in a constantly changing environment.”
• Motivation
• Our MLflow training set-up
• Bringing a trained model to production
• Monitoring

• Schiphol is a very dynamic place to apply AI in
Every day some physical aspect of the airport changes, meaning that the dynamics of e.g. PAX flow will be different.
• We capture most things in our models, but some things we are not able to capture.
Sometimes we don’t know in advance that works will occur, and sometimes long-term incidents happen that we
hadn't foreseen and to which we want to adapt our models quickly.
• Keeping track of and monitoring our models in production was always a big task
• We often released updates to our models, e.g. including new data sources or deprecating temporarily
unavailable feeds, to make sure we always had the best quality.

• Quite standard
• We have a very strict format for all of our models:
• A Python package containing (1) library code, (2) a training
application and (3) an inference application
• Training just entails installing the package and
calling a fixed entrypoint that is the same everywhere
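
A minimal sketch of what such a fixed training entrypoint could look like. The `train_model` function, the `--output-dir` flag and the JSON output are hypothetical and not taken from the talk; the point is only that every model package exposes the same command-line contract, so the training infrastructure never needs model-specific knowledge.

```python
import argparse
import json
from pathlib import Path


def train_model(training_rows):
    """Stand-in for the library code: 'fits' a trivial mean predictor."""
    values = [row["target"] for row in training_rows]
    return {"type": "mean_predictor", "mean": sum(values) / len(values)}


def main(argv=None):
    """Fixed entrypoint: same flags and same output layout for every model."""
    parser = argparse.ArgumentParser(description="Train a model (sketch)")
    parser.add_argument("--output-dir", required=True)
    args = parser.parse_args(argv)

    # Hypothetical in-memory data; a real package would query PRD data here.
    rows = [{"target": 10.0}, {"target": 14.0}]
    model = train_model(rows)

    # Write the trained model to a fixed location inside the output directory.
    out = Path(args.output_dir)
    out.mkdir(parents=True, exist_ok=True)
    model_path = out / "model.json"
    model_path.write_text(json.dumps(model))
    return model_path
```

Calling `main(["--output-dir", "some/dir"])` trains the stand-in model and writes `model.json` to that directory; a packaging tool could expose `main` as a console script so that each model is trained with the same command.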

• Machine learning deals with data
• There is only one type of data that matters for modelling: production (PRD) data
• Lots of organizations use the engineering DTAP flow where
scientists work on "DEV" to train their models
• This works if it's their DEV and not also some engineer's DEV

• Three types of models we deploy:
• Batch (e.g. Block Time Prediction) -> Databricks Job
• Streaming (e.g. baggage time on belt) -> Databricks Job
• Request/Reply (e.g. the forecasted disturbance at a given location) -> API in Kubernetes
• Our way of integrating models in each of those deployments is more or less the same
• Focus on Batch for the rest of the talk

• Cross environment dependencies
• Runtime dependencies (the model is only loaded, e.g. via mlflow.pyfunc.load_model,
when the job runs)
• Stability assumptions on your inference & model codebase
• "non-atomic" deployments: it is hard to keep track of exactly what is
running where
We dive into these points before showing how we resolved them.

• There is a discrepancy between a "model" in the deployment sense and
a "model" in the data science sense
• Models come with an interface that specifies:
• The features that should go in
• (implicitly) the data distribution of those features
• Deploying a model means deploying:
• The trained artifact
• Any code that is needed for preprocessing, for the queries that fetch data from a
data source, etc.
• These cannot be decoupled!
Is this always a big problem? No.
Some models have a very stable API (e.g. computer
vision models).

• Not every model can be deployed with every version of your inference code
• You need to ensure that they are "feature compatible"
• This makes the Model registry UI a bit dangerous
[Diagram: a new release that dropped a few features, next to an old release that still used those features.]
What if we want to revert?
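
To make the notion of "feature compatible" concrete, here is a minimal sketch of a check that could run before a model is paired with a given version of the inference code. The function name and the feature names are hypothetical; the talk does not describe such a check, only the problem it would guard against. Treating unused features as an error is a deliberately strict assumption.

```python
def check_feature_compatibility(model_features, inference_features):
    """Raise if the inference code does not supply exactly what the model expects."""
    missing = sorted(set(model_features) - set(inference_features))
    unused = sorted(set(inference_features) - set(model_features))
    if missing or unused:
        raise ValueError(
            f"incompatible release: missing={missing}, unused={unused}"
        )


# Hypothetical feature lists: the old model still uses 'wind_speed',
# while the new inference code no longer produces it.
old_model_features = ["scheduled_block_time", "aircraft_type", "wind_speed"]
new_inference_features = ["scheduled_block_time", "aircraft_type"]

try:
    check_feature_compatibility(old_model_features, new_inference_features)
    compatible = True
except ValueError as err:
    compatible = False
    print(err)
```

In this example, reverting only the model in the registry while keeping the new inference code would fail the check, which is exactly the revert scenario raised above.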

• There are two version identifiers that determine the actual prediction job that will run
• This is hard to reason about, debug, log and manage
• Having a single source of truth makes it possible to know what is running where and how to revert

• Data Scientist adapts the codebase to train a
new model.
• Stores changes in Git
• Uses mlflow run to kick off a new MLflow run
on Databricks that logs the new run to some
experiment.
• Data Scientist judges the quality of the
experiment and decides whether this is good
enough for review

• Data Scientist creates an MR on the repo to
merge:
• The code for training the new model
• The adapted inference code such that it matches
with the model
• The configuration files for the deployed model (!)
• Unit tests, linting, etc. run
• Then the interesting part starts...

• CI fetches the model from the MLflow
experiment based on the specified run ID
• CI "builds" the deployment artifact which
contains
• The model we wish to deploy
• The inference code you need to run it
• This creates a single artifact that can be
deployed without any runtime dependencies!
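
The build step can be sketched as follows, under stated assumptions: the model has already been downloaded from the MLflow run to a local file, the artifact is a plain zip archive, and the `manifest.json` layout is invented for illustration. The talk does not specify the actual artifact format.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_deployment_artifact(model_file, code_dir, run_id, out_path):
    """Bundle the trained model and the inference code into one zip file."""
    model_file, code_dir = Path(model_file), Path(code_dir)
    # The manifest records which run produced the model, plus a checksum,
    # so the artifact itself documents exactly what it contains.
    manifest = {
        "run_id": run_id,
        "model_sha256": hashlib.sha256(model_file.read_bytes()).hexdigest(),
    }
    with zipfile.ZipFile(out_path, "w") as archive:
        archive.write(model_file, arcname="model/" + model_file.name)
        for path in sorted(code_dir.rglob("*.py")):
            relative = path.relative_to(code_dir).as_posix()
            archive.write(path, arcname="code/" + relative)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

Because the model travels inside the archive, the deployed job never has to contact the tracking server at run time, and the run ID in the manifest ties the artifact back to a single training run.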

• Deploy the created deployment artifact
• As a Databricks job
• As a Docker container
• etc.
• The target environment is based purely on Git tags
• Keep track of your environments like you
would do traditionally

• We do still use the model registry!
• The model registry is managed from the CI pipeline
• We use the following stages:
• On feature branch deployments: register a new version in the registry if it does not exist yet
• On master: promote model to staging
• On tags: promote model to production
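
The three rules above amount to a small mapping from the Git ref that triggered the pipeline to a registry action. A minimal sketch, assuming the CI system exposes the ref type and name; the function and the action names are hypothetical, and the actual registry calls are left out.

```python
def registry_action(ref_type, ref_name):
    """Map the Git ref that triggered CI to a model registry action."""
    if ref_type == "tag":
        return "promote_to_production"
    if ref_type == "branch" and ref_name == "master":
        return "promote_to_staging"
    if ref_type == "branch":
        # Any other branch is treated as a feature branch.
        return "register_if_missing"
    raise ValueError(f"unknown ref type: {ref_type!r}")
```

Keeping this decision in the pipeline, rather than in the registry UI, means the registry only ever reflects what Git says is deployed.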

• We use Airflow for scheduling automated retraining
• We don't automatically "update" models in production based on retraining
• Rather, we take away the manual process of starting a run etc., but the decision to go live is always up to a
data scientist.

• Metrics get logged to Datadog
• Anomaly monitoring is in place; warnings are sent to a Slack channel
• We use notebooks to dive into any anomalies we see
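
In the set-up described here, Datadog performs the anomaly detection. Purely to illustrate the idea, the following sketch flags a metric value with a plain z-score rule; the rule, the threshold and the function name are assumptions and not a description of what Datadog does.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    spread = stdev(history)
    if spread == 0:
        # All historical values are identical: any deviation is flagged.
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold
```

For a logged metric such as a daily prediction error, `history` would hold recent values and `latest` the newest one; a `True` result is the kind of event that would trigger a warning.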

• Data scientists can deploy models without any support
• We release new versions of many of our models every week
• Not only by training on new data, but also by adding features, changing data fetching etc.
• Fully versioned with a single source of truth
• If it works on DEV, it will work on ACP and PRD because of the single deployment package
• Easy to revert if something breaks

• MLflow is a great tool, but it is not always a click-and-go solution
• Feature compatibility is an important issue to keep in mind: your model is much more than just your
algorithm
• Having a single source of truth makes managing models much more like managing traditional software
• Having a proper MLOps flow enables speed in getting ML to production
