SlideShare a Scribd company logo
Consolidating MLOps at
Schiphol Airport
Floris Hoogenboom – Lead Data Scientist (Floris.Hoogenboom@schiphol.nl)
Sebastiaan Grasdijk - Senior Data Scientist (Sebastiaan.Grasdijk@Schiphol.nl)
Introduction
• Amsterdam Airport
• Before Covid: Europe’s third airport
• Approx 500.000 ATMs
• 72 Milion PAX in 2019
• Royal Schiphol Group
• Schiphol Digital
Introduction
• Block time predictions
• Gate Planning
• Passenger Flow
• Covid-19 Safety
• Environmental noise
• Smart Buildings
• Security Lane planning
• TAF forecasts
• Turnaround Insights
XX123
XX123
XX123
XX123 XX123
XX123
XX123
XX123
ACREG
ACREG
ACREG
ACREG
ACREG
Introduction
• Block time predictions
• Gate Planning
• Passenger Flow
• Covid-19 Safety
• Environmental noise
• Smart Buildings
• Security Lane planning
• TAF forecasts
• Turnaround Insights
000
ACREG
9999
000
Introduction
• Block time predictions
• Gate Planning
• Passenger Flow
• Covid-19 Safety
• Environmental noise
• Smart Buildings
• Security Lane planning
• TAF forecasts
• Turnaround Insights
Introduction
• Block time predictions
• Gate Planning
• Passenger Flow
• Covid-19 Safety
• Environmental noise
• Smart Buildings
• Security Lane planning
• TAF forecasts
• Turnaround Insights
XX123 XX123
ACREG
“Show how we implemented MLOps and how that enables us
to keep applying ML in a constantly changing environment.”
• Motivation
• Our MLFlow training set-up
• Bringing a trained model to production
• Monitoring
• Schiphol is a very dynamic place to apply AI in
Everyday some physical aspect of the Airport changes meaning that dynamics of e.g. PAX flow will be different.
• Most things we capture in our models, but some things we are not able to.
Sometimes we don’t know works will occur, sometimes long term incidents happen we hadn't foreseen and we
quickly want to adapt our models to.
• Keeping track and monitoring our models in production was always a big task already
• We often released updates to our models, e.g. including new data sources, deprecating temporarily
unavailable feeds etc. to make sure we always had the best quality.
• Quite standard
• Have a very strict format for all of our models:
• Python package containing (1) library code (2) training
application and (3) inference application
• Training just entails installing the package and
referencing a fixed entrypoint that is everywhere the same
• Machine learning deals with data
• There is only one type of data that matters for modelling: PRD
data
• Lots of organizations use the engineering DTAP flow where
scientists work on "DEV" to train their models
• This works if it's their DEV and not also some engineer's DEV
• Three types of models we deploy:
• Batch (e.g. Block Time Prediction) -> Databricks Job
• Streaming (e.g. Bagage time on Belt) -> Databricks Job
• Request/Reply (e.g. The forecasted disturbance at a given location) -> API in kubernetes
• Our way of integrating models in each of those deployments is more or less the same
• Focus on Batch for the rest of the talk
• Cross environment dependencies
• Runtime dependencies (mlflow.load_model only executed when
running the job)
• Stability assumptions on your inference & model codebase
• "non-atomic" deployments: it is hard to keep track of exactly what is
running where
Dive into these points before showing how we resolved this.
• There is a discrepancy between a "model" in the deployment sense and
a "model" in the data science sense
• Models come with an interface that specifies:
• The features that should go in
• (implicitly) the data distribution of those features
• Deploying a model means deploying:
• The trained artifact
• Any code that is needed to do preprocessing/fetch queries to fetch data from a
datasource etc.
• These cannot be decoupled! (!!)
Is this always a big problem? No
Some models have a very stable API (e.g. computer
vision models).
• Not every model can be deployed with every version of your inference code
• You need to ensure that they are "feature compatible"
• This makes the Model registry UI a bit dangerous
New release that
dropped a few
features
Old release that still
used those features
What if we want to revert?
• There are two version identifiers that determine the actual prediction job that will run
• This is hard to reason about, debug, log and manage
• Having a single source of truth makes it possible to know what is running where and how to revert
• Data Scientist adapts the codebase to train a
new model.
• Stores changes in Git
• Uses mlflow run to kick of a new mlflow run
on databricks that logs the new run to some
experiment.
• Data Scientist judges the quality of the
experiment and decides whether this is good
enough for review
• Data Scientist creates an MR on the repo to
merge:
• The code for training the new model
• The adapted inference code such that it matches
with the model
• The configuration files for the deployed model (!)
• Unittests, linting etc. Runs
• Then the interesting part starts.....
• CI Fetches the model from the MLFlow
experiment based on the specified Run ID
• CI "builds" the deployment artifact which
contains
• The model we wish to deploy
• The inference code you need to run it
• This creates a single artifact that can be
deployed without any runtime dependencies!
• Deploy the created deployment artifact
• As a databricks job
• As a docker container
• Etc..
• Environment just based on Git Tags
• Keep track of your environments like you
would do traditionally
• We do still use the model registry!
• The model registry is managed from the CI pipeline
• We use the following stages:
• On feature branch deployments: register a new version in the registry if it does not exist yet
• On master: promote model to staging
• On tags: promote model to production
• We use airflow for scheduling automated retraining
• We don't automatically "update" models in production based on retraining
• Rather, we take away the manual process of starting a run etc., but the decision to go live is always up to a
data scientist.
• We use airflow for scheduling automated retraining
• We don't automatically "update" models in production based on retraining
• Rather, we take away the manual process of starting a run etc., but the decision to go live is always up to a
data scientist.
• Metrics get logged to Datadog
• Anomaly monitoring and warnings are sent to a slack channel
• We use notebooks to dive into any anomalies we see
• Data Scientist can deploy models without any support
• We release new versions of many of our models every week
• Not only by training on new data, but also by adding features, changing data fetching etc.
• Fully versioned with a single source of truth
• If it works on DEV, it will work on ACP and PRD because of the single deployment package
• Easy to revert if something breaks
• MLFlow is a great tool, but it is not a click & go solution always
• Feature compatibility is an important issue to keep in mind, your model is much more than just your
algorithm
• Having a single source of truth, makes managing models much more like managing traditional software
• Having a proper MLOps flow enables speed in getting ML to production
Questions?
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
DataScienceConferenc1
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
Calum Miller
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
Databricks
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
inovex GmbH
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
Amazon Web Services
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
Laurent Leturgez
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Manasi Vartak
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
Kent Graziano
 
Managing Millions of Tests Using Databricks
Managing Millions of Tests Using DatabricksManaging Millions of Tests Using Databricks
Managing Millions of Tests Using Databricks
Databricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
Databricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
Prakash Chockalingam
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Cathrine Wilhelmsen
 

What's hot (20)

Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Managing Millions of Tests Using Databricks
Managing Millions of Tests Using DatabricksManaging Millions of Tests Using Databricks
Managing Millions of Tests Using Databricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
 

Similar to Consolidating MLOps at One of Europe’s Biggest Airports

DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
eltonrodriguez11
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
Laurenz Wuttke
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
Cloudera, Inc.
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
DataScienceConferenc1
 
How to use continual learning in your ML models
How to use continual learning in your ML modelsHow to use continual learning in your ML models
How to use continual learning in your ML models
cnvrg.io AI OS - Hands-on ML Workshops
 
Cnvrg webinar continual learning
Cnvrg webinar   continual learningCnvrg webinar   continual learning
Cnvrg webinar continual learning
Maya Perry
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Databricks
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
Florian Wilhelm
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
Robert Grossman
 
An Introduction To Model  View  Controller In XPages
An Introduction To Model  View  Controller In XPagesAn Introduction To Model  View  Controller In XPages
An Introduction To Model  View  Controller In XPages
Ulrich Krause
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Sri Ambati
 
GeekOut 2017 - Microservices in action at the Dutch National Police
GeekOut 2017 -  Microservices in action at the Dutch National PoliceGeekOut 2017 -  Microservices in action at the Dutch National Police
GeekOut 2017 - Microservices in action at the Dutch National Police
Bert Jan Schrijver
 
MWLUG 2015 - An Introduction to MVC
MWLUG 2015 - An Introduction to MVCMWLUG 2015 - An Introduction to MVC
MWLUG 2015 - An Introduction to MVC
Ulrich Krause
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
DevOps for AI Apps
DevOps for AI AppsDevOps for AI Apps
DevOps for AI Apps
Richin Jain
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
Arnab Biswas
 
Deep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesDeep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data Services
Marco Parenzan
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
Carl W. Handlin
 
Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017
Nisha Talagala
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Sotrender
 

Similar to Consolidating MLOps at One of Europe’s Biggest Airports (20)

DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
 
How to use continual learning in your ML models
How to use continual learning in your ML modelsHow to use continual learning in your ML models
How to use continual learning in your ML models
 
Cnvrg webinar continual learning
Cnvrg webinar   continual learningCnvrg webinar   continual learning
Cnvrg webinar continual learning
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
An Introduction To Model  View  Controller In XPages
An Introduction To Model  View  Controller In XPagesAn Introduction To Model  View  Controller In XPages
An Introduction To Model  View  Controller In XPages
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
 
GeekOut 2017 - Microservices in action at the Dutch National Police
GeekOut 2017 -  Microservices in action at the Dutch National PoliceGeekOut 2017 -  Microservices in action at the Dutch National Police
GeekOut 2017 - Microservices in action at the Dutch National Police
 
MWLUG 2015 - An Introduction to MVC
MWLUG 2015 - An Introduction to MVCMWLUG 2015 - An Introduction to MVC
MWLUG 2015 - An Introduction to MVC
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
DevOps for AI Apps
DevOps for AI AppsDevOps for AI Apps
DevOps for AI Apps
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Deep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesDeep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data Services
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 

More from Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 

Recently uploaded

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 

Consolidating MLOps at One of Europe’s Biggest Airports

  • 1. Consolidating MLOps at Schiphol Airport Floris Hoogenboom – Lead Data Scientist (Floris.Hoogenboom@schiphol.nl) Sebastiaan Grasdijk - Senior Data Scientist (Sebastiaan.Grasdijk@Schiphol.nl)
  • 2. Introduction • Amsterdam Airport • Before Covid: Europe’s third airport • Approx 500.000 ATMs • 72 Milion PAX in 2019 • Royal Schiphol Group • Schiphol Digital
  • 3. Introduction • Block time predictions • Gate Planning • Passenger Flow • Covid-19 Safety • Environmental noise • Smart Buildings • Security Lane planning • TAF forecasts • Turnaround Insights XX123 XX123 XX123 XX123 XX123 XX123 XX123 XX123 ACREG ACREG ACREG ACREG ACREG
  • 4. Introduction • Block time predictions • Gate Planning • Passenger Flow • Covid-19 Safety • Environmental noise • Smart Buildings • Security Lane planning • TAF forecasts • Turnaround Insights 000 ACREG 9999 000
  • 5. Introduction • Block time predictions • Gate Planning • Passenger Flow • Covid-19 Safety • Environmental noise • Smart Buildings • Security Lane planning • TAF forecasts • Turnaround Insights
  • 6. Introduction • Block time predictions • Gate Planning • Passenger Flow • Covid-19 Safety • Environmental noise • Smart Buildings • Security Lane planning • TAF forecasts • Turnaround Insights XX123 XX123 ACREG
  • 7. “Show how we implemented MLOps and how that enables us to keep applying ML in a constantly changing environment.” • Motivation • Our MLFlow training set-up • Bringing a trained model to production • Monitoring
  • 8. • Schiphol is a very dynamic place to apply AI in Everyday some physical aspect of the Airport changes meaning that dynamics of e.g. PAX flow will be different. • Most things we capture in our models, but some things we are not able to. Sometimes we don’t know works will occur, sometimes long term incidents happen we hadn't foreseen and we quickly want to adapt our models to. • Keeping track and monitoring our models in production was always a big task already • We often released updates to our models, e.g. including new data sources, deprecating temporarily unavailable feeds etc. to make sure we always had the best quality.
  • 9.
  • 10. • Quite standard • Have a very strict format for all of our models: • Python package containing (1) library code (2) training application and (3) inference application • Training just entails installing the package and referencing a fixed entrypoint that is everywhere the same
  • 11.
  • 12. • Machine learning deals with data • There is only one type of data that matters for modelling: PRD data • Lots of organizations use the engineering DTAP flow where scientists work on "DEV" to train their models • This works if it's their DEV and not also some engineer's DEV
  • 13. • Three types of models we deploy: • Batch (e.g. Block Time Prediction) -> Databricks Job • Streaming (e.g. Bagage time on Belt) -> Databricks Job • Request/Reply (e.g. The forecasted disturbance at a given location) -> API in kubernetes • Our way of integrating models in each of those deployments is more or less the same • Focus on Batch for the rest of the talk
  • 14.
  • 15. • Cross environment dependencies • Runtime dependencies (mlflow.load_model only executed when running the job) • Stability assumptions on your inference & model codebase • "non-atomic" deployments: it is hard to keep track of exactly what is running where Dive into these points before showing how we resolved this.
  • 16. • There is a discrepancy between a "model" in the deployment sense and a "model" in the data science sense • Models come with an interface that specifies: • The features that should go in • (implicitly) the data distribution of those features • Deploying a model means deploying: • The trained artifact • Any code that is needed to do preprocessing/fetch queries to fetch data from a datasource etc. • These cannot be decoupled! (!!) Is this always a big problem? No Some models have a very stable API (e.g. computer vision models).
  • 17. • Not every model can be deployed with every version of your inference code • You need to ensure that they are "feature compatible" • This makes the Model registry UI a bit dangerous New release that dropped a few features Old release that still used those features What if we want to revert?
  • 18. • There are two version identifiers that determine the actual prediction job that will run • This is hard to reason about, debug, log and manage • Having a single source of truth makes it possible to know what is running where and how to revert
  • 19.
  • 20. • Data Scientist adapts the codebase to train a new model. • Stores changes in Git • Uses mlflow run to kick of a new mlflow run on databricks that logs the new run to some experiment. • Data Scientist judges the quality of the experiment and decides whether this is good enough for review
  • 21. • Data Scientist creates an MR on the repo to merge: • The code for training the new model • The adapted inference code such that it matches with the model • The configuration files for the deployed model (!) • Unittests, linting etc. Runs • Then the interesting part starts.....
  • 22. • CI Fetches the model from the MLFlow experiment based on the specified Run ID • CI "builds" the deployment artifact which contains • The model we wish to deploy • The inference code you need to run it • This creates a single artifact that can be deployed without any runtime dependencies!
  • 23. • Deploy the created deployment artifact • As a databricks job • As a docker container • Etc.. • Environment just based on Git Tags • Keep track of your environments like you would do traditionally
  • 24. • We do still use the model registry! • The model registry is managed from the CI pipeline • We use the following stages: • On feature branch deployments: register a new version in the registry if it does not exist yet • On master: promote model to staging • On tags: promote model to production
  • 25. • We use airflow for scheduling automated retraining • We don't automatically "update" models in production based on retraining • Rather, we take away the manual process of starting a run etc., but the decision to go live is always up to a data scientist.
  • 26. • We use airflow for scheduling automated retraining • We don't automatically "update" models in production based on retraining • Rather, we take away the manual process of starting a run etc., but the decision to go live is always up to a data scientist.
  • 27. • Metrics get logged to Datadog • Anomaly monitoring and warnings are sent to a slack channel • We use notebooks to dive into any anomalies we see
  • 28. • Data Scientist can deploy models without any support • We release new versions of many of our models every week • Not only by training on new data, but also by adding features, changing data fetching etc. • Fully versioned with a single source of truth • If it works on DEV, it will work on ACP and PRD because of the single deployment package • Easy to revert if something breaks
  • 29. • MLFlow is a great tool, but it is not a click & go solution always • Feature compatibility is an important issue to keep in mind, your model is much more than just your algorithm • Having a single source of truth, makes managing models much more like managing traditional software • Having a proper MLOps flow enables speed in getting ML to production
  • 31. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.