SlideShare a Scribd company logo
1 of 37
inteligencija.com
Machine Learning in
Production on Databricks
Petar Zečević
Senior Principal Consultant
Poslovna inteligencija, Zagreb, Croatia
inteligencija.com
Agenda
• Why is productionising Machine Learning hard?
• Overview of the Machine Learning lifecycle best practices
• Overview of how Databricks solves Machine Learning
productionisation
inteligencija.com
ML lifecycle challenges
inteligencija.com
ML lifecycle – the naive version
Streaming
data?
Is the data
fresh?
Schema
changes?
ETL code
versioning?
Has the data
distribution
changed?
ETL
testing?
What
parameters
and algos
worked?
Is the
model
performanc
e still OK?
Which
environment
?
Can the
environment
be
reproduced?
Which data
was used
for
training?
Preparation
code
versioning
and
testing?
Are all the
features
really
needed?
Are
features in
the prod
equivalent
to the ones
from
training?
Is the whole
pipeline
integration
tested?
inteligencija.com
How is ML development different?
„The ML Test Score: A Rubric for ML Production Readiness and
Technical Debt Reduction”, Breck et al., Google 2017
inteligencija.com
Getting to production – the ML development process
„Hidden Technical Debt in Machine Learning Systems”, Sculley et al.,
Google 2014
inteligencija.com
What is MLOps?
MLOps = DevOps + DataOps + ModelOp
inteligencija.com
What is DevOps?
Image source: Wikipedia
• Application code versioning
• Continuous integration – CI
• Continuous deployment – CD
• Automated testing
• Infrastructure as code
• Configuration management
• Monitoring
inteligencija.com
What is DataOps?
Image source: Monte Carlo Data
• ETL/ELT pipelines
• Code versioning
• Data lineage
• Data testing
• Data privacy
• Data self service
• Feature engineering
inteligencija.com
What is ModelOps?
Image source: Aksel Yap
• Feature engineering
• Tracking experiments
• Model validation and testing
• Versioning of model code
• Apply models to real-life data (deployment)
• Model performance monitoring
inteligencija.com
Deployment paradigms
• Batch – most of the applications
• Streaming – latency in seconds and minutes
• Real time – latency in <1s
• Edge (on-device) – specially tuned models
inteligencija.com
Getting ML to production:
MLOps best practices
inteligencija.com
Data best practices
• Version control data pipeline code
• Document feature expectations and automate data quality
checks
• Design for reusable and modular data pipelines
• Test feature creation and data processing code
• Use a Feature Store to ensure that features are consistent
across different models and pipelines
• Adopt CI/CD for data pipelines
• Adopt Infrastructure as Code
• Beware sensitive data and compliance
• Training/serving skew – Check that training and serving
features are computed in the same way (a.k.a online/offline
skew)
inteligencija.com
Model best practices
• Version control model training code and track experiments
• Model testing:
• Check for feature usefulness and cost
• Tune all hyperparameters
• Compare models to simpler alternatives – sanity check
• Test performance on important subsets of data (e.g.
regions)
• Understand the real-world impact of the model outputs
• Use canary deployments and A/B testing in production
• Have a rollback strategy
• Monitor for model degradation in production
• Understand how fast the model goes stale
• Set up automatic retraining pipelines (continuous learning)
inteligencija.com
How does Databricks help?
inteligencija.com
inteligencija.com
Delta Lake
• ACID transactions – ensures data consistency and reliability
• Schema enforcement and evolution – helps with data quality
• Time travel (Data versioning) – facilitates experimentation
• Deletes and upserts (MERGE INTO) – iterative and incremental
feature preparation
• Data skipping and other optimizations – improves
performance
inteligencija.com
Delta Live Tables
• Framework for building data processing pipelines
• You define transformations and DLT manages:
• Orchestration
• Cluster management
• Monitoring
• Data quality (Expectations)
• Error handling
• Can perform CDC with APPLY CHANGES INTO .. FROM ..
inteligencija.com
Delta Live Tables
inteligencija.com
Unity Catalog
• Centralized data discovery and access – quick search for and
reuse of existing datasets
• Centralized data governance and security – Fine-grained
access control management from a central location
• Data lineage – tables, columns, notebooks, workflows and
dashboards provide automatically collected lineage information
• Collaboration – Cross-workspace sharing enables teams to
share datasets across projects without data duplication,
promoting consistent use
inteligencija.com
inteligencija.com
inteligencija.com
Feature Store
• Centralized Feature Management – discoverable and reusable
• Any table in Unity Catalog can serve as a feature table (since
DBR 13.2)
• Lineage – upstream and downstream
• Consistency Across Models – Features used for training
models are also served in production
• Simplified Serving – models include feature metadata
• Should be used consistently (log_model) – so that you can
keep track of feature usage
• You can publish features to online stores (Amazon
DynamoDB, Aurora or RDS MySQL)
• for models served with Databricks Model Serving
inteligencija.com
MLflow
Integrated within the Databricks platform (notebooks and
workflows):
• MLflow Tracking – Log and query experiments and runs in
terms of code, data, config, and results
• MLflow Projects – Package data science code in a reusable,
reproducible form to share with other data scientists or transfer
to production.
• MLflow Models – Manage and deploy machine learning models
from a variety of ML libraries to a variety of model serving and
inference platforms.
• MLflow Model Registry – A centralized model store, set of APIs,
and UI, to collaboratively manage the lifecycle of a MLflow
Model.
inteligencija.com
inteligencija.com
MLflow Model Registry
central model registry vs. per-workspace model registry
inteligencija.com
AutoML
• Generates ML classification, regression or forecasting code
automatically, based on input table and the target field
• Features from the Feature Store can be joined
• Jupyter notebooks with code for splitting data, setting up
libraries etc.
• Provides a good starting point for experiments and/or models
ready to be registered
inteligencija.com
CI/CD integration
• Databricks Repos UI is used for checking out Git branches,
merging and pushing changes
• It provides a REST API that can be invoked by Git automation
• In production you can:
• directly reference notebooks in remote Git repos by tags or
branches
• set up read-only folders with checked-out repos and update
them automatically using Git automation
• MLflow Model Registry provides an API so that Git automation
can automatically transition models between environments
inteligencija.com
Moving to production on
Databricks
inteligencija.com
Execution environments
Different environments, such as dev, staging and prod can be separate
• Multiple cloud accounts
• Multiple Databricks workspaces – within a single cloud account
• Databricks workspace access controls
inteligencija.com
Promoting code and models
Use Git branches to separate code versions:
• dev branch for development
• specific feature branches for feature development
• release branches for different versions
Lifecycle of models might be independent of code
inteligencija.com
Promoting code and models
Image source: Databricks
Promoting models and code across environments:
inteligencija.com
The recommended approach for model promotion
The workflow recommended by Databricks:
• Dev environment:
• Develop training and other code
• Promote code
• Staging environment:
• Test training code on subset of data
• Test other code
• Promote code
• Prod environment:
• Train model on production data
• Test model
• Deploy model
• Deploy code
inteligencija.com
Links & Resources
• Big Book of MLOps – Databricks
• The Comprehensive Guide to Feature Stores – Databricks
• ML in Production – Databricks course
• https://github.com/databricks/mlops-stacks
• https://ml-ops.org/
• https://cloud.google.com/architecture/mlops-continuous-
delivery-and-automation-pipelines-in-machine-learning
inteligencija.com
Thank you!
inteligencija.com
We are Data & Analytics consulting company committed to deliver great solutions and products that
enables our clients to unlock hidden opportunities within data, become data-driven and make better
business decisions
Our goal is to enable data-driven business decisions
Offices in UK,
Sweden,
Austria,
Slovenia and
Croatia
180+
employees
20 years in
Data &
Analytics
250+
projects
100+
clients on 5
continents
inteligencija.com

More Related Content

Similar to [DSC Europe 23] Petar Zecevic - ML in Production on Databricks

Machine Learning Platform Life-Cycle Management
Machine Learning Platform Life-Cycle ManagementMachine Learning Platform Life-Cycle Management
Machine Learning Platform Life-Cycle ManagementBill Liu
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Sotrender
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)Arnab Biswas
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncQuery Labs
 
IncQuery Group's presentation for the INCOSE Polish Chapter 20220310
IncQuery Group's presentation for the INCOSE Polish Chapter 20220310IncQuery Group's presentation for the INCOSE Polish Chapter 20220310
IncQuery Group's presentation for the INCOSE Polish Chapter 20220310IncQuery Labs
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...DataScienceConferenc1
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
 
Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)PascalDesmarets1
 
A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning Jesus Rodriguez
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOpsCarl W. Handlin
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsDatabricks
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowGoDataDriven
 

Similar to [DSC Europe 23] Petar Zecevic - ML in Production on Databricks (20)

Machine Learning Platform Life-Cycle Management
Machine Learning Platform Life-Cycle ManagementMachine Learning Platform Life-Cycle Management
Machine Learning Platform Life-Cycle Management
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
 
IncQuery Group's presentation for the INCOSE Polish Chapter 20220310
IncQuery Group's presentation for the INCOSE Polish Chapter 20220310IncQuery Group's presentation for the INCOSE Polish Chapter 20220310
IncQuery Group's presentation for the INCOSE Polish Chapter 20220310
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)
 
A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 

More from DataScienceConferenc1

[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDFDataScienceConferenc1
 
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...DataScienceConferenc1
 
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdfDataScienceConferenc1
 
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...DataScienceConferenc1
 
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptxDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In TreatmentsDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMEDDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...DataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...DataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with SeifDataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...DataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help youDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...DataScienceConferenc1
 

More from DataScienceConferenc1 (20)

[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
 
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...
 
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
 
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
 
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
 
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In Treatments
 
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...
 
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
 
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...
 
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...
 
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
 
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
 
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
 
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
 
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
 
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
 
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
 
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
 
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
 
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
 

Recently uploaded

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Recently uploaded (20)

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

[DSC Europe 23] Petar Zecevic - ML in Production on Databricks

  • 1. inteligencija.com Machine Learning in Production on Databricks Petar Zečević Senior Principal Consultant Poslovna inteligencija, Zagreb, Croatia
  • 2. inteligencija.com Agenda • Why is productionising Machine Learning hard? • Overview of the Machine Learning lifecycle best practices • Overview of how Databricks solves Machine Learning productionisation
  • 4. inteligencija.com ML lifecycle – the naive version Streaming data? Is the data fresh? Schema changes? ETL code versioning? Has the data distribution changed? ETL testing? What parameters and algos worked? Is the model performanc e still OK? Which environment ? Can the environment be reproduced? Which data was used for training? Preparation code versioning and testing? Are all the features really needed? Are features in the prod equivalent to the ones from training? Is the whole pipeline integration tested?
  • 5. inteligencija.com How is ML development different? „The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”, Breck et al., Google 2017
  • 6. inteligencija.com Getting to production – the ML development process „Hidden Technical Debt in Machine Learning Systems”, Sculley et al., Google 2014
  • 7. inteligencija.com What is MLOps? MLOps = DevOps + DataOps + ModelOp
  • 8. inteligencija.com What is DevOps? Image source: Wikipedia • Application code versioning • Continuous integration – CI • Continuous deployment – CD • Automated testing • Infrastructure as code • Configuration management • Monitoring
  • 9. inteligencija.com What is DataOps? Image source: Monte Carlo Data • ETL/ELT pipelines • Code versioning • Data lineage • Data testing • Data privacy • Data self service • Feature engineering
  • 10. inteligencija.com What is ModelOps? Image source: Aksel Yap • Feature engineering • Tracking experiments • Model validation and testing • Versioning of model code • Apply models to real-life data (deployment) • Model performance monitoring
  • 11. inteligencija.com Deployment paradigms • Batch – most of the applications • Streaming – latency in seconds and minutes • Real time – latency in <1s • Edge (on-device) – specially tuned models
  • 12. inteligencija.com Getting ML to production: MLOps best practices
  • 13. inteligencija.com Data best practices • Version control data pipeline code • Document feature expectations and automate data quality checks • Design for reusable and modular data pipelines • Test feature creation and data processing code • Use a Feature Store to ensure that features are consistent across different models and pipelines • Adopt CI/CD for data pipelines • Adopt Infrastructure as Code • Beware sensitive data and compliance • Training/serving skew – Check that training and serving features are computed in the same way (a.k.a online/offline skew)
  • 14. inteligencija.com Model best practices • Version control model training code and track experiments • Model testing: • Check for feature usefulness and cost • Tune all hyperparameters • Compare models to simpler alternatives – sanity check • Test performance on important subsets of data (e.g. regions) • Understand the real-world impact of the model outputs • Use canary deployments and A/B testing in production • Have a rollback strategy • Monitor for model degradation in production • Understand how fast the model goes stale • Set up automatic retraining pipelines (continuous learning)
  • 17. inteligencija.com Delta Lake • ACID transactions – ensures data consistency and reliability • Schema enforcement and evolution – helps with data quality • Time travel (Data versioning) – facilitates experimentation • Deletes and upserts (MERGE INTO) – iterative and incremental feature preparation • Data skipping and other optimizations – improves performance
  • 18. inteligencija.com Delta Live Tables • Framework for building data processing pipelines • You define transformations and DLT manages: • Orchestration • Cluster management • Monitoring • Data quality (Expectations) • Error handling • Can perform CDC with APPLY CHANGES INTO .. FROM ..
  • 20. inteligencija.com Unity Catalog • Centralized data discovery and access – quick search for and reuse of existing datasets • Centralized data governance and security – Fine-grained access control management from a central location • Data lineage – tables, columns, notebooks, workflows and dashboards provide automatically collected lineage information • Collaboration – Cross-workspace sharing enables teams to share datasets across projects without data duplication, promoting consistent use
  • 23. inteligencija.com Feature Store • Centralized Feature Management – discoverable and reusable • Any table in Unity Catalog can serve as a feature table (since DBR 13.2) • Lineage – upstream and downstream • Consistency Across Models – Features used for training models are also served in production • Simplified Serving – models include feature metadata • Should be used consistently (log_model) – so that you can keep track of feature usage • You can publish features to online stores (Amazon DynamoDB, Aurora or RDS MySQL) • for models served with Databricks Model Serving
  • 24. inteligencija.com MLflow Integrated within the Databricks platform (notebooks and workflows): • MLflow Tracking – Log and query experiments and runs in terms of code, data, config, and results • MLflow Projects – Package data science code in a reusable, reproducible form to share with other data scientists or transfer to production. • MLflow Models – Manage and deploy machine learning models from a variety of ML libraries to a variety of model serving and inference platforms. • MLflow Model Registry – A centralized model store, set of APIs, and UI, to collaboratively manage the lifecycle of a MLflow Model.
  • 26. inteligencija.com MLflow Model Registry central model registry vs. per-workspace model registry
  • 27. inteligencija.com AutoML • Generates ML classification, regression or forecasting code automatically, based on input table and the target field • Features from the Feature Store can be joined • Jupyter notebooks with code for splitting data, setting up libraries etc. • Provides a good starting point for experiments and/or models ready to be registered
  • 28. inteligencija.com CI/CD integration • Databricks Repos UI is used for checking out Git branches, merging and pushing changes • It provides a REST API that can be invoked by Git automation • In production you can: • directly reference notebooks in remote Git repos by tags or branches • set up read-only folders with checked-out repos and update them automatically using Git automation • MLflow Model Registry provides an API so that Git automation can automatically transition models between environments
  • 30. inteligencija.com Execution environments Different environments, such as dev, staging and prod can be separate • Multiple cloud accounts • Multiple Databricks workspaces – within a single cloud account • Databricks workspace access controls
  • 31. inteligencija.com Promoting code and models Use Git branches to separate code versions: • dev branch for development • specific feature branches for feature development • release branches for different versions Lifecycle of models might be independent of code
  • 32. inteligencija.com Promoting code and models Image source: Databricks Promoting models and code across environments:
  • 33. inteligencija.com The recommended approach for model promotion The workflow recommended by Databricks: • Dev environment: • Develop training and other code • Promote code • Staging environment: • Test training code on subset of data • Test other code • Promote code • Prod environment: • Train model on production data • Test model • Deploy model • Deploy code
  • 34. inteligencija.com Links & Resources • Big Book of MLOps – Databricks • The Comprehensive Guide to Feature Stores – Databricks • ML in Production – Databricks course • https://github.com/databricks/mlops-stacks • https://ml-ops.org/ • https://cloud.google.com/architecture/mlops-continuous- delivery-and-automation-pipelines-in-machine-learning
  • 36. inteligencija.com We are Data & Analytics consulting company committed to deliver great solutions and products that enables our clients to unlock hidden opportunities within data, become data-driven and make better business decisions Our goal is to enable data-driven business decisions Offices in UK, Sweden, Austria, Slovenia and Croatia 180+ employees 20 years in Data & Analytics 250+ projects 100+ clients on 5 continents

Editor's Notes

  1. Data extraction & preparation – data collection, cleaning, wrangling, aggregating, transforming, feature engineering, etc. Exploratory Data Analysis – gaining an understanding of the data, its statistical properties and its mapping to business use case Model Training – train models on training data Model Validation – validate models on validation data Deployment – deploy models to production Monitoring – keeping track of how well models behave in production
  2. It should be clear by now that managing ML development and deployment is more complicated than traditional SW development and hence we need a distinct methodology which is MLOps.
  3. The whole point of DevOps is to enable fast, flexible and reliable delivery of applications. Besides development, it also comprises configuration management and monitoring.
  4. Training/serving skew – for example, if you have optimized code running in production