MLOps and Data Quality:
Deploying Reliable ML Models in
Production
Presented by:
Stepan Pushkarev, CTO @ Provectus
Rinat Gareev, ML Solutions Architect @ Provectus
Webinar Objectives
1. Explore best practices of building and deploying reliable Machine Learning
models
2. Review existing open source tools and reference architectures for
implementation of Data Quality components as part of your MLOps
pipelines
3. Get qualified for the Provectus ML Infrastructure Acceleration Program, a
fully funded discovery workshop
Agenda
● Introduction and Why
● How: Common Practical Challenges and Solutions
○ Data Testing
○ Model Testing
● MLOps: Wiring Things Together
● Provectus ML Infrastructure Acceleration Program
Introductions
Stepan Pushkarev
Chief Technology
Officer, Provectus
Rinat Gareev
ML Solutions Architect,
Provectus
AI-First Consultancy & Solutions Provider
Clients ranging from
fast-growing startups to
large enterprises
450 employees and
growing
Established in 2010
HQ in Palo Alto
Offices across the US,
Canada, and Europe
We are obsessed with leveraging cloud, data, and AI to reimagine the way
businesses operate, compete, and deliver customer value
Our Clients
Innovative Tech Vendors
Seeking niche expertise to differentiate and win the market
Midsize to Large Enterprises
Seeking to accelerate innovation and achieve operational excellence
Why Quality Data Matters
Accuracy under different pipeline improvements (SIGMOD 2016 tutorial by Sanjay Krishnan, UC Berkeley, and Jiannan Wang, Simon Fraser U.):
● Scikit-learn defaults: 0.69
● TF-IDF, PoS, stop words: 0.695
● Python Hyperopt: 0.73
● After data cleaning: 0.91
https://sigmod2016.org/sigmod_tutorial1.shtml
GoCheck Kids Case Study
End-to-end deep learning image classification models to detect child gaze, strabismus, crescent, and dark iris/pupil population.
Metrics before → after Data QA:
● Precision: 32% → 40%
● Recall: 89% → 91%
● FPR: 19% → 17%
● PR AUC: 57% → 76%
Machine Learning Lifecycle
● Data Preparation: Data Ingestion → Data Cleaning → Data Merging → Data Labeling → Feature Engineering → Versioned Dataset
● ML Engineering: Model Training → Experimentation → Model Packaging → Model Candidate
● Delivery & Operations: Regression Testing → Model Selection → Production Deployment → Monitoring
All Stages of ML Lifecycle Require QA
The same lifecycle, with data tests, code tests, and model tests applied throughout Data Preparation, ML Engineering, and Delivery & Operations.
Error Cascades
* From "'Everyone wants to do the model work, not the data work': Data Cascades in High-Stakes AI", N. Sambasivan et al., SIGCHI, ACM (2021)
How: Practical Challenges and
Solutions
Common Challenge #1:
How to find & access the data I trust?
1. Data is scattered across multiple data sources and
technologies: RDBMS, DWH, Data Lakes, Blobs
2. Data ownership is not clear
3. Data requirements and SLAs are not clear
4. Metadata is not discoverable
5. As a result, all investments in Data and ML are killed by
data access and discoverability issues
Solution: Migrate to Data Mesh
Data Mesh sits at the convergence of
Distributed Domain-Driven Architecture,
Self-Serve Platform Design, and Product
Thinking with Data
● Brings data closer to Domain Context
● Introduces the concept of Data as a
Product and all appropriate data
contracts
● Sorts out data ownership issues
https://martinfowler.com/articles/data-monolith-to-mesh.html
Invest in a Global Data Catalog
A catalog answers questions like:
● Does this data exist? Where is it?
● What is the source of truth of the data?
● Who and/or which team is the owner?
● Who are the users of the data?
● Are there existing assets I can reuse?
● Can I trust this data?
* There are no established leaders
* Commercial vendors are not listed
Common Challenge #2:
How to get started with QA for Data and ML?
1. What exactly to test?
2. Who should test (Traditional QA, Data Engs, ML Engs,
Analysts)?
3. What tools to use?
4. As a result, ML engineers lose productivity dealing with
data quality issues.
Data: What to Test
Default data quality checks:
● Duplicates
● Missing values
● Syntax errors
● Format errors
● Semantic errors
● Integrity
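As a rough illustration, several of these default checks can be expressed with Great Expectations' pandas API (the file name, column names, and bounds below are hypothetical, and the exact API varies between versions):

import great_expectations as ge
import pandas as pd

# Wrap a hypothetical raw extract so expectations can be attached to it
df = ge.from_pandas(pd.read_csv("orders.csv"))

# Duplicates and missing values
df.expect_column_values_to_be_unique("order_id")
df.expect_column_values_to_not_be_null("amount")

# Format checks (e.g., two-letter country codes) and basic integrity (value ranges)
df.expect_column_values_to_match_regex("country", r"^[A-Z]{2}$")
df.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)

# Run the whole suite and gate the pipeline step on the overall result
results = df.validate()
assert results["success"], "data quality checks failed"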
Advanced unsupervised methods:
● Distribution tests
● KS, Chi-squared tests
● Outlier detection with AutoML
● Auto Constraints suggestion
● Data Profiling for Complex
Dependencies
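For the distribution tests, a minimal SciPy sketch: compare the latest batch of a feature against a reference sample with a two-sample Kolmogorov–Smirnov test (the synthetic data and the 0.01 significance level are illustrative):

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # e.g., training-time feature values
current = rng.normal(loc=0.3, scale=1.0, size=5_000)    # e.g., latest production batch

statistic, p_value = ks_2samp(reference, current)
if p_value < 0.01:
    print(f"distribution shift detected: KS={statistic:.3f}, p={p_value:.4f}")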
Unsupervised Constraints Generation
Use cases:
● existing data with poor documentation or schema
● rapidly evolving data
● rich structure
● starting from scratch
Workflow:
1. Compute data profiles/summaries
2. Generate checks on types, completeness, ranges, uniqueness, and distributions (extensible, e.g., with conventions on column naming)
3. Evaluate on a holdout subset
4. Review and add to test suites (see the sketch below)
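One concrete way to run this loop is TensorFlow Data Validation's profile-and-infer workflow; a minimal sketch (file paths are placeholders):

import tensorflow_data_validation as tfdv

# 1. Compute data profiles/summaries over a reference dataset
train_stats = tfdv.generate_statistics_from_csv("train.csv")

# 2. Auto-generate constraints: types, completeness, value domains/ranges
schema = tfdv.infer_schema(train_stats)

# 3. Evaluate the generated constraints against a holdout subset
holdout_stats = tfdv.generate_statistics_from_csv("holdout.csv")
anomalies = tfdv.validate_statistics(holdout_stats, schema)
tfdv.display_anomalies(anomalies)

# 4. After review, persist the schema so it becomes part of the test suite
tfdv.write_schema_text(schema, "schema.pbtxt")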
Data Testing: Available Tools
● Deequ
● Great Expectations
● TensorFlow Data Validation
● dbt
* Commercial vendors are not listed
Model Testing
Model Testing: Analyzing Input and
Output Datasets
Model Testing: Datasets Are Test
Suites with Test Cases
● Golden UAT datasets
● Security datasets
● Production traffic replay
● Regression datasets
● Datasets for bias
● Datasets for edge cases
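Treating such datasets as test suites means every model candidate is evaluated against each of them with explicit pass/fail thresholds. A pytest-style sketch for a golden regression dataset (dataset path, model interface, and thresholds are hypothetical):

import pandas as pd
from sklearn.metrics import precision_score, recall_score

def test_model_on_golden_dataset(model_candidate):
    # Golden UAT dataset: curated examples with verified labels
    golden = pd.read_parquet("datasets/golden_uat.parquet")
    predictions = model_candidate.predict(golden.drop(columns=["label"]))

    precision = precision_score(golden["label"], predictions)
    recall = recall_score(golden["label"], predictions)

    # Block promotion of the candidate if it regresses below agreed thresholds
    assert precision >= 0.80, f"precision regressed to {precision:.3f}"
    assert recall >= 0.85, f"recall regressed to {recall:.3f}"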
Model Testing: Bias
Bias is considered to be a disproportionate inclination or prejudice for or against an idea or thing.
10+ Bias Types
● Selection Bias — The selection of data in such
a way that the sample is not representative of
the population
● The Framing Effect — Annotation questions
that are constructed with a particular slant
● Systematic Bias — Consistent and repeatable
error.
● Outlier Data, Missing Values, Filtering Data
● Bias/Variance Trade-off
● Personal Perception Bias
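A small sketch of one such check: comparing positive-outcome rates across groups (the disparate impact ratio). The column names, sample data, and the 0.8 "four-fifths" threshold are illustrative:

import pandas as pd

def disparate_impact_ratio(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    # Ratio of the lowest to the highest positive-outcome rate across groups
    rates = df.groupby(group_col)[outcome_col].mean()
    return rates.min() / rates.max()

# Hypothetical scored dataset: a protected attribute and a binary model decision
scored = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "approved": [1, 1, 0, 1, 1, 1],
})

ratio = disparate_impact_ratio(scored, "group", "approved")
print(f"disparate impact ratio: {ratio:.2f}")  # values well below ~0.8 warrant review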
Model Testing: Available Tools
Adversarial Testing & Model Robustness:
1. CleverHans by Ian Goodfellow & Nicolas Papernot
2. Adversarial Robustness Toolbox (ART) by IBM
Bias and Fairness:
1. AWS SageMaker Clarify
2. AIF360 by IBM
3. Aequitas by University of Chicago
MLOps: Wiring Things
Together
The Core of MLOps Pipelines
● Model Code, ML Pipeline Code, Infrastructure as Code
● Versioned Dataset, Feature Store
● Automated Pipeline Execution with idempotent orchestration and pipeline metadata
● Model Artifacts, Prediction Service, ML Metrics
● Production Metrics & Alerts, Reports
● Feedback Loop for Production Data
The Core of MLOps Pipelines (continued)
The same pipeline, with Data Quality Checks added as a first-class component.
Expanding Validation Pipelines
Batch quality checkpoints on the versioned dataset and feature store:
● Dataset rules validation
● Statistical assertions
● Dataset bias checker
● Outlier detector
Model validation for the ML model and the deployed model:
● Model test for bias
● Model security test
● Regression test
● Business acceptance
● Traffic replay
(A rough orchestration sketch follows.)
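A rough, orchestrator-agnostic sketch of how these checkpoints gate a pipeline; the check functions are stubs standing in for the tools discussed earlier:

from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool

# Stub checks; in a real pipeline each would call the corresponding tool
# (dataset rules, statistical assertions, outlier detector, bias/security/regression suites)
def batch_quality_checkpoint(dataset) -> list[CheckResult]:
    return [CheckResult("dataset rules validation", True),
            CheckResult("statistical assertions", True),
            CheckResult("outlier detection", True)]

def model_validation_suite(model) -> list[CheckResult]:
    return [CheckResult("test for bias", True),
            CheckResult("security test", True),
            CheckResult("regression test on golden dataset", True)]

def run_pipeline(dataset, train_fn, deploy_fn):
    # Gate 1: stop before training if the versioned dataset fails its checks
    failed = [c.name for c in batch_quality_checkpoint(dataset) if not c.passed]
    if failed:
        raise RuntimeError(f"data quality gate failed: {failed}")

    model = train_fn(dataset)

    # Gate 2: only promote the model candidate if every validation passes
    failed = [c.name for c in model_validation_suite(model) if not c.passed]
    if failed:
        raise RuntimeError(f"model validation gate failed: {failed}")

    deploy_fn(model)  # followed by traffic replay / business acceptance in production

# Example wiring with trivial placeholders
run_pipeline(dataset=[1, 2, 3], train_fn=lambda d: "model-v2", deploy_fn=print)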
Final Recommendations
1. You cannot deploy ML models to production without a clear Data QA strategy in place.
2. As a leader, focus on organizing data teams around product features to make them fully responsible for Data as a Product.
3. Design Data QA components as an essential part of your MLOps foundation.
125 University Avenue
Suite 295, Palo Alto
California, 94301
provectus.com
Questions, details?
We would be happy to answer!
