MLOps
Machine Learning Operations
Objective
● What is MLOps?
● Why MLOps?
● Machine Learning project lifecycle + Case Study
● ML Pipelines
● Modeling w.r.t Production
● Deployment
● Monitoring
● Career Opportunities
What is MLOps & Why MLOps
MLOps is the process of taking Machine Learning to production, then maintaining and monitoring it.
Case Study: Fraud Detection System
Assume an organization knows how to build and deploy a model, but
● Without MLOps: no reproducibility, no way to detect new fraud patterns as they emerge, and no scalability
Machine Learning Project Lifecycle
● Scoping - Define Projects
● Data - EDA, Label and Organize data
● Modeling - Select and train a model, Performance and Error Analysis
● Deployment - Deploy in production, monitor and maintain the system
Scoping (Optional)
● Brainstorm Business Problem
● Brainstorm AI solutions
● Assess the feasibility
● Determine Metrics (ML, Software, Business)
● Determine Budget
Extras
● Free from bias
● Positive Societal Value
Data
● Data-Centric AI Development
● Brainstorm data sources, weighing each on amount, cost, and time to acquire
● Labeling Data: in-house, outsourced, or crowd-sourced
● Ensure consistent labeling (agree on conventions, with examples)
Characteristics of Good Data
● Good Coverage of input x
● Consistency
● Distribution covers data and concept drift
● Sized appropriately
Modeling
● Literature Search
● Sanity check: train the model on a small subset of the data before training on the full dataset
● Error Analysis: tag your data with different attributes; for speech recognition, test separately on clean speech, car noise, and people noise (see the sketch after this list)
● Prioritize what to work on by, for example: room for improvement, how frequently that category appears, how easy it is to improve accuracy on it, and how important that category is
● Performance Auditing - check for accuracy, fairness/bias
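To make the error-analysis step concrete, here is a minimal Python sketch (the tags and data are illustrative, not from the deck) that computes accuracy per attribute tag so the weakest slices stand out:

```python
# Per-slice error analysis: accuracy broken down by attribute tag.
from collections import defaultdict

def accuracy_by_tag(examples):
    """examples: iterable of (tag, y_true, y_pred) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for tag, y_true, y_pred in examples:
        total[tag] += 1
        correct[tag] += int(y_true == y_pred)
    return {tag: correct[tag] / total[tag] for tag in total}

print(accuracy_by_tag([
    ("clean", 1, 1), ("clean", 0, 0),
    ("car_noise", 1, 0), ("car_noise", 1, 1),
    ("people_noise", 0, 1),
]))  # {'clean': 1.0, 'car_noise': 0.5, 'people_noise': 0.0}
```

Slices with low accuracy and high frequency are natural candidates when prioritizing what to work on.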
Deployment
Key challenges: statistical issues, software engineering issues
Software Engineering Issues
● Real-time or Batch
● Cloud or Edge/Browser
● Compute Resources (CPU/GPU/Memory)
● Latency, Throughput
● Security and Privacy
Deployment Strategies
● Shadow Deployment
● Canary Deployment (see the routing sketch below)
● Blue-Green Deployment
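As a rough sketch of canary deployment, the router below hashes a request id so a given user consistently lands on the same variant; the 5% fraction and model names are illustrative assumptions:

```python
# Canary routing sketch: hash the request id so a given user consistently
# hits the same variant. The 5% fraction and names are illustrative.
import hashlib

CANARY_FRACTION = 0.05  # start small; raise only if canary metrics hold up

def route(request_id: str) -> str:
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary_model" if bucket < CANARY_FRACTION * 100 else "stable_model"

print(route("user-42"))  # 'stable_model' or 'canary_model'
```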
Statistical Issues
● Concept Drift (x->y)
When the definition of y given x changes. Example: house prices rise because of inflation or market changes
● Data Drift
When the distribution of x changes. Example: the sizes of houses change over time
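One common way to check for data drift on a single feature is a two-sample Kolmogorov-Smirnov test. The sketch below assumes scipy is available and uses synthetic house-size data echoing the example above:

```python
# Two-sample KS test on one feature (scipy assumed); synthetic house sizes.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_sizes = rng.normal(1500, 300, 5000)    # sizes seen at training time
serving_sizes = rng.normal(1800, 300, 5000)  # sizes seen in production

stat, p_value = ks_2samp(train_sizes, serving_sizes)
if p_value < 0.01:
    print(f"data drift suspected (KS statistic={stat:.3f}, p={p_value:.1e})")
```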
Monitoring
● Monitoring dashboard: software metrics, input metrics, output metrics
Tools: Neptune.ai, Arize AI
● Alerts when the model starts decaying (see the sketch below)
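A minimal sketch of such an alert, assuming ground-truth labels eventually arrive so live accuracy can be computed; the baseline, margin, and window size are illustrative choices:

```python
# Decay alert: compare rolling live accuracy to the accuracy measured at
# deployment time. Baseline, margin, and window size are illustrative.
from collections import deque

BASELINE_ACCURACY = 0.95
ALERT_MARGIN = 0.05
window = deque(maxlen=500)  # last 500 labeled predictions

def alert(message: str) -> None:
    print("ALERT:", message)  # in practice: pager / Slack webhook

def record(y_true, y_pred) -> None:
    window.append(int(y_true == y_pred))
    if len(window) == window.maxlen:
        rolling = sum(window) / len(window)
        if rolling < BASELINE_ACCURACY - ALERT_MARGIN:
            alert(f"model decaying: rolling accuracy {rolling:.3f}")
```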
Case Study
ML Pipelines
Data Pipeline
● Collecting Data
● Labeling Data
● Validating Data (see the check sketch after this list)
● Pre-Processing and Feature Engineering
● Data Augmentation
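As an illustration of the validation step referenced above, a minimal sketch of schema and range checks for a fraud-detection record; the field names and bounds are assumptions, not a real schema:

```python
# Validation step: schema and range checks before feature engineering.
EXPECTED_FIELDS = {"amount": float, "merchant_id": str, "timestamp": int}

def validate(record: dict) -> list[str]:
    errors = []
    for field, typ in EXPECTED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], typ):
            errors.append(f"bad type for {field}")
    if isinstance(record.get("amount"), float) and record["amount"] < 0:
        errors.append("amount must be non-negative")
    return errors

print(validate({"amount": -3.0, "merchant_id": "m1"}))
# ['missing field: timestamp', 'amount must be non-negative']
```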
Machine Learning Modeling Pipeline
Strategies to reduce cost
● Learning Curve Extrapolation
● Weight Inheritance
● Curse of Dimensionality
● Quantization
● Pruning
● Distributed Training
● Knowledge Distillation (Important + Interesting; see the loss sketch below)
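Since knowledge distillation is flagged as the highlight, here is a minimal sketch of the standard distillation loss (PyTorch assumed): the student matches the teacher's temperature-softened outputs in addition to the hard labels. T and alpha are common but arbitrary defaults:

```python
# Standard knowledge-distillation loss (PyTorch assumed): KL between the
# teacher's and student's temperature-softened distributions, blended with
# the usual cross-entropy on hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)  # batch of 8, 10 classes
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```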
Deployment
● Model Serving (see the endpoint sketch after this list)
● Metrics: latency, cost, throughput
● Scaling Infrastructure - vertical and horizontal
● Why prefer horizontal scaling over vertical
● Containers (optional)
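A minimal model-serving sketch, assuming FastAPI (any web framework would do); the model here is a stub fraud scorer. Horizontal scaling then simply means running more replicas of this service behind a load balancer:

```python
# Minimal serving sketch (FastAPI assumed; the model is a stub).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Transaction(BaseModel):
    amount: float
    merchant_id: str

def model_predict(tx: Transaction) -> float:
    # Stub fraud score standing in for a real model.
    return 0.9 if tx.amount > 10_000 else 0.1

@app.post("/predict")
def predict(tx: Transaction):
    return {"fraud_score": model_predict(tx)}

# Assumed file name 'serve.py'; run with:  uvicorn serve:app --workers 4
# Horizontal scaling = more replicas of this process behind a load balancer.
```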
Monitoring
● ML Monitoring and System Monitoring
● ML -> predictive performance, changes in serving data (see the PSI sketch after this list), metrics used during training
● System -> system performance, system reliability
● Observability
● Model Decay and Detection
● What if drift is detected?
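One widely used way to quantify "changes in serving data" is the Population Stability Index (PSI). The sketch below uses synthetic data; the 0.1 / 0.25 thresholds in the comment are a common rule of thumb, not a hard standard:

```python
# Population Stability Index (PSI) over one feature. Bin edges come from
# the training distribution; serving fractions are compared against them.
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)
serving = rng.normal(0.8, 1.0, 10_000)  # shifted serving distribution

# Common rule of thumb: <0.1 stable, 0.1-0.25 moderate, >0.25 significant.
print(f"PSI = {psi(train, serving):.3f}")  # well above 0.25 here
```

If the PSI crosses the threshold, typical responses are to investigate the upstream data and, if the shift is real, retrain on fresher data.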
Career Opportunities
Skills Needed:
● Machine Learning
● Deep Learning
● Systems
● Databases
Job Roles:
● MLOps Engineer
● AI Platform Developer
