Introduction to MLOps
Agus Kurniawan
What is MLOps?
Definition:
• MLOps (Machine Learning Operations) is a set of practices that combines Machine Learning, DevOps, and Data Engineering to streamline the lifecycle of ML models, from development through deployment and monitoring.
Key Idea:
• Automate, govern, and scale ML in production.
Why MLOps Matters
ML Lifecycle vs MLOps Lifecycle
Traditional ML Lifecycle:
• Data → Training → Model → Deploy (static)
MLOps Lifecycle:
• Data → Training → Evaluation → Deployment → Monitoring → Retraining → CI/CD → Governance → Repeat (continuous cycle)
Components of MLOps
• Data engineering
• Feature engineering & store
• Experiment tracking (see the MLflow sketch after this list)
• Model versioning
• CI/CD for ML (CI/CD/CT)
• Model registry
• Deployment pipelines
• Monitoring (drift, data & model)
• Governance & documentation
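As a concrete illustration of experiment tracking, model versioning, and a model registry, here is a minimal sketch using MLflow. It assumes MLflow and scikit-learn are installed and a tracking backend is available (a local ./mlruns directory is enough); the experiment name and registered model name are illustrative, not part of the original slides.

```python
# Minimal sketch: experiment tracking + model registry with MLflow.
# Assumes mlflow and scikit-learn are installed; names are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("intro-mlops-demo")  # illustrative experiment name

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("max_iter", 200)   # experiment tracking: parameters
    mlflow.log_metric("accuracy", acc)  # experiment tracking: metrics

    # Model versioning + registry: each run creates a new registered version.
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="iris-classifier",  # illustrative registry name
    )
```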
Defining MLOps
• MLOps is the operational backbone for
machine learning systems that
ensures:
• Repeatability
• Reproducibility
• Scalability
• Governance
• Automation across the ML lifecycle
• It brings DevOps concepts into ML, but
with unique ML challenges.
Unique Challenges in ML (vs Software Dev)
• Changing data → needs retraining
• Model drift
• Data quality issues
• Feature inconsistency between training and production (see the sketch after this list)
• High compute cost
• Complex dependencies (GPU, frameworks)
• Long-running training
• Hard to reproduce experiments
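One common way to tackle feature inconsistency between training and production is to keep a single shared feature-transformation function used by both paths. A minimal sketch, with illustrative column names ("amount", "country"):

```python
# Minimal sketch: one shared feature-engineering function used by both the
# training pipeline and the serving path, to avoid train/serve skew.
# Column names are illustrative.
import numpy as np
import pandas as pd

def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Single source of truth for feature logic, shared by train and serve."""
    feats = pd.DataFrame(index=raw.index)
    feats["log_amount"] = np.log1p(raw["amount"].clip(lower=0))
    feats["is_domestic"] = (raw["country"] == "ID").astype(int)
    return feats

# Training path: batch transform of historical data.
train_raw = pd.DataFrame({"amount": [10.0, 250.0], "country": ["ID", "SG"]})
X_train = build_features(train_raw)

# Serving path: the *same* function applied to a single incoming record.
request_raw = pd.DataFrame({"amount": [99.0], "country": ["ID"]})
X_online = build_features(request_raw)
print(X_train)
print(X_online)
```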
Operational Challenges
Governance Challenges
• ML requires governance for:
• Bias detection
• Explainability (XAI)
• Auditability
• Regulatory compliance
(GDPR, HIPAA, PDPA)
• Security of data & models
What Risks Exist in ML Projects?
• Data privacy & leakage
• Bias & fairness issues
• Incorrect predictions → business damage
• Drift & model degradation
• Operational downtime
• High compute cost waste
• Unreliable experiments (no reproducibility)
MLOps Mitigation Strategy
• MLOps reduces risk using:
• Automated Data Validation
• Model Evaluation Gates (see the gate sketch after this list)
• Approval Workflows
• CI/CD/CT Pipelines
• Model Registry with Version
Control
• Observability (logging,
metrics, traces)
• Automated retraining
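As one way to implement an evaluation gate, the pipeline can refuse to promote a candidate model unless it clears an absolute quality floor and beats the production baseline. A minimal sketch; the metric (AUC), thresholds, and values are illustrative:

```python
# Minimal sketch of a model evaluation gate: only promote a candidate model
# if it meets an absolute floor and beats production by a margin.
# Thresholds and metric values are illustrative.

def evaluation_gate(candidate_auc: float,
                    production_auc: float,
                    min_auc: float = 0.80,
                    min_improvement: float = 0.005) -> bool:
    """Return True only if the candidate is safe to promote."""
    meets_floor = candidate_auc >= min_auc
    beats_prod = candidate_auc >= production_auc + min_improvement
    return meets_floor and beats_prod

if evaluation_gate(candidate_auc=0.86, production_auc=0.84):
    print("Gate passed: register and deploy candidate to staging.")
else:
    print("Gate failed: keep production model, flag run for review.")
```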
Model Monitoring to Reduce Risk
Monitoring includes:
• Prediction drift
• Data drift (see the drift-check sketch after this list)
• Model performance drop
• Latency & resource usage
Alerts → rollback → retrain → redeploy.
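A basic data drift check compares a window of live feature values against the training reference, for example with a two-sample Kolmogorov-Smirnov test (tools such as Evidently AI bundle richer versions of these checks). A minimal sketch with synthetic data and an illustrative alert threshold:

```python
# Minimal sketch of a data drift check: compare a live feature window against
# the training reference with a two-sample Kolmogorov-Smirnov test.
# The synthetic data and the 0.05 threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time feature
live = rng.normal(loc=0.4, scale=1.0, size=1_000)       # shifted production data

stat, p_value = ks_2samp(reference, live)
if p_value < 0.05:
    print(f"Drift detected (KS={stat:.3f}, p={p_value:.4f}) -> alert / retrain")
else:
    print("No significant drift detected")
```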
Governance & Compliance
• MLOps enforces:
• Version history
• Audit trail
• Access control
• Explainability reports
• Reproducibility of experiments
• Risk documentation
Scaling ML Beyond POCs
• Most ML starts as small POCs but fails at scale because:
• Manual processes
• Cannot handle large datasets
• No automated pipelines
• Limited compute resources
• Lack of standardization
• MLOps solves this.
What Scaling Means
• Scaling ML includes:
• Handling millions of
predictions
• Running parallel experiments (see the sketch after this list)
• Supporting multiple
teams/projects
• Orchestrating GPUs/TPUs
• Managing distributed training
• Deploying globally across
environments
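Running parallel experiments can start as simply as fanning hyperparameter configurations out across local processes before moving to an orchestrator or managed platform. A minimal sketch using the standard library and scikit-learn; the parameter grid and scoring are illustrative:

```python
# Minimal sketch: running several training configurations in parallel with the
# standard library before scaling out to an orchestrator or managed platform.
from concurrent.futures import ProcessPoolExecutor

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def run_experiment(n_estimators: int) -> tuple[int, float]:
    """One experiment: train/evaluate a single configuration."""
    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()
    return n_estimators, score

if __name__ == "__main__":
    grid = [50, 100, 200, 400]  # illustrative parameter grid
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_experiment, grid))
    best = max(results, key=lambda r: r[1])
    print("best n_estimators:", best[0], "cv accuracy:", round(best[1], 3))
```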
Scaling With MLOps Tools
• Modern tools for scaling ML:
• Platforms: Azure ML, Vertex AI,
SageMaker, Databricks
• Orchestration: Airflow, Prefect,
Kubeflow
• Model Serving: MLflow, Seldon, BentoML, FastAPI (see the serving sketch after this list)
• Feature Store: Feast, Tecton
• CI/CD: GitHub Actions, GitLab
CI, Jenkins
• Monitoring: Evidently AI,
WhyLabs, Prometheus, Grafana
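For model serving, one lightweight option from the list above is wrapping a trained model in a FastAPI endpoint (Seldon or BentoML add packaging and scaling on top). A minimal sketch; the model artifact path and request schema are illustrative:

```python
# Minimal sketch: serving a pre-trained model over HTTP with FastAPI.
# The artifact path "model.joblib" and the request schema are illustrative.
# Run with: uvicorn serve:app --reload
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="iris-classifier")
model = joblib.load("model.joblib")  # assumed output of the training pipeline

class PredictRequest(BaseModel):
    features: list[float]  # one flat feature vector per request

@app.post("/predict")
def predict(req: PredictRequest):
    prediction = model.predict([req.features])[0]
    return {"prediction": int(prediction)}
```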
Scaling Architecture (High-Level)
• Typical MLOps Architecture
• Data ingestion (Batch/Streaming)
• Feature store
• Model training pipelines
• Model registry
• Containerized deployment
(Kubernetes/Serverless)
• Monitoring + feedback loop
• Automated retraining
Example MLOps Pipeline Flow
1. Data ingestion
2. Data validation
3. Feature engineering
4. Train model
5. Log experiment
6. Evaluate model
7. Register model
8. Deploy to staging → prod
9. Monitor
10. Trigger retraining
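The ten steps above can be wired together as plain functions, with each one becoming a task in an orchestrator such as Airflow, Prefect, or Kubeflow. A compressed sketch in which every body is a placeholder, not a real implementation:

```python
# Compressed sketch of the pipeline flow above as chained steps; in a real
# setup each function would be an orchestrator task. Bodies are placeholders.

def ingest_data():                      # 1. batch/streaming ingestion
    return {"rows": 1000}               # placeholder dataset

def validate_data(data):                # 2. schema + quality checks, fail fast
    assert data["rows"] > 0
    return data

def engineer_features(data):            # 3. shared feature logic (see earlier sketch)
    return data

def train_model(features):              # 4. training run
    return {"model": "candidate", "auc": 0.86}

def log_experiment(run):                # 5. params/metrics to tracking server
    print("logged:", run)

def evaluate_model(run):                # 6. evaluation gate (see earlier sketch)
    return run["auc"] >= 0.80

def register_model(run):                # 7. new version in the model registry
    print("registered:", run["model"])

def deploy(run, stage):                 # 8. staging -> prod rollout
    print(f"deployed {run['model']} to {stage}")

def monitor():                          # 9. drift, latency, performance
    return {"drift": False}

# 10. Retraining would be triggered by the monitoring signal.
data = validate_data(ingest_data())
features = engineer_features(data)
run = train_model(features)
log_experiment(run)
if evaluate_model(run):
    register_model(run)
    deploy(run, "staging")
    deploy(run, "prod")
print("monitoring:", monitor())
```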
Summary
• MLOps makes ML repeatable,
scalable, reliable
• It addresses technical risks and
governance needs
• Scaling ML requires automation,
tooling, standardization
• Successful ML requires continuous
monitoring and retraining
