MLOps is a set of engineering practices for configuring machine learning-enabled systems so that the cloud infrastructure is finally under control. Have you ever wondered what idempotence and a declarative approach to infrastructure are, and why environment drift is a thing of the past? 🤯
2. Let me introduce myself
• ML, AI
• Cloud and Big Data
• Ops: DLOps | MLOps | DevOps
• Embedded : IoT and Robotics
• Graphics, Games and Computer Vision
• Sony
• Amazon
• Deloitte
• Nokia
• Etc.
Principal Software Engineer @ Stack Builders
https://www.stackbuilders.com/
CEO @ Chernov Consulting
https://chernov.io
18+ years in business
https://www.linkedin.com/in/antonchernov/
3. What’s machine learning?
• Machine Learning (ML) is a branch of Artificial
Intelligence (AI) and computer science which
focuses on the use of data and algorithms to
imitate the way that humans learn.
• It can drastically improve the solution to a business problem.
• Bad news: It may be the only job left after
everything has been automated.
• Good news: It can’t do anything that it
hasn’t been taught by humans, at the moment.
And why is it important?
4. Kaizen is a Japanese term meaning "change for the
better" or "continuous improvement.”
• How do I make a change?
• How do I know it isn’t breaking things?
• How do I know it isn’t making things worse?
• Does somebody know I’ve made a change at all?
• Did somebody do this before?
• What?! I need to do this manually?
The incredible world of “operations”
MLOPS, DEVOPS and OPS in general
source: wikipedia
5. • Chaos (Ancient Greek: χάος, romanized: kháos) is
the mythological void state preceding
the creation of the universe (the cosmos)
in Greek creation myths.
Experimentation
Incredible world of “operations”
source: wikipedia
6. • Unit testing
• Integration testing
• Canary deployments
• Incremental rollouts
• A/B testing
• Performance testing
• Model testing
• Observability
• Bias
• Concept drift
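The last item, concept drift, usually shows up as a model's error rate creeping up once the real world stops matching the training data. A minimal sketch of such a check, assuming labelled feedback arrives for recent predictions; the function names and the 0.05 tolerance are hypothetical, not taken from any monitoring tool:

```python
from statistics import mean

def error_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the observed labels."""
    return mean([1.0 if t != p else 0.0 for t, p in zip(y_true, y_pred)])

def drift_suspected(baseline_rate, y_true, y_pred, tolerance=0.05):
    """Flag drift when the recent error rate exceeds the baseline
    error rate by more than `tolerance` (an assumed threshold)."""
    return error_rate(y_true, y_pred) - baseline_rate > tolerance

# Made-up recent window: 3 of 8 predictions are wrong.
recent_true = [1, 0, 1, 1, 0, 1, 0, 0]
recent_pred = [1, 0, 0, 1, 1, 1, 1, 0]

# Baseline error rate of 0.10 is assumed to come from offline evaluation.
print(drift_suspected(0.10, recent_true, recent_pred))
```

A production setup would likely use a proper statistical test and a sliding window, but the idea stays the same: compare now against a known-good baseline.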
Going to production
Incredible world of “operations”
source: wikipedia
[Diagram: deployment stages Alpha, Beta, Staging and Prod, with rollout across Region A … Region N; marker: “We are here”]
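An incremental rollout through the stages named on this slide can be sketched as a loop with a health gate. This assumes the stages are visited in the order Alpha, Beta, Staging, Prod; `roll_out` and `fake_health_check` are hypothetical names, and the health check stands in for the tests and observability signals listed above:

```python
STAGES = ["Alpha", "Beta", "Staging", "Prod"]  # assumed promotion order

def roll_out(version, stages, is_healthy):
    """Promote `version` stage by stage and stop at the first unhealthy one.

    Returns the list of stages the version was promoted through.
    """
    promoted = []
    for stage in stages:
        if not is_healthy(stage, version):
            break  # halt: later stages never see the bad version
        promoted.append(stage)
    return promoted

def fake_health_check(stage, version):
    # Hypothetical check: pretend version "v2" fails in Staging.
    return not (stage == "Staging" and version == "v2")

print(roll_out("v1", STAGES, fake_health_check))
print(roll_out("v2", STAGES, fake_health_check))
```

The same gate can be repeated per region inside Prod, so a bad change is caught in Region A before it reaches Region N.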
7. And what’s so special about managing it?
Why code is great
Source: https://git-scm.com
• Code is a document
• It’s formatted, linted, versioned, reviewed and
tested
• It is in your favourite (or not so) language
• You know exactly: “who did this”?!
• Declarative vs imperative
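The declarative idea, and the idempotence that comes with it, can be illustrated without any cloud at all. In this minimal sketch the "infrastructure" is just a dictionary; `plan`, `apply` and the resource names are hypothetical. You describe the desired state, the tool works out the steps, and applying the same description twice changes nothing the second time:

```python
def plan(actual, desired):
    """Return the actions needed to turn `actual` into `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions

def apply(actual, desired):
    """Execute the plan and return the new actual state."""
    new_state = dict(actual)
    for action, name in plan(actual, desired):
        if action == "delete":
            del new_state[name]
        else:  # create or update
            new_state[name] = desired[name]
    return new_state

desired = {"bucket": {"versioning": True}, "queue": {"fifo": False}}
drifted = {"bucket": {"versioning": False}, "old-vm": {}}

print(plan(drifted, desired))   # first run: fix the drift
state = apply(drifted, desired)
print(plan(state, desired))     # second run: nothing left to do
```

An imperative script ("create bucket, create queue") would fail or duplicate resources on the second run; the declarative version converges to the same state however often it runs.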
9. What is an application?
An attachment of theoretical thought to
real-world problems.
• Where does a thought live?
In a head.
• Where does an application live?
In the infrastructure.
• What can possibly go wrong?
The real world.
source: wikipedia
10. Why do we need a cloud provider?
• Outsource the “real world”
• API
• Scalability
• Availability
• Security
• Monitoring
source: wikipedia
11. Why is Kubernetes the API to work with?
What’s our problem?
• Agile mindset
• We need to orchestrate containers
What can it do?
• Service discovery and load balancing
• Storage orchestration
• Automated rollouts and rollbacks
• Automatic bin packing
• Self-healing
• Secret and configuration management
• It is a huge dependency, but:
• Applications have a solid “base” to build on
(one to rule them all)
• Enables Helm (package manager)
• Cloud agnostic (you could move in
theory)
• Open source and widely adopted
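To give a feel for "automatic bin packing", here is a toy first-fit-decreasing placement of pods onto equally sized nodes by CPU request. This is only an illustration of the idea: the real Kubernetes scheduler is considerably more involved, and the pod names, sizes and the function name are made up:

```python
def first_fit_decreasing(pod_cpu_requests, node_capacity):
    """Place pods (name -> CPU millicores) onto equally sized nodes.

    Pods are sorted largest first; each goes to the first node with
    enough free CPU, and a new node is opened when none has room.
    """
    nodes = []  # each node: {"free": int, "pods": [names]}
    by_size = sorted(pod_cpu_requests.items(), key=lambda kv: kv[1], reverse=True)
    for name, cpu in by_size:
        if cpu > node_capacity:
            raise ValueError(f"{name} does not fit on any node")
        for node in nodes:
            if node["free"] >= cpu:
                break  # found an existing node with room
        else:
            node = {"free": node_capacity, "pods": []}
            nodes.append(node)
        node["free"] -= cpu
        node["pods"].append(name)
    return [node["pods"] for node in nodes]

pods = {"api": 500, "trainer": 1500, "batch": 1000, "cache": 700, "cron": 300}
print(first_fit_decreasing(pods, node_capacity=2000))
# → [['trainer', 'api'], ['batch', 'cache', 'cron']]
```

Five pods totalling 4000 millicores end up on two 2000-millicore nodes, which is what you pay for.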
12. • MLOps: Task and Workflow Orchestration
Tools on Kubernetes
• Kubeflow | MLflow | Metaflow | Flyte |
ZenML | Airflow | Argo | Tekton | Prefect |
Luigi
• And there are a lot more.
On Kubernetes
Workflow orchestration tools
Photo by Marek Piwnicki on Unsplash
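What these orchestration tools have in common is that a workflow is a graph of tasks with dependencies. A minimal sketch using only Python's standard library (`graphlib`, Python 3.9+); the pipeline steps are hypothetical and none of the listed tools' APIs are used here:

```python
from graphlib import TopologicalSorter

# Hypothetical ML pipeline: each step maps to the steps it depends on.
pipeline = {
    "ingest": set(),
    "validate": {"ingest"},
    "train": {"validate"},
    "evaluate": {"train", "validate"},
    "deploy": {"evaluate"},
}

def run(pipeline, tasks):
    """Run every task after all of its dependencies; return the order used."""
    order = list(TopologicalSorter(pipeline).static_order())
    for step in order:
        tasks[step]()
    return order

log = []
# Each task just records that it ran; real tasks would do the work.
tasks = {step: (lambda s=step: log.append(s)) for step in pipeline}

print(run(pipeline, tasks))
```

Real orchestrators add what this sketch leaves out: retries, caching, scheduling, and running each step in its own container on the cluster.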
13. • Let’s face it: there is much more AWS than
Kubernetes
• Authentication: IAM, RBAC etc.
• Storage: EBS, EFS, S3 etc.
• Networking: VPC, subnets etc.
• Observability: CloudTrail etc.
Kubernetes on AWS
EKS
14. • AWS CloudFormation templates are formatted text files in JSON or YAML
• Divided into Stacks
• Parameters are simple types: strings, numbers, lists
With CloudFormation
EKS
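To make the template format concrete, the sketch below builds a small CloudFormation template as a Python dictionary and prints it as JSON. The parameter names, the logical ID `Cluster` and the default cluster name are assumptions for illustration; the template has not been deployed, and a working EKS setup needs more (an IAM role, networking, node groups):

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative EKS cluster (sketch, not production-ready)",
    # Parameters are simple typed inputs supplied when the stack is created.
    "Parameters": {
        "ClusterName": {"Type": "String", "Default": "mlops-demo"},
        "ClusterRoleArn": {"Type": "String"},
        "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
    },
    # Resources declare what should exist, not how to create it.
    "Resources": {
        "Cluster": {
            "Type": "AWS::EKS::Cluster",
            "Properties": {
                "Name": {"Ref": "ClusterName"},
                "RoleArn": {"Ref": "ClusterRoleArn"},
                "ResourcesVpcConfig": {"SubnetIds": {"Ref": "SubnetIds"}},
            },
        }
    },
}

rendered = json.dumps(template, indent=2)
print(rendered)
```

Note how everything is data: there are no loops or functions in the template itself, only references between parameters and resources.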
• HCL (HashiCorp Configuration Language)
• Divided into reusable modules with a registry
• Lots of open source
With Terraform
EKS
• General-purpose languages: TypeScript, Python, Go, C#
• Imperative + declarative
With Pulumi
EKS
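The "imperative + declarative" combination can be sketched in plain Python: ordinary loops and conditionals compute a list of resource declarations, and a tool would then reconcile the cloud against that list. This deliberately does not use the Pulumi SDK; `Resource`, `declare_clusters` and the node counts are hypothetical stand-ins:

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """A declared resource: what should exist, not how to create it."""
    kind: str
    name: str
    props: dict = field(default_factory=dict)

def declare_clusters(environments, regions):
    """Use imperative code (loops, conditionals) to build declarations."""
    resources = []
    for env in environments:
        for region in regions:
            nodes = 3 if env == "prod" else 1  # assumed sizing rule
            resources.append(
                Resource("eks-cluster", f"{env}-{region}",
                         {"region": region, "nodes": nodes})
            )
    return resources

declared = declare_clusters(["staging", "prod"], ["eu-west-1", "us-east-1"])
for resource in declared:
    print(resource.kind, resource.name, resource.props)
```

Four clusters come out of two short lists, something that takes copy and paste or templating tricks in a purely declarative format.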