Revamping
ML Pipelines
with MLOps
Presented by
Sameer Mahajan
Principal Architect
Sameer Mahajan has 25 years of experience in the
software industry. He has worked for companies
like Microsoft and Symantec across areas like
machine learning, storage, cloud, big data,
networking and analytics in the United States &
India.
Sameer holds 9 US patents and is an alumnus of IIT
Bombay and Georgia Tech. He not only conducts
hands-on workshops and seminars but also
participates in panel discussions on emerging
technologies like machine learning and big data.
Sameer is one of the mentors for the Machine
Learning Foundations course at Coursera.
Agenda
• Background
• ML Lifecycle
• Challenges with ML Productization
• Examples of end-to-end ML platforms
• MLOps Best Practices
• MLOps Methodologies
• Build, Retrain and Release Pipelines
• MLflow and demo
• Airflow demo
• Model Serving Pipeline
• TensorFlow Model Serving
• TensorFlow.js demo
• TFX-based MLOps system on Google Cloud
• Azure MLOps
• Conclusion
• Q & A
Background
• ML spend will reach $57.6 billion by 2021
• More and more ML systems are going into production
• A 2019 Gartner survey suggests that
i. 59% of respondents have AI deployed today
ii. The average number of deployed AI projects is expected to increase to 35 by 2022
• Streamline the ML lifecycle
• Machine Learning Operations (MLOps)
• Started gaining traction in 2018
ML Lifecycle
Process model – Option B
• World: the reality we are trying to model, and the source of data
• Data Engineering
1. Data Capturing: ingest data from sensors, devices, databases
2. Data Preparation: cleanse and transform data; signal processing
3. Data Visualization: visual analytics to capture trends indicative of underlying model processes
• Machine Learning: train models that reflect the real-world phenomena
• Inference: use the models in real-world applications and processes for predictions, insights, etc.
Challenges
• Dealing with data, models and code
• Deployment and automation
• Collaboration: data engineers, data scientists, ML engineers, business analysts, operations
• Continuous Integration (CI), Continuous Deployment (CD), Continuous Training (CT)
• Reproducibility of results
• Transformations
• Hyperparameters
• Initializers
• Hardware
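A minimal sketch of how two of the reproducibility items above (hyperparameters and initializers) might be pinned down. The helper names `config_fingerprint` and `init_weights` and the config keys are assumptions made for illustration; they do not come from the talk or from any particular library.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Return a stable hash of a run configuration (hypothetical helper)."""
    # sort_keys makes the JSON text independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def init_weights(n: int, seed: int) -> list:
    """Draw n initial weights from an explicitly seeded generator."""
    rng = random.Random(seed)  # local generator, no hidden global state
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


# Illustrative configuration; the keys and values are placeholders.
config = {"learning_rate": 0.01, "epochs": 5, "seed": 42, "scaler": "standard"}

# The same seed gives the same initial weights on every call.
weights_a = init_weights(3, config["seed"])
weights_b = init_weights(3, config["seed"])
```

Storing the fingerprint next to each trained model makes it possible to tell whether two results came from the same configuration. Hardware differences are not covered by this sketch.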
More Challenges
• Complex pipelines
1. Ensemble
2. Retraining
3. Transfer learning
4. Multiple prediction pipelines in parallel (canary)
• Self-updating ML pipelines
• Governance: tracing a failed result back to data or code
• Scalability
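The canary item above (multiple prediction pipelines in parallel) can be sketched as a router that sends a small share of requests to a candidate model. This is only an illustration under assumptions: `make_canary_router`, the two stand-in models and the 10% split are all hypothetical.

```python
import random


def make_canary_router(stable, canary, canary_fraction, seed=0):
    """Return a predict function that sends a fraction of calls to the canary."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    rng = random.Random(seed)

    def predict(features):
        # Report which pipeline answered so that a failed result can be
        # traced back to a specific model.
        if rng.random() < canary_fraction:
            return "canary", canary(features)
        return "stable", stable(features)

    return predict


# Two stand-in models (hypothetical): each maps a feature list to a score.
def stable_model(features):
    return sum(features)


def canary_model(features):
    return sum(features) * 1.1


route = make_canary_router(stable_model, canary_model, canary_fraction=0.1)
served = [route([1.0, 2.0])[0] for _ in range(1000)]
canary_share = served.count("canary") / len(served)
```

Returning the pipeline label with every prediction also supports the governance item: each result carries the name of the model that produced it.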
Examples of end-to-end ML platforms
1. Uber’s Michelangelo
2. Facebook’s FBLearner Flow
3. Google’s TFX
4. Airbnb’s Bighead
5. Databricks’ MLflow, which is now open source
6. Amazon SageMaker
7. Azure Machine Learning
8. DataRobot
9. Polyaxon and Kubeflow
Best Practices
• Data pipeline: discoverable and accessible data (data lake, data mesh)
• Version control: GitHub, Data Version Control (DVC), MLflow Projects
• Data exploration: Jupyter, pandas, NumPy, seaborn
• ML: scikit-learn
• CI/CD: Jenkins
• Packaging: Docker
• Orchestrator: Airflow, Kubernetes
• Monitoring: ELK, Prometheus
Methodologies
1. Combination of DevOps (CI/CD), Software Engineering and ML
2. ML experiments are captured as runs
3. Each run captures all its steps, data, parameters, hyperparameters, code, initializers, model evaluations, artifacts such as trained models, and business results after deployment
4. Packaging a model: container
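The idea of capturing an experiment as a run can be sketched with the standard library alone. The `RunRecord` schema below is a hypothetical illustration and not the format used by MLflow or any other tool. The commit hash, metric value and artifact path are placeholders, not real results.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """What is needed to explain or repeat one experiment (hypothetical schema)."""
    run_id: str
    code_version: str      # e.g. a git commit hash
    data_sha256: str       # fingerprint of the training data
    hyperparameters: dict
    seed: int              # controls the initializers
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def fingerprint(data: bytes) -> str:
    """Hash the raw training data so the run can be tied to an exact dataset."""
    return hashlib.sha256(data).hexdigest()


training_data = b"x,y\n1,2\n2,4\n3,6\n"
run = RunRecord(
    run_id="run-001",
    code_version="abc1234",  # placeholder, not a real commit
    data_sha256=fingerprint(training_data),
    hyperparameters={"learning_rate": 0.01, "epochs": 5},
    seed=42,
)
run.metrics["rmse"] = 0.12  # illustrative placeholder, not a measured result
run.artifacts.append("models/run-001/model.pkl")

# A record survives a round trip through JSON, so it can be stored and queried.
restored = RunRecord(**json.loads(run.to_json()))
```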
Closer look at some pipelines
1. Build pipeline
• Triggered on a schedule, or when new code is checked in or new data becomes available
• Building code and running unit tests
• Data tests: schema and distribution conformance
2. Retrain pipeline
• Triggered on a schedule or when new data becomes available
• Train, evaluate and register model
3. Release pipeline
• Triggered every time a new artifact is available
• Package, test, deploy to production, start monitoring
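The data tests in the build pipeline (schema and distribution conformance) might look like the following sketch. The schema, column names, the mean-based drift check and its threshold are assumptions chosen for illustration. Real pipelines often use richer statistical tests.

```python
import statistics

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA = {"age": int, "income": float}


def check_schema(rows, schema):
    """Return a list of schema violations (an empty list means conformant)."""
    errors = []
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            errors.append(f"row {i}: columns {sorted(row)} != {sorted(schema)}")
            continue
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                errors.append(f"row {i}: {column} is not {expected_type.__name__}")
    return errors


def check_mean_drift(values, reference_mean, reference_stdev, max_z=3.0):
    """Return True if the batch mean is within max_z standard errors of the reference."""
    standard_error = reference_stdev / len(values) ** 0.5
    z = abs(statistics.mean(values) - reference_mean) / standard_error
    return z <= max_z


good_batch = [{"age": 30, "income": 52000.0}, {"age": 41, "income": 61000.0}]
bad_batch = [{"age": "30", "income": 52000.0}, {"age": 41}]

schema_errors = check_schema(bad_batch, EXPECTED_SCHEMA)
```

A build pipeline could fail the build whenever `check_schema` returns a non-empty list or `check_mean_drift` returns `False`, before any training starts.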
MLflow
• MLflow Tracking: record and query experiments (code, data, config, and results)
• MLflow Projects: package data science code in a format to reproduce runs on any platform
• MLflow Models: deploy machine learning models in diverse serving environments
• MLflow Registry: store, annotate, discover, and manage models in a central repository
MLflow demo
Airflow demo
Model serving
Embedded model
1. Serialized pickle file
2. Language-agnostic exchange formats like PMML, PFA and ONNX
3. H2O exports a POJO in a JAR
Separate service
1. Cloud providers’ tools and SDKs wrapping models
2. Kubeflow
3. MLflow Models
Published as data
1. Typically used in streaming / real-time scenarios
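The embedded-model option with a serialized pickle file can be sketched as follows. The model is reduced to a plain dictionary of made-up linear weights so the example stays self-contained. The `predict` function and the parameter values are assumptions for illustration only.

```python
import pickle


def predict(model, features):
    """Score one example with a linear model stored as plain data."""
    return model["bias"] + sum(w * x for w, x in zip(model["weights"], features))


# Training side: these "trained" parameters are made up for illustration.
trained_model = {"weights": [0.5, -0.25], "bias": 1.0}
serialized = pickle.dumps(trained_model)

# Application side: load the bytes shipped with the application and predict.
# Only unpickle data from a trusted source; pickle can execute code on load.
embedded_model = pickle.loads(serialized)
score = predict(embedded_model, [2.0, 4.0])
```

The same bytes could be written to a file and packaged with the application. The language-agnostic formats listed above serve the same purpose when the consumer is not written in Python.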
TensorFlow.js model serving demo
1. Open Google Chrome
2. Open chrome://apps/
3. Start web server
4. RockPaperScissorsTensorflow.jsDemo (based on a Coursera assignment)
5. Open http://127.0.0.1:8887 in Chrome
6. Open developer tools
7. Demo retraining and predictions
TFX-based MLOps system on Google Cloud
MLOps using Azure Machine Learning
Conclusion
• MLOps is an evolving field
• It applies lessons from other fields such as DevOps and software engineering
• It takes a holistic view
• Tools and practices are still emerging
• It is key to making ML productization successful
Thank you!
For more information please reach out to me at
sameer.mahajan@gslab.com
https://in.linkedin.com/in/sameersmahajan

NASSCOM MLOps webinar
