Infrastructure Agnostic
Machine Learning
Workload Deployment
Abi Akogun
Data Science Consultant (MavenCode)
Charles Adetiloye
ML Platforms Engineer (MavenCode)
About MavenCode
MavenCode is an Artificial Intelligence Solutions company located in Dallas, Texas - We do
training, product development, and consulting services in the following areas:
● Provisioning Scalable Data Processing Pipelines on Cloud Infrastructure
● Development & Deployment of Machine Learning and Artificial Intelligence Platforms
● Streaming and Big Data Analytics Edge-IoT and Sensors
About The Presenters
Charles Adetiloye is an ML Platforms Engineer
at MavenCode. He has well over 15 years of
experience building large-scale, distributed
applications. He has extensive experience
working and consulting with several companies
implementing production grade ML and AI
platforms
twitter.com/cadetiloye
Abiodun Akogun is a Machine Learning and Data
Science Consultant at MavenCode. He has extensive
experience building and deploying large-scale Machine
Learning Applications in different industries that
include Healthcare, Finance, Telecommunications, and
Insurance. He has experience solving several business
problems using Data Analytics, Sentiment Analysis,
Topic Modelling, Named Entity Recognition (NER),
Opinion Mining, Data Mining, Time Series, Spatial
Statistics and Marketing Analytics
twitter.com/akogz
Agenda
▪ Overview of Machine Learning Model Deployment
Workflow
▪ Various Approaches to model training,
management, and serving in the Cloud
▪ Deploying Machine Learning Workloads in the
Cloud
▪ Implementing Feature Storage backend for ML
model training
▪ Running Spark Workloads for ML training on
Kubernetes with Kubeflow
Overview of Machine Learning Deployment Workflow
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Machine Learning Workload Deployment
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Deployment targets: Google Cloud, AWS, Azure, On Prem
Machine Learning Deployment Effort
● Data Verification
● Configuration
● Feature Extraction
● Data Validation
● Machine Resource Management
● Serving Infrastructure
● Monitoring
● Analysis Tool
● Machine Learning Code
● Data Preparation + Storage
● Efficient Compute Resource Management
Overview of Machine Learning Deployment Workflow
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
32% 10% 36% 2% 4% 16%
A Typical Machine Learning Developer Workflow
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Storage (1): Azure Storage, Google Storage, AWS S3 Storage; Raw Data → Transformation → Processed Data
Compute (2): Google Cloud AI, AWS SageMaker, Azure ML
Raw Data Processing and Transformation Pipeline
Cloud Training Platforms
Data Scientists / ML Engineers pull and process the data first, before starting ML
training on a Managed Cloud Service
What Enterprise Machine Learning Workflow in the Cloud Looks Like!
Data Sourcing → Pre Processing → Feature Engineering
Storage (1): Azure Storage, Google Storage, AWS S3 Storage; Raw Data → Transformation → Processed Data
Compute (2): Team A, Team B, Team C, Team D; platforms: Google Cloud AI, AWS SageMaker, AWS SageMaker, Azure ML
Running ML workflows across the enterprise with multiple teams using different Cloud
Provider technology stacks
Implementing Machine Learning solutions in the cloud comes at a cost, with
Compute and Storage at the top of the list.
If we plan to be Cloud Neutral, can we abstract our
● Machine Learning Compute Workload → Kubernetes?
● Machine Learning Storage → Feature Store?
A Typical Machine Learning Developer Workflow
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Storage (1): Azure Storage, Google Storage, AWS S3 Storage; Data Source → Transformation → Processed Data
Compute (2): Google Cloud AI, AWS SageMaker, Azure ML
Towards A Cloud Neutral ML Deployment Environment
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Storage (1): Feature Store
Compute (2): Kubernetes
Why the need for Cloud Agnostic Deployment Infrastructure?
● Makes it easier to migrate workloads in a Hybrid Cloud Environment
● We are not tied to a particular Cloud Infrastructure technology stack
● It’s easier to implement best practice patterns and solutions
● Your team will have a common base denominator for all Enterprise ML workloads
● Easy to control cost, manage utilization and forecast demand
Cloud Agnostic Machine Learning Development
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Storage (1): Feature Store on Azure Storage, Google Storage, or AWS S3 Storage
Compute (2): Kubernetes
What’s Feature Store All about?
A Feature is a measurable, observable attribute that is part of the input to a
Machine Learning Model.
[Feature Vector] X1, X2, X3, ..., Xn → Model Training → Model
Features are derived from
● Raw Datastore
● Streaming Datasource
● Aggregates of Raw Inputs
● Windows (mins, hourly, daily, weekly)
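As a rough illustration of deriving features from windowed aggregates of raw inputs, the sketch below buckets raw events into hourly windows and computes one mean per window, using only the Python standard library. The event layout (timestamp, value pairs), the function name `hourly_mean_features`, and the sample readings are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_mean_features(events):
    """Aggregate (timestamp, value) events into one mean value per hourly window."""
    windows = defaultdict(list)
    for ts, value in events:
        # Truncate the timestamp to the start of its hour to get the window key.
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        windows[window_start].append(value)
    # One derived feature (the window mean) per hourly window, in time order.
    return {start: mean(values) for start, values in sorted(windows.items())}


# Hypothetical raw sensor readings.
raw_events = [
    (datetime(2021, 5, 1, 9, 5), 10.0),
    (datetime(2021, 5, 1, 9, 40), 20.0),
    (datetime(2021, 5, 1, 10, 15), 30.0),
]
features = hourly_mean_features(raw_events)
```

The same pattern extends to minute, daily, or weekly windows by changing how the timestamp is truncated.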
Features Change Over Time!
Model Training: feature vectors [X1, X2, X3, ..., Xn] at successive points in Time
Machine Learning Feature Store
● Makes it easy to operationalize our ML workload, most importantly Data
Management and Storage for Model training
● Features can be shared easily among teams running different Model
training pipelines
● We can version datasets and track changes easily
● Consistency in Feature input attributes between Model Training and
Serving
Types Of Feature Store
● Offline Feature Store → Batch Training
● Online Feature Store → Inferencing / Serving
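The split between an offline and an online feature store can be sketched with a small in-memory stand-in: the offline side keeps every version for batch training, while the online side keeps only the latest values for serving. The class `ToyFeatureStore`, its method names, and the sample data are hypothetical; a real feature store would sit on cloud storage rather than Python dictionaries.

```python
from collections import defaultdict


class ToyFeatureStore:
    """In-memory stand-in: full history offline, latest value online."""

    def __init__(self):
        self.offline = defaultdict(list)  # entity_id -> [(version, features), ...]
        self.online = {}                  # entity_id -> latest features

    def write(self, entity_id, version, features):
        # Every write is appended to the offline history for batch training.
        self.offline[entity_id].append((version, dict(features)))
        # The online view is updated only when this is the newest version,
        # so late-arriving older data does not replace what serving reads.
        latest = max(v for v, _ in self.offline[entity_id])
        if version == latest:
            self.online[entity_id] = dict(features)

    def training_rows(self, as_of_version):
        """Return the features each entity had at or before `as_of_version`."""
        rows = {}
        for entity_id, history in self.offline.items():
            eligible = [(v, f) for v, f in history if v <= as_of_version]
            if eligible:
                rows[entity_id] = max(eligible, key=lambda item: item[0])[1]
        return rows

    def serve(self, entity_id):
        """Return the latest features for low-latency inference."""
        return self.online[entity_id]


store = ToyFeatureStore()
store.write("device-1", version=1, features={"avg_temp": 20.5})
store.write("device-1", version=2, features={"avg_temp": 22.0})
```

Here `store.training_rows(1)` reproduces the features as they were at version 1, while `store.serve("device-1")` returns the newest values; both read from the same writes, which is what keeps training and serving inputs consistent.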
Implementing Offline Feature Storage with Apache Hudi
● Streaming Source: data sources with streaming inputs like MQTT, Kafka, Pub/Sub, etc.
● Batch Job Operations: batch operations on Databases, File Storage, Distributed Storage, etc.
● Workflow Scheduling: orchestration with Kubeflow Pipelines or Airflow DAGs on Kubernetes
● Feature Store: implementation on any of the major Cloud Storage services (Azure Storage, Google Storage, AWS S3 Storage)
Why did we use Apache Hudi?
● A need for a Unified Platform where new data can be made available in addition to historical
data within minutes.
● The need for quick computation (or derivation) of Feature vectors in order to make them
available for our model input.
● Incremental Versioning of our Feature collections so that we can time-travel and use a
particular set of features for Model training.
● Our Hudi dataset can be stored in the Azure, Google Cloud, or AWS cloud storage layer.
● Easy to implement all our code and everything we need to do with Spark and PySpark
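The incremental versioning and time-travel reads mentioned for Hudi can be sketched as read options passed to Spark's Hudi data source. This is a minimal sketch, not the presenters' code: the helper names and the commit instant are hypothetical, `read_features` assumes an already configured SparkSession is passed in, and the `as.of.instant` option is only supported in more recent Hudi releases.

```python
def time_travel_options(as_of_instant):
    """Read options for a snapshot of the table as of a given commit instant."""
    return {"as.of.instant": as_of_instant}


def incremental_options(begin_instant):
    """Read options that return only records committed after `begin_instant`."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


def read_features(spark, base_path, options):
    """Load a Hudi table through an existing SparkSession with the given options."""
    return spark.read.format("hudi").options(**options).load(base_path)


# Hypothetical commit instant (Hudi instants are timestamps such as yyyyMMddHHmmss).
training_snapshot = time_travel_options("20210501093000")
new_features = incremental_options("20210501093000")
```

A training job would pass `training_snapshot` to `read_features` to pin a particular set of features, while a refresh job would pass `new_features` to pick up only what changed since that commit.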
Getting Data into Hudi Feature Store with Kubeflow Pipeline
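from kfp import components

# KafkaDatastreamer, ValidatorOnSchema, PreProcessor, and HudiTableWriter are
# plain Python functions defined elsewhere; each one is wrapped as a
# Kubeflow Pipelines component that runs in the given base image.
KafkaDatastreamer_op = components.create_component_from_func(
    KafkaDatastreamer, base_image="python:3.7.1")
ValidatorOnSchema_op = components.create_component_from_func(
    ValidatorOnSchema, base_image="python:3.7.1")
PreProcessor_op = components.create_component_from_func(
    PreProcessor, base_image="python:3.7.1")
# The Hudi writer needs Spark, so it runs in a Spark image instead.
HudiTableWriter_op = components.create_component_from_func(
    HudiTableWriter, base_image="mavencode.io/spark:v3.1.1")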
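The functions wrapped as components are not shown on the slides. As a guess at what one of them might look like, here is a hypothetical `ValidatorOnSchema` written as a self-contained function; the field names, types, and JSON string interface are assumptions made purely for illustration.

```python
def ValidatorOnSchema(record_json: str) -> str:
    """Hypothetical validation step: check required fields and their types."""
    # The import lives inside the function because a function-based component
    # has to be self-contained when it runs in its own container.
    import json

    # Assumed schema; the real one is not given in the talk.
    required = {"device_id": str, "timestamp": str, "reading": float}
    record = json.loads(record_json)
    for field, expected_type in required.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return json.dumps(record)


# Hypothetical record, as a streaming step might hand it over.
sample = '{"device_id": "d1", "timestamp": "2021-05-01T09:05:00", "reading": 21.5}'
valid = ValidatorOnSchema(sample)
```

Passing data between steps as strings keeps the function compatible with the simple parameter types that function-based components handle well.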
The Hudi Data Store writer
● Configure the Spark Session with the packages needed to run Hudi and Avro
● Hudi configuration options
● Write the data into our Hudi data store in the right format
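The code for the writer did not survive in this transcript, so the following is only a sketch of its three parts under stated assumptions: the package versions, table name, and column names are hypothetical, and `build_spark_session` imports PySpark lazily so the option helper can be used on its own.

```python
def build_spark_session(app_name="hudi-feature-writer"):
    """Create a SparkSession with the Hudi and Avro packages on the classpath."""
    # Imported lazily so the helpers below can be used without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        # Assumed package versions; match them to your Spark and Scala build.
        .config(
            "spark.jars.packages",
            "org.apache.hudi:hudi-spark3-bundle_2.12:0.8.0,"
            "org.apache.spark:spark-avro_2.12:3.1.1",
        )
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .getOrCreate()
    )


def hudi_write_options(table_name, record_key, partition_field, precombine_field):
    """Hudi configuration options for an upsert into a feature table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_features(df, base_path, options):
    """Append a DataFrame of features to the Hudi table at `base_path`."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Hypothetical table and column names.
options = hudi_write_options("device_features", "device_id", "event_date", "event_ts")
```

The base path given to `write_features` can point at any of the supported cloud storage layers, which is what keeps the feature store itself cloud neutral.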
Cloud Agnostic Machine Learning Development
Data Sourcing → Pre Processing → Feature Engineering → Model Training / Evaluation → Model Scoring / Management → Model Inferencing
Storage (1): Feature Store
Compute (2): Kubernetes
Cloud Native ML Workload Deployment with Operators on Kubeflow
Cloud Native ML Training Deployment
● Containerized Workload
● Scalable + Can Run in Distributed Mode
● Efficient Compute Utilization
● Language Agnostic!
Machine Learning Operators with Kubeflow on Kubernetes
● A Machine Learning Operator helps with the deployment,
monitoring, and management of a model training life cycle
● Some ML Operators found in Kubeflow are:
○ TF-operator → TensorFlow Job
○ PyTorch-operator → PyTorch Job
○ XGBoost-operator → XGBoost Job
○ Spark-operator → Spark and Spark ML Jobs
Cloud Agnostic Machine Learning Development
MLOps Model Training and Deployment Platform
● Kubeflow Jupyter Notebooks, each in its own Namespace
● Machine Learning Operators running with Kubeflow: Spark Operator, TensorFlow Operator
● Auto Scaling Node Pools running Kubernetes: Auto-Scalable CPU Node Pool, Auto-Scalable GPU Node Pool
● Cloud Infrastructure Layer
● Feature Store
Using Spark Operator for Training ML Steps
PySpark ML Code → Containerize the Python Code → Create SparkApplication Kubernetes YAML Deployment → Apply Deployment to Kubernetes
Spark Operator on Kubernetes
kubectl apply -f ... → API → Scheduler → Spark Driver → Executors
Elastic Compute Resource; ML Jobs
Deployment Configuration YAML
● Spark Application Config that describes the job and the namespace where the job will run
● Container that will run our Spark ML Code
● Spark Driver and Executor Configuration
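The YAML itself is not reproduced in this transcript. The three parts called out for the deployment configuration can be sketched as a SparkApplication resource built as a Python dictionary and serialized to JSON, which kubectl accepts as well as YAML. The job name, namespace, file path, service account, and resource sizes are hypothetical; the image name is the one used for the Hudi writer component earlier in the deck.

```python
import json


def spark_application_manifest(name, namespace, image, main_file, executors=2):
    """Build a SparkApplication resource for the Spark Operator as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        # Describes the job and the namespace where it will run.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            # Container that will run the Spark ML code.
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.1.1",
            "restartPolicy": {"type": "Never"},
            # Driver and executor sizing (assumed values).
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": executors, "memory": "2g"},
        },
    }


manifest = spark_application_manifest(
    name="train-model",
    namespace="team-a",
    image="mavencode.io/spark:v3.1.1",
    main_file="local:///opt/app/train.py",
)
manifest_json = json.dumps(manifest, indent=2)
```

Writing `manifest_json` to a file and passing that file to `kubectl apply -f` would hand the job to the Spark Operator, which then launches the driver and executors.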
Connecting to Feature Store with Kubeflow Pipeline
Cost comparison with Managed Cloud service on AWS
Compared: Managed Services Running on AWS; Kubeflow + S3 Feast Storage ML workload
● Compute Utilization Cost (chart values: 30%, 100%)
● Compute Startup Uptime (chart values: 15s, 66s)
● Team Agility & Productivity (chart value: 6x Productivity)
Summary
● Implementing a Cloud neutral ML deployment approach
simplifies most of the complexities in a Multi-Cloud
environment
● After the initial learning curve, overall team
efficiency improves significantly
● Teams are not locked in to a particular Cloud
Infrastructure stack
● Easy to control cost and forecast future capacity
demands
Thank You!
If you are interested in learning more about how to run your
Machine Learning Workloads on any Cloud Infrastructure or
on-prem, reach out to us.
Drop us a mail: hello@mavencode.com
Visit us online: https://www.mavencode.com
Follow us: https://www.twitter.com/mavencode

Infrastructure Agnostic Machine Learning Workload Deployment

  • 1. Infrastructure Agnostic Machine Learning Workload Deployment Abi Akogun Data Science Consultant (MavenCode) Charles Adetiloye ML Platforms Engineer (MavenCode)
  • 2. About MavenCode MavenCode is an Artificial Intelligence Solutions company located in Dallas, Texas - We do training, product development, and consulting services in the following areas: ● Provisioning Scalable Data Processing Pipelines on Cloud Infrastructure ● Development & Deployment of Machine Learning and Artificial Intelligence Platforms ● Streaming and Big Data Analytics Edge-IoT and Sensors
  • 3. About The Presenters Charles Adetiloye is an ML Platforms Engineer at MavenCode. He has well over 15 years of experience building large-scale, distributed applications. He has extensive experience working and consulting with several companies implementing production grade ML and AI platforms twitter.com/cadetiloye Abiodun Akogun is a Machine Learning and Data Science Consultant at Mavencode. He has extensive experience building and deploying large-scale Machine Learning Applications in different industries that include Healthcare, Finance, Telecommunications, and Insurance. He has experience solving several business problems using Data Analytics, Sentiment Analysis, Topic Modelling, Named Entity Recognition(N.E.R), Opinion Mining, Data Mining, Time Series, Spatial Statistics and Marketing Analytics twitter.com/akogz
  • 4. Agenda ▪ Overview of Machine Learning Model Deployment Workflow ▪ Various Approaches to model training, management, and serving in the Cloud ▪ Deploying Machine Learning Workloads in the Cloud ▪ Implementing Feature Storage backend for ML model training ▪ Running Spark Workloads for ML training on Kubernetes with Kubeflow
  • 5. Overview of Machine Learning Deployment Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing
  • 6. Machine Learning Workload Deployment Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Google Cloud AWS Azure On Prem
  • 7. Machine Learning Deployment Effort Data Verification Configuration Feature Extraction Data Validation Machine Resource Management Serving Infrastructure Monitoring Analysis Tool Machine Learning Code Data Preparation + Storage Efficient Compute Resource Management
  • 8. Overview of Machine Learning Deployment Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing 32% 10% 36% 2% 4% 16%
  • 9. A Typical Machine Learning Developer Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Azure Storage Google Storage AWS S3 Storage Raw Data Transformation Processed Data Storage Compute 1 2 Google Cloud AI AWS SageMaker Azure ML Data Scientists / ML Engineers pull and process data first, before starting ML training on a Managed Cloud Service Raw Data Processing and Transformation Pipeline Cloud Training Platforms
  • 10. What Enterprise Machine Learning Workflow In the Cloud Looks Like! Data Sourcing Pre Processing Feature Engineering Azure Storage Google Storage AWS S3 Storage Raw Data Transformation Processed Data Storage Compute 1 2 Team A Team B Team C Team D Google Cloud AI AWS SageMaker AWS SageMaker Azure ML Running ML workflow across the enterprise with multiple teams using different Cloud Provider technology stacks
  • 11. Implementing Machine Learning solutions in the cloud comes at a cost, with cost of Compute and Storage on top of the list.
  • 12. If we plan to be Cloud Neutral, can we abstract our ● Machine Learning Compute Workload → Kubernetes? ● Machine Learning Storage → Feature Store?
  • 13. Google Cloud AI AWS SageMaker Azure ML A Typical Machine Learning Developer Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Azure Storage Google Storage AWS S3 Storage Data Source Transformation Processed Data Storage Compute 1 2
  • 14. Towards A Cloud Neutral ML Deployment Environment Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Storage Compute 1 2 Feature Store Kubernetes
  • 15. Why the need for Cloud Agnostic Deployment Infrastructure?
  • 16. ● Makes it easier to migrate workloads in a Hybrid Cloud Environment ● We are not tied to a particular Cloud Infrastructure technology stack ● It’s easier to implement best-practice patterns and solutions ● Your team will have a common base denominator for all Enterprise ML workloads ● Easy to control cost, manage utilization and forecast demand
  • 17. Cloud Agnostic Machine Learning Development Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Storage Compute 1 2 Feature Store Kubernetes Azure Storage Google Storage AWS S3 Storage
  • 18. What’s Feature Store All about? A Feature is a measurable observable attribute that is part of the input to a Machine Learning Model. Model Training X1 X2 X3 Xn [Feature Vector] Model
  • 19. What’s Feature Store All about? Model Training X1 X2 X3 Xn [Feature Vector] Model Model 1 Features are derived from ● Raw Datastore ● Streaming Datasource ● Aggregates of Raw Inputs ● Windows (mins, hourly, daily, weekly)
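The "Windows (mins, hourly, daily, weekly)" item can be illustrated with a small sketch. This is only an assumption about what a windowed feature might look like: it averages raw `(timestamp, value)` readings per tumbling window using the standard library. The function names and the sample readings are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from datetime import datetime, timezone


def window_start(ts: datetime, window_seconds: int) -> datetime:
    """Floor a timezone-aware timestamp to the start of its tumbling window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


def windowed_mean(events, window_seconds: int) -> dict:
    """Average the values of (timestamp, value) events per tumbling window."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[window_start(ts, window_seconds)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical raw readings, standing in for a raw datastore or stream.
events = [
    (datetime(2021, 5, 1, 10, 5, tzinfo=timezone.utc), 20.0),
    (datetime(2021, 5, 1, 10, 40, tzinfo=timezone.utc), 22.0),
    (datetime(2021, 5, 1, 11, 15, tzinfo=timezone.utc), 30.0),
]

hourly = windowed_mean(events, 60 * 60)       # one feature value per hour
daily = windowed_mean(events, 24 * 60 * 60)   # one feature value per day
```

The same raw input yields different feature values depending on the window, which is one reason the derived features, and not only the raw data, are worth storing and versioning.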
  • 20. Features Change Over time! Model Training X1 X2 X3 Xn X1 X2 X3 Xn X1 X2 X3 Xn Time
  • 21. Machine Learning Feature Store ● Makes it easy to operationalize our ML workload, most importantly Data Management and Storage for Model training ● Features can be shared easily among teams running different Model training pipelines ● We can version datasets and track changes easily ● Consistency in Feature input attributes between Model Training and Serving
  • 22. Types Of Feature Store ● Offline Feature Store → Batch Training ● Online Feature Store → Inferencing / Serving
  • 23. Implementing Offline Feature Storage with Apache Hudi Azure Storage Google Storage AWS S3 Storage Streaming Source Batch Job Operations Datasource with Streaming sources like MQTT, Kafka, Pubsub etc Batch Operations on Databases, FileStorage, Distributed Storage etc Feature Store Workflow Scheduling Orchestration with Kubeflow Pipelines or Airflow Dags on Kubernetes Feature Store Implementation on any of the Major Cloud Storage
  • 24. Why did we use Apache Hudi? ● We need a Unified Platform where new data can be made available alongside historical data within minutes. ● We need quick computation (or derivation) of Feature vectors in order to make them available as model input. ● Incremental Versioning of our Feature collections, so that we can time-travel and use a particular set of features for Model training. ● Our Hudi dataset can be stored in the Azure, Google Cloud, or AWS cloud storage layer. ● It is easy to implement all our code, and everything else we need to do, with Spark and PySpark.
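The time-travel point can be sketched with Hudi's incremental query options for the Spark datasource. This is a minimal sketch, not the presenters' code: the helper names, the base path argument, and the instant values in the usage lines are hypothetical, and `pyspark` with the Hudi bundle is assumed to be available only where `read_features` is actually called.

```python
from typing import Optional


def hudi_incremental_read_options(begin_instant: str,
                                  end_instant: Optional[str] = None) -> dict:
    """Build Spark datasource options for a Hudi incremental query.

    Instants are Hudi commit times such as "20210501103000".
    Using begin_instant="000" with an end_instant gives a point-in-time view.
    """
    options = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        options["hoodie.datasource.read.end.instanttime"] = end_instant
    return options


def read_features(spark, base_path: str, begin_instant: str,
                  end_instant: Optional[str] = None):
    """Load the feature rows committed in the given instant range.

    `spark` is an existing SparkSession started with the Hudi bundle on its
    classpath (an assumption; nothing here creates one).
    """
    options = hudi_incremental_read_options(begin_instant, end_instant)
    return spark.read.format("hudi").options(**options).load(base_path)


# Point-in-time options: everything committed up to a hypothetical instant.
point_in_time = hudi_incremental_read_options("000", "20210501103000")
```

Pinning `end_instant` for a training run is one way to make that run reproducible against the same set of features later.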
  • 25. Getting Data into Hudi Feature Store with Kubeflow Pipeline
        import kfp
        from kfp import components
        KafkaDatastreamer_op = kfp.components.create_component_from_func(KafkaDatastreamer, base_image="python:3.7.1")
        ValidatorOnSchema_op = kfp.components.create_component_from_func(ValidatorOnSchema, base_image="python:3.7.1")
        PreProcessor_op = kfp.components.create_component_from_func(PreProcessor, base_image="python:3.7.1")
        HudiTableWriter_op = kfp.components.create_component_from_func(HudiTableWriter, base_image="mavencode.io/spark:v3.1.1")
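The slide does not show the bodies of the functions that are turned into components. As a rough sketch of what such functions could look like, the two below are self-contained (imports live inside the function body, which `create_component_from_func` requires because only the function source is shipped to the container). The names `validate_on_schema` and `pre_process`, the record fields, and the JSON-string interface are all hypothetical stand-ins, not the presenters' `ValidatorOnSchema` or `PreProcessor`.

```python
def validate_on_schema(record_json: str) -> str:
    """Return the record unchanged if it matches the expected schema."""
    import json  # imported inside so the function stays self-contained

    # Hypothetical schema: field name -> accepted Python type(s).
    required = {"device_id": str, "ts": str, "reading": (int, float)}
    record = json.loads(record_json)
    for field, expected in required.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} is missing or has the wrong type")
    return record_json


def pre_process(record_json: str) -> str:
    """Normalize a validated record before it is written to the feature store."""
    import json

    record = json.loads(record_json)
    record["device_id"] = record["device_id"].strip().lower()
    record["reading"] = float(record["reading"])
    return json.dumps(record, sort_keys=True)


# Local dry run of the two steps, without Kubeflow.
raw = '{"device_id": " Sensor-7 ", "ts": "2021-05-01T10:05:00Z", "reading": 20}'
clean = pre_process(validate_on_schema(raw))
```

Running the functions locally like this is a cheap check before wrapping them as components with a base image.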
  • 26. The Hudi Data Store writer: (1) Configure the Spark Session with the packages needed to run Hudi and Avro (2) Hudi configuration options (3) Write the data into our Hudi data store in the right format
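The code for this slide did not survive in the transcript, so the following is a minimal sketch of the three steps it names, not the presenters' writer. The package versions, the table and field names, and the choice of `upsert` on a copy-on-write table are assumptions; `pyspark` is imported lazily so the options helper can be used without it.

```python
def build_spark_session(app_name: str):
    """Step 1: Spark session with the Hudi and Avro packages (versions assumed)."""
    from pyspark.sql import SparkSession  # deferred: only needed on the cluster

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config(
            "spark.jars.packages",
            "org.apache.hudi:hudi-spark3-bundle_2.12:0.8.0,"
            "org.apache.spark:spark-avro_2.12:3.1.1",
        )
        .getOrCreate()
    )


def hudi_write_options(table_name: str, record_key: str,
                       partition_field: str, precombine_field: str) -> dict:
    """Step 2: Hudi configuration options for the write."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    }


def write_features(df, base_path: str, options: dict) -> None:
    """Step 3: write a Spark DataFrame into the Hudi table at base_path."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Hypothetical table layout for a sensor feature set.
options = hudi_write_options("sensor_features", "device_id", "event_date", "ts")
```

`base_path` can point at any of the cloud storage layers mentioned earlier, provided the matching filesystem connector is available to Spark.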
  • 27. Cloud Agnostic Machine Learning Development Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Storage Compute 1 2 Feature Store Kubernetes Cloud Native ML Workload Deployment with Operators on Kubeflow Cloud Native ML Training Deployment ● Containerized Workload ● Scalable + Can Run in Distributed Mode ● Efficient Compute Utilization ● Language Agnostic!
  • 28. Machine Learning Operators with Kubeflow on Kubernetes ● A Machine Learning Operator helps with the deployment, monitoring, and management of a model training life cycle ● Some ML Operators found in Kubeflow are: ○ TF-operator → TensorFlow Job ○ PyTorch-operator → PyTorch Job ○ XGBoost-operator → XGBoost Job ○ Spark-operator → Spark and Spark ML Jobs
  • 29. Cloud Agnostic Machine Learning Development MLOps Model Training and Deployment Platform Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Namespace Namespace Namespace Namespace Auto-Scalable CPU Node Pool Auto-Scalable GPU Node Pool Spark Operator Spark Operator TensorFlow Operator Tensorflow Operator Cloud Infrastructure Layer Running Auto Scaling Node Pools Running Kubernetes Machine Learning Operators running with Kubeflow Feature Store
  • 30. Using Spark Operator for Training ML Steps: (1) PySpark ML Code (2) Containerize the Python Code (3) Create SparkApplication Kubernetes YAML Deployment (4) Apply Deployment to Kubernetes
  • 31. Spark Operator on Kubernetes API Scheduler OR OR OR Spark Driver Executors
  • 32. Elastic Compute Resource ML Jobs API Scheduler OR OR OR kubectl apply -f ...
  • 33. Deployment Configuration YAML: (1) Spark Application config that describes the job and the namespace where the job will run (2) Container that will run our Spark ML code (3) Spark Driver and Executor configuration
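The YAML itself is not in the transcript. As a hedged sketch of the three parts the slide calls out, the snippet below builds a `SparkApplication` manifest for the Spark operator as a Python dictionary and serializes it to JSON, which `kubectl apply -f` also accepts. The job name, namespace, application file path, service account, and resource sizes are hypothetical; the image name is the one used for the Hudi writer component earlier in the deck.

```python
import json


def spark_application_manifest(name: str, namespace: str, image: str,
                               main_file: str, executors: int = 2) -> dict:
    """Build a SparkApplication resource for the Spark operator."""
    return {
        # (1) The job description and the namespace it runs in.
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            # (2) The container that holds the PySpark ML code.
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.1.1",
            "restartPolicy": {"type": "Never"},
            # (3) Driver and executor sizing (values are placeholders).
            "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": executors, "memory": "2g"},
        },
    }


manifest = spark_application_manifest(
    name="pyspark-ml-training",                     # hypothetical job name
    namespace="team-a",                             # hypothetical namespace
    image="mavencode.io/spark:v3.1.1",
    main_file="local:///opt/spark/app/train.py",    # hypothetical path in the image
    executors=3,
)
manifest_json = json.dumps(manifest, indent=2)
```

Generating the manifest from code keeps per-team values such as namespace and executor count as parameters, while the rest of the spec stays identical across clouds.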
  • 34. Connecting to Feature Store with Kubeflow Pipeline
  • 35. Cost comparison with Managed Cloud service on AWS 30% 100% 15s 66s Compute Utilization Cost Compute Startup Uptime Team Agility & Productivity 6x Productivity Managed Services Running on AWS Kubeflow + S3 Feast Storage ML workload
  • 36. Summary ● Implementing a Cloud neutral ML deployment approach simplifies most of the complexities in a Multi-Cloud environment ● After the initial hump of the learning curve, overall team efficiency improves significantly ● Teams are not locked in to a particular Cloud Infrastructure stack ● Easy to control cost and forecast future capacity demands
  • 38. Thank You! If you are interested in learning more about how to run your Machine Learning Workloads on any Cloud Infrastructure or Onprem reach out to us Drop us a mail hello@mavencode.com Visit Us Online https://www.mavencode.com Follow Us https://www.twitter.com/mavencode