16 JUNE 2021
Machine Learning
Serving your ML Model 3 ways!
➤ 10x AWS Certifications, including SA Pro, DevOps and the Machine
Learning Specialty.
➤ Visionary in MLOps; has run production workloads of ML models at
scale, including 1,500 inferences per minute, with active monitoring
and alerting
➤ Contributes to the AWS Community by speaking at several summits,
community days and meet-ups.
➤ Regular blogger, open-source contributor, and SME on Machine
Learning, MLOps, DevOps, Containers and Serverless.
➤ Experienced principal solutions architect and lead developer with over 6
years of AWS experience. He has been responsible for running
production workloads at between 200 and 18,000 requests per second
WHO AM I?
Phil Basford
phil@inawisdom.com
@philipbasford
Phil B#4237
Inference types
ML OPS – INFERENCE TYPES
Real Time
➤ Business critical; common uses are chat
bots, classifiers, recommenders or linear
regressors, e.g. credit risk, journey times
etc.
➤ Hundreds or thousands of individual
predictions per second
➤ API Driven with Low Latency, typically
below 135ms at the 90th percentile.
Near Real Time
➤ Commonly used for image classification or
file analysis
➤ Hundreds of individual predictions per
minute, and processing needs to be done
within seconds
➤ Event or Message Queue based,
predictions are sent back or stored
Occasional
➤ Examples are simple classifiers like Tax
codes
➤ Only a few predictions a month, and
processing needs to be completed within
minutes
➤ API, event or message-queue based;
predictions are sent back or stored
Batch
➤ End of month reporting, invoice
generation, warranty plan management
➤ Runs at Daily / Monthly / Set Times
➤ The data set is typically millions or tens of
millions of rows at once
Micro Batch
➤ Anomaly detection, invoice
approval and Image processing
➤ Executed regularly: every X
minutes or every Y events.
Triggered by file upload or data
ingestion
➤ The data set is typically hundreds
or thousands of rows at once
Edge
➤ Used for Computer Vision, Fault Detection
in Manufacturing
➤ Runs on mobile phone apps and low
power devices. Uses sensors (e.g. video,
location, or heat)
➤ Model output is normally sent back to the
Cloud at regular intervals for analysis.
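As a quick illustration of the real-time latency target above (below 135 ms at the 90th percentile), a p90 figure can be checked against recorded inference timings with a nearest-rank percentile. A minimal sketch; the latencies here are made-up sample data:

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest value >= pct% of the sample."""
    ordered = sorted(values)
    # nearest-rank index: ceil(pct/100 * n) - 1, clamped to 0
    index = max(0, -(-len(ordered) * pct // 100) - 1)
    return ordered[index]

# Made-up per-request inference latencies in milliseconds
latencies_ms = [42, 51, 60, 75, 80, 95, 110, 120, 130, 160]

p90 = percentile(latencies_ms, 90)
print(f"p90 latency: {p90} ms, within 135 ms target: {p90 < 135}")
```

In practice you would pull these timings from your monitoring system (e.g. a latency metric per request) rather than a hard-coded list.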
Fargate
OPTION 1
➤ Supports Batch and Realtime
➤ Low Latency (<100ms)
➤ Supports CPU only, not GPU (can step
down to full ECS on EC2 for GPU)
➤ Pay Per Hour
➤ Application Auto Scaling
➤ Runs Docker and full native support
➤ Not integrated with notebooks or the
SageMaker SDK.
➤ No Model Monitor support (no built-in
recording of predictions)
➤ Requires you to build your own images or
a deep learning container
➤ Memory and GPU Limits (can step it down
to full ECS)
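Since Fargate requires you to build your own image, the service you containerize is just an ordinary HTTP server in front of your model. A minimal sketch using only the Python standard library; the `predict` function is a stand-in for a real model, and the toy weights are made up:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a real model: a toy linear scorer with made-up weights."""
    weights = [0.4, 0.2, 0.1]
    return sum(w * x for w, x in zip(weights, features))

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"features": [0.5, 1.2, 3.4]}
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        features = json.loads(body)["features"]
        payload = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

def main():
    # In the container image, EXPOSE 8080 and point the load balancer
    # target group at this port; call main() as the entrypoint.
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

In a real image you would load the model once at startup (not per request) and add a health-check route for the load balancer.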
SageMaker : Endpoints and Batch Transforms
➤ Supports Batch and Realtime
➤ Built-in algorithms, framework and BYOM
(bring your own model) support
➤ Low Latency (<100ms)
➤ Supports CPU and GPU
➤ Pay per hour (Savings Plans)
➤ Only recently added to Savings Plans
➤ Application Auto Scaling
➤ Runs Docker and full native support
➤ One click Deployment: Integration with
SageMaker Studio and Notebook support
via SDK.
➤ Model Monitor support (records
predictions)
➤ No resource limits
OPTION 2
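Once a real-time endpoint is deployed, callers invoke it through the SageMaker runtime API. A minimal sketch using boto3; the endpoint name is hypothetical, and the CSV serializer reflects the format many of the built-in algorithms expect:

```python
def to_csv_payload(rows):
    """Serialize feature rows to the text/csv format used by many built-in algos."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)

def invoke(endpoint_name, rows):
    """Invoke a deployed real-time endpoint. Requires AWS credentials."""
    import boto3  # imported here so the serializer stays usable offline
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows),
    )
    return response["Body"].read().decode()

# Usage (endpoint name is hypothetical):
# invoke("my-endpoint", [[0.5, 1.2, 3.4]])
```

Batch Transform uses the same container and model artifact, but takes its input from S3 instead of a request body, which is why the one model can serve both the real-time and batch inference types above.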
Lambda
OPTION 3
➤ Simple
➤ Supports only real-time or micro batch
(15-minute maximum execution time)
➤ Low Latency (<100ms)
➤ Supports CPU only, not GPU
➤ Pay Per Request
➤ Scales on concurrency
➤ Savings Plans
➤ *Custom image: runs Docker with full
native support
➤ Not integrated with notebooks or the
SageMaker SDK.
➤ No Model Monitor support (no built-in
recording of predictions)
➤ Memory and GPU Limits
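For the Lambda option, inference is just a handler function. A minimal sketch for an API Gateway proxy integration; the `predict` logic is a stand-in for a real model, which in practice you would load once at cold start (e.g. from the custom container image):

```python
import json

def predict(features):
    """Stand-in for a real model: a toy threshold classifier."""
    return 1 if sum(features) > 1.0 else 0

def handler(event, context):
    # With an API Gateway proxy integration, the request JSON arrives
    # as a string in event["body"].
    body = json.loads(event.get("body") or "{}")
    result = predict(body.get("features", []))
    return {"statusCode": 200, "body": json.dumps({"prediction": result})}

# Local invocation for testing:
# handler({"body": json.dumps({"features": [0.7, 0.6]})}, None)
```

The same handler can serve the micro-batch type: trigger it from an S3 upload or SQS queue instead of API Gateway and loop `predict` over the records in the event.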
Thank you.
