SlideShare a Scribd company logo
Deploying Models
Eng Teong Cheah
Microsoft MVP
Inferencing?
In machine learning, inferencing refers to the use of a trained model to predict labels for
new data on which the model has not been trained. Often, the model is deployed as part
of a service that enables applications to request immediate, or real-time, predictions for
individual, or small numbers of data observations.
In Azure Machine Learning, you can create real-time inferencing solutions by deploying a
model as a service, hosted in a containerized platform, such as Azure Kubernetes Services
(AKS).
Deploying a Real-Time Inferencing Service
Machine learning inference during deployment
When deploying your AI model during production, you need to consider how it will make
predictions. The two main processes for AI models are:
•Batch inference: An asynchronous process that bases its predictions on a batch of
observations. The predictions are stored as files or in a database for end users or business
applications.
•Real-time (or interactive) inference: Frees the model to make predictions at any time
and trigger an immediate response. This pattern can be used to analyze streaming and
interactive application data.
Machine learning inference during deployment
Consider the following questions to evaluate your model, compare the two processes,
and select the one that suits your model:
•How often should predictions be generated?
•How soon are the results needed?
•Should predictions be generated individually, in small batches, or in large batches?
•Is latency to be expected from the model?
•How much compute power is needed to execute the model?
•Are there operational implications and costs to maintain the model?
Batch inference
Batch inference, sometimes called offline inference, is a simpler inference process that
helps models to run in timed intervals and business applications to store predictions.
Consider the following best practices for batch inference:
•Trigger batch scoring: Use Azure Machine Learning pipelines and
the ParallelRunStep feature in Azure Machine Learning to set up a schedule or event-
based automation.
•Compute options for batch inference: Since batch inference processes don't run
continuously, it's recommended to automatically start, stop, and scale reusable clusters
that can handle a range of workloads. Different models require different environments,
and your solution needs to be able to deploy a specific environment and remove it when
inference is over for the compute to be available for the next model.
Real-time inference
Real-time, or interactive, inference is architecture where model inference can be triggered
at any time, and an immediate response is expected. This pattern can be used to analyze
streaming data, interactive application data, and more. This mode allows you to take
advantage of your machine learning model in real time and resolves the cold-start
problem outlined above in batch inference.
The following considerations and best practices are available if real-time inference is right
for your model:
•The challenges of real-time inference: Latency and performance requirements make
real-time inference architecture more complex for your model. A system might need to
respond in 100 milliseconds or less, during which it needs to retrieve the data, perform
inference, validate and store the model results, run any required business logic, and
return the results to the system or application.
Real-time inference
•Compute options for real-time inference: The best way to implement real-time
inference is to deploy the model in a container form to Docker or Azure Kubernetes
Service (AKS) cluster and expose it as a web service with a REST API. This way, the model
runs in its own isolated environment and can be managed like any other web service.
Docker and AKS capabilities can then be used for management, monitoring, scaling, and
more. The model can be deployed on-premises, in the cloud, or on the edge. The
preceding compute decision outlines real-time inference.
Real-time inference
•Multiregional deployment and high availability: Regional deployment and high
availability architectures need to be considered in real-time inference scenarios, as latency
and the model's performance will be critical to resolve. To reduce latency in multiregional
deployments, it's recommended to locate the model as close as possible to the
consumption point. The model and supporting infrastructure should follow the business'
high availability and DR principles and strategy.
Create a real-time
inference service
https://ceteongvanness.wordpress.com/2022/11/01/
create-a-real-time-inference-service/
References
Microsoft Docs

More Related Content

Similar to Deploying Models

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Tour de France Azure PaaS 2/7 Exécuter une application
Tour de France Azure PaaS 2/7 Exécuter une applicationTour de France Azure PaaS 2/7 Exécuter une application
Tour de France Azure PaaS 2/7 Exécuter une application
Alex Danvy
 
cloud services and providers
cloud services and providerscloud services and providers
cloud services and providers
Kalai Selvi
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
Provectus
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
DATAVERSITY
 
A Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud ComputingA Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud Computing
IRJET Journal
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptx
Knoldus Inc.
 
Modern Architecture in the Cloud of 2018
Modern Architecture in the Cloud of 2018Modern Architecture in the Cloud of 2018
Modern Architecture in the Cloud of 2018
Marius Zaharia
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
Mark Tabladillo
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

Kai Wähner
 
Hyf azure ml_1
Hyf azure ml_1Hyf azure ml_1
Hyf azure ml_1
KatoK1
 
AWS ML Model Deployment
AWS ML Model DeploymentAWS ML Model Deployment
AWS ML Model Deployment
Knoldus Inc.
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
Mark Tabladillo
 
Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...
Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...
Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...
Amazon Web Services
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on Kubernetes
Frans van Buul
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
eltonrodriguez11
 
Platforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern EngineeringPlatforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern Engineering
DATAVERSITY
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
IEEEFINALYEARPROJECTS
 

Similar to Deploying Models (20)

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Tour de France Azure PaaS 2/7 Exécuter une application
Tour de France Azure PaaS 2/7 Exécuter une applicationTour de France Azure PaaS 2/7 Exécuter une application
Tour de France Azure PaaS 2/7 Exécuter une application
 
cloud services and providers
cloud services and providerscloud services and providers
cloud services and providers
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
 
A Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud ComputingA Survey on Heuristic Based Techniques in Cloud Computing
A Survey on Heuristic Based Techniques in Cloud Computing
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptx
 
Modern Architecture in the Cloud of 2018
Modern Architecture in the Cloud of 2018Modern Architecture in the Cloud of 2018
Modern Architecture in the Cloud of 2018
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

 
Hyf azure ml_1
Hyf azure ml_1Hyf azure ml_1
Hyf azure ml_1
 
Cloud computing
Cloud computing Cloud computing
Cloud computing
 
AWS ML Model Deployment
AWS ML Model DeploymentAWS ML Model Deployment
AWS ML Model Deployment
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...
Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...
Cloud-Native DevOps: Simplifying application lifecycle management with AWS | ...
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on Kubernetes
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Platforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern EngineeringPlatforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern Engineering
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
 

More from Eng Teong Cheah

Efficiently Removing Duplicates from a Sorted Array
Efficiently Removing Duplicates from a Sorted ArrayEfficiently Removing Duplicates from a Sorted Array
Efficiently Removing Duplicates from a Sorted Array
Eng Teong Cheah
 
Monitoring Models
Monitoring ModelsMonitoring Models
Monitoring Models
Eng Teong Cheah
 
Responsible Machine Learning
Responsible Machine LearningResponsible Machine Learning
Responsible Machine Learning
Eng Teong Cheah
 
Training Optimal Models
Training Optimal ModelsTraining Optimal Models
Training Optimal Models
Eng Teong Cheah
 
Machine Learning Workflows
Machine Learning WorkflowsMachine Learning Workflows
Machine Learning Workflows
Eng Teong Cheah
 
Working with Compute
Working with ComputeWorking with Compute
Working with Compute
Eng Teong Cheah
 
Working with Data
Working with DataWorking with Data
Working with Data
Eng Teong Cheah
 
Experiments & TrainingModels
Experiments & TrainingModelsExperiments & TrainingModels
Experiments & TrainingModels
Eng Teong Cheah
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
Eng Teong Cheah
 
Getting Started with Azure Machine Learning
Getting Started with Azure Machine LearningGetting Started with Azure Machine Learning
Getting Started with Azure Machine Learning
Eng Teong Cheah
 
Hacking Containers - Container Storage
Hacking Containers - Container StorageHacking Containers - Container Storage
Hacking Containers - Container Storage
Eng Teong Cheah
 
Hacking Containers - Looking at Cgroups
Hacking Containers - Looking at CgroupsHacking Containers - Looking at Cgroups
Hacking Containers - Looking at Cgroups
Eng Teong Cheah
 
Hacking Containers - Linux Containers
Hacking Containers - Linux ContainersHacking Containers - Linux Containers
Hacking Containers - Linux Containers
Eng Teong Cheah
 
Data Security - Storage Security
Data Security - Storage SecurityData Security - Storage Security
Data Security - Storage Security
Eng Teong Cheah
 
Application Security- App security
Application Security- App securityApplication Security- App security
Application Security- App security
Eng Teong Cheah
 
Application Security - Key Vault
Application Security - Key VaultApplication Security - Key Vault
Application Security - Key Vault
Eng Teong Cheah
 
Compute Security - Container Security
Compute Security - Container SecurityCompute Security - Container Security
Compute Security - Container Security
Eng Teong Cheah
 
Compute Security - Host Security
Compute Security - Host SecurityCompute Security - Host Security
Compute Security - Host Security
Eng Teong Cheah
 
Virtual Networking Security - Network Security
Virtual Networking Security - Network SecurityVirtual Networking Security - Network Security
Virtual Networking Security - Network Security
Eng Teong Cheah
 
Virtual Networking Security - Perimeter Security
Virtual Networking Security - Perimeter SecurityVirtual Networking Security - Perimeter Security
Virtual Networking Security - Perimeter Security
Eng Teong Cheah
 

More from Eng Teong Cheah (20)

Efficiently Removing Duplicates from a Sorted Array
Efficiently Removing Duplicates from a Sorted ArrayEfficiently Removing Duplicates from a Sorted Array
Efficiently Removing Duplicates from a Sorted Array
 
Monitoring Models
Monitoring ModelsMonitoring Models
Monitoring Models
 
Responsible Machine Learning
Responsible Machine LearningResponsible Machine Learning
Responsible Machine Learning
 
Training Optimal Models
Training Optimal ModelsTraining Optimal Models
Training Optimal Models
 
Machine Learning Workflows
Machine Learning WorkflowsMachine Learning Workflows
Machine Learning Workflows
 
Working with Compute
Working with ComputeWorking with Compute
Working with Compute
 
Working with Data
Working with DataWorking with Data
Working with Data
 
Experiments & TrainingModels
Experiments & TrainingModelsExperiments & TrainingModels
Experiments & TrainingModels
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Getting Started with Azure Machine Learning
Getting Started with Azure Machine LearningGetting Started with Azure Machine Learning
Getting Started with Azure Machine Learning
 
Hacking Containers - Container Storage
Hacking Containers - Container StorageHacking Containers - Container Storage
Hacking Containers - Container Storage
 
Hacking Containers - Looking at Cgroups
Hacking Containers - Looking at CgroupsHacking Containers - Looking at Cgroups
Hacking Containers - Looking at Cgroups
 
Hacking Containers - Linux Containers
Hacking Containers - Linux ContainersHacking Containers - Linux Containers
Hacking Containers - Linux Containers
 
Data Security - Storage Security
Data Security - Storage SecurityData Security - Storage Security
Data Security - Storage Security
 
Application Security- App security
Application Security- App securityApplication Security- App security
Application Security- App security
 
Application Security - Key Vault
Application Security - Key VaultApplication Security - Key Vault
Application Security - Key Vault
 
Compute Security - Container Security
Compute Security - Container SecurityCompute Security - Container Security
Compute Security - Container Security
 
Compute Security - Host Security
Compute Security - Host SecurityCompute Security - Host Security
Compute Security - Host Security
 
Virtual Networking Security - Network Security
Virtual Networking Security - Network SecurityVirtual Networking Security - Network Security
Virtual Networking Security - Network Security
 
Virtual Networking Security - Perimeter Security
Virtual Networking Security - Perimeter SecurityVirtual Networking Security - Perimeter Security
Virtual Networking Security - Perimeter Security
 

Recently uploaded

GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 

Recently uploaded (20)

GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 

Deploying Models

  • 3. Inferencing? In machine learning, inferencing refers to the use of a trained model to predict labels for new data on which the model has not been trained. Often, the model is deployed as part of a service that enables applications to request immediate, or real-time, predictions for individual, or small numbers of data observations. In Azure Machine Learning, you can create real-time inferencing solutions by deploying a model as a service, hosted in a containerized platform, such as Azure Kubernetes Services (AKS).
  • 4. Deploying a Real-Time Inferencing Service
  • 5. Machine learning inference during deployment When deploying your AI model during production, you need to consider how it will make predictions. The two main processes for AI models are: •Batch inference: An asynchronous process that bases its predictions on a batch of observations. The predictions are stored as files or in a database for end users or business applications. •Real-time (or interactive) inference: Frees the model to make predictions at any time and trigger an immediate response. This pattern can be used to analyze streaming and interactive application data.
  • 6. Machine learning inference during deployment Consider the following questions to evaluate your model, compare the two processes, and select the one that suits your model: •How often should predictions be generated? •How soon are the results needed? •Should predictions be generated individually, in small batches, or in large batches? •Is latency to be expected from the model? •How much compute power is needed to execute the model? •Are there operational implications and costs to maintain the model?
  • 7. Batch inference Batch inference, sometimes called offline inference, is a simpler inference process that helps models to run in timed intervals and business applications to store predictions. Consider the following best practices for batch inference: •Trigger batch scoring: Use Azure Machine Learning pipelines and the ParallelRunStep feature in Azure Machine Learning to set up a schedule or event- based automation. •Compute options for batch inference: Since batch inference processes don't run continuously, it's recommended to automatically start, stop, and scale reusable clusters that can handle a range of workloads. Different models require different environments, and your solution needs to be able to deploy a specific environment and remove it when inference is over for the compute to be available for the next model.
  • 8. Real-time inference Real-time, or interactive, inference is architecture where model inference can be triggered at any time, and an immediate response is expected. This pattern can be used to analyze streaming data, interactive application data, and more. This mode allows you to take advantage of your machine learning model in real time and resolves the cold-start problem outlined above in batch inference. The following considerations and best practices are available if real-time inference is right for your model: •The challenges of real-time inference: Latency and performance requirements make real-time inference architecture more complex for your model. A system might need to respond in 100 milliseconds or less, during which it needs to retrieve the data, perform inference, validate and store the model results, run any required business logic, and return the results to the system or application.
  • 9. Real-time inference •Compute options for real-time inference: The best way to implement real-time inference is to deploy the model in a container form to Docker or Azure Kubernetes Service (AKS) cluster and expose it as a web service with a REST API. This way, the model runs in its own isolated environment and can be managed like any other web service. Docker and AKS capabilities can then be used for management, monitoring, scaling, and more. The model can be deployed on-premises, in the cloud, or on the edge. The preceding compute decision outlines real-time inference.
  • 10. Real-time inference •Multiregional deployment and high availability: Regional deployment and high availability architectures need to be considered in real-time inference scenarios, as latency and the model's performance will be critical to resolve. To reduce latency in multiregional deployments, it's recommended to locate the model as close as possible to the consumption point. The model and supporting infrastructure should follow the business' high availability and DR principles and strategy.
  • 11. Create a real-time inference service https://ceteongvanness.wordpress.com/2022/11/01/ create-a-real-time-inference-service/