© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
FMOps/LLMOps: Operationalise Generative AI
Oleksii Ivanchenko
Solutions Architect, AWS
MLOps Recap
FMOps & LLMOps intro
FMOps per Personas
Agenda
What is MLOps?
MLOps Definition
People
Processes
Technology
The combination of people,
processes, and technology to
productionize ML solutions
efficiently.
MLOps
Machine Learning
& Operations
MLOps Foundation Expected Outcomes
S T A N D A R D I Z E O P E R A T I O N S A N D I N F R A S T R U C T U R E F O R Y O U R D A T A S C I E N C E
1. Be more efficient in delivery. Time to value (from idea to production): up to 12 months before MLOps, < 3 months after. Business value: improve speed-to-value by 4x.
2. Simplify route-to-live. Time to productionize existing ML use cases: 3-6 months before, < 2 weeks after. Business value: reduce FTE overhead by 8x on average.
3. Standardize infrastructure, data, & code. % of template-driven development: n/a before, > 85% after. Business value: focus on innovation, increasing re-usability by 85%.
4. Standardize onboarding of new teams and ML use cases. Time to instantiate a new MLOps infrastructure & ML projects: 40 days before, < 1 hour after. Business value: accelerate ML adoption across all business areas.
5. Ensure high security standards. Execute the ML solutions without internet access in a private cloud. Business value: your data is safe in your private cloud.
Overall: reduce platform, people, and operation costs.
Customer references building MLOps foundation and business benefits:
• NatWest: https://aws.amazon.com/solutions/case-studies/natwest-group-case-study
• BP: https://aws.amazon.com/solutions/case-studies/bp-machine-learning-case-study
MLOps Foundation People & Processes
S E P A R A T I O N O F C O N C E R N S I S K E Y F O R S U C C E S S
Experimentation
Data
ML Governance
Platform
Administration
Model
Approvers
Auditors &
Compliance
Product Owner
Lead Data Scientists
Data Scientists
Data Owners
MLOps
Engineers
System
Administrators
Security
Data Engineers Business Stakeholders
ML Consumers
Model Build Model Test
Model
Deployment
Data Scientists ML Engineers
ML
Production
Environment
Provision infrastructure
Provide user access
Provide data access
Ingest data
Prepare, combine and catalogue Data
Visualize data
Prove that ML can solve a business
problem, i.e. PoC
Automate model build/training
providing scaled data
Automate model testing and
guardrails
Serving and monitoring the model
testing
Centralized model and code artifact
storage/versioning/auditing
MLOps Scalable Phase
M U L T I P L E T E A M S A N D M L U S E C A S E S A D O P T M L O P S
Generative AI & MLOps
MLOps & FMOps/LLMOps Differentiators
People
Processes
Technology
Key Definitions
MLOps FMOps
Foundation Model Operations
Productionize Generative AI
Solutions (Text-Text/ Image/
Video/ Audio/ …)
Machine Learning Operations
Productionize ML solutions
efficiently
LLMOps
Large Language Model Operations
Productionize Large Language
Model-based solutions
Differences between MLOps and FMOps
People
Processes
Technology
MLOps FMOps
Processes & People
Providers, fine-tuners, & consumers
Select & Customize the FM on a Specific Context
- Fine-tuning, parameter-efficient fine-tuning, prompt engineering
- Proprietary, open source based on the application
Evaluate & Monitor Fine-tuned Models
Human feedback, prompt management, toxicity/bias…
Data & Model Deployment
Data privacy, multi-tenancy, & cost, latency, and precision
Technology
MLOps, data, & application layers
Generative AI User Types & Skills
Generative AI
User Types
Skills
No ML expertise required. Focus
on prompt engineering and
retrieval augmented generation.
Strong end-to-end ML expertise and
domain knowledge for tuning
including prompt engineering.
Deep end-to-end ML, NLP
expertise and data science,
labeler “squad”.
Providers
Entities who pre-train FMs
from scratch themselves and
provide them as a product to
fine-tuners and consumers.
Fine-Tuners
Customize (e.g. fine-tune)
providers' pre-trained FMs
with their own data and run
inference, while providing
access to consumers.
Consumers
Interact with Generative AI
services from provider or fine-
tuner by text prompting or
visual interface to complete
desired actions.
Productionize large models
leveraging ML & Operations (FMOps)
Productionize applications
leveraging Generative AI &
Operations
Can become Can become
The Journey of Consumers
Generative AI Processes – Consumers
Select, evaluate, & use FM as
a black-box & adapt context
Using multiple chained models and
prompt engineering techniques to achieve
context adaptation (if necessary). Expose
the solution to the end users
Inputs/Outputs & Rating
Interaction with the Generative AI
solution. Aim to improve outcomes
by penalizing or rewarding Generative
AI solution outputs, providing insights
for prompt engineering
Select FM - Consumers
Step 1. Understand
top proprietary and
open source FM
capabilities
Step 2. Test & evaluate the
top selected FMs (e.g. top 3)
Step 3. Select the best
FM based on your
priorities
>20 K
available
FMs
Quick short listing:
Use a small set of test
prompts based on the
task
Use case-based benchmarking:
Evaluate the models based on
predefined prompts and outputs
(prompt catalog)
Priority-based decision:
Select based on business
priorities: cost, latency, precision
Step 1. Understand top FM capabilities
Proprietary or open-source FM
Commercial License
# of Parameters
Quality metrics
Speed Context Window
Trained on
(instructions, code, internet data)
Fine-tunable
Existing Customer Skills
Main FM Capability Matrix
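As an illustration, such a capability matrix can be represented as structured data and used to shortlist candidates against hard requirements. The model entries and field values below are hypothetical, not real benchmark data:

```python
# Hypothetical FM capability matrix (illustrative entries only).
catalog = [
    {"name": "fm-a", "open_source": False, "params_b": 175,
     "fine_tunable": True, "context_window": 8192, "speed": "medium"},
    {"name": "fm-b", "open_source": True, "params_b": 7,
     "fine_tunable": True, "context_window": 4096, "speed": "fast"},
    {"name": "fm-c", "open_source": True, "params_b": 70,
     "fine_tunable": False, "context_window": 4096, "speed": "slow"},
]

def shortlist(catalog, require_fine_tunable=True, min_context=4096):
    """Keep only the models that meet the hard requirements of the matrix."""
    return [m["name"] for m in catalog
            if (m["fine_tunable"] or not require_fine_tunable)
            and m["context_window"] >= min_context]

print(shortlist(catalog))  # -> ['fm-a', 'fm-b']
```

Soft criteria (quality metrics, license terms, existing customer skills) then rank the shortlisted candidates in Step 2.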
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Step 2. Evaluate the top FMs
Does labelled data exist?
- Yes: does the use case provide discrete outcomes/outputs?
  - Yes: Accuracy metrics. High evaluation precision, but covers few use cases. Classic ML metrics, e.g. precision, …
  - No: Task-specific metrics. Medium evaluation precision that might require human evaluation, but covers many use cases.
    - Similarity: ROUGE, cosine similarity [0,1], …
    - Prompt stereotyping: cross-entropy loss
    - Toxicity: Detoxify, ToxiGen
    - Factual knowledge: HELM, LAMA Benchmark
    - Semantic robustness: word error rate
- No: should we automate the test process?
  - No: Human in the Loop (HIL). High evaluation precision, but costly and time-consuming. Manual human feedback based on predefined assessment rules, e.g. using SageMaker Ground Truth.
  - Yes: LLM-based evaluation. Unknown evaluation precision (depends on the LLM), but automated. Feed the outputs to a "reliable" LLM and instruct it to rate the outcome with a score and an explanation.
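The decision tree above reduces to a small amount of branching logic; a minimal sketch:

```python
def choose_eval_method(labelled_data, discrete_outputs, automate):
    """Return the evaluation approach suggested by the decision tree:
    labelled + discrete -> accuracy metrics; labelled + free-form ->
    task-specific metrics; unlabelled -> HIL or an LLM judge."""
    if labelled_data:
        return "accuracy metrics" if discrete_outputs else "task-specific metrics"
    return "LLM-based evaluation" if automate else "human in the loop (HIL)"
```

For example, a classification use case with labelled data maps to classic accuracy metrics, while an unlabelled summarization use case that must scale maps to LLM-based evaluation.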
Step 3. Select the best FM based on priorities
Precision
Speed
Cost
High speed, smaller model,
lower precision, smaller cost
Lower speed, larger model,
higher precision, larger cost
P0: lower cost
No priority
P1: Precision
Model Evaluation Score HIL/LLM Feedback
FM1 5/5 <Feedback summary>
FM2 4/5 <Feedback summary>
FM3 3/5 <Feedback summary>
Model Cost
FM1 $$$$
FM2 $
FM3 $$$
Model Speed
FM1 ⚡⚡
FM2 ⚡
FM3 ⚡
Model Selection:
FM2
EXAMPLE
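The priority-based decision can be sketched as a ranking over per-model scores. The numbers below are made up to mirror the example tables (with the cost priority first, FM2 wins, as in the slide):

```python
# Illustrative scores: evaluation precision out of 5; cost and speed on a
# relative 1-4 scale (1 = cheapest / slowest tier). Not real model data.
models = {
    "FM1": {"precision": 5, "cost": 4, "speed": 2},
    "FM2": {"precision": 4, "cost": 1, "speed": 1},
    "FM3": {"precision": 3, "cost": 3, "speed": 1},
}

def select_fm(models, priorities=("cost", "precision")):
    """Rank models by the ordered business priorities: lower cost is
    better, higher precision/speed is better; ties fall through to the
    next priority in the tuple."""
    def key(item):
        _, m = item
        return tuple(-m[p] if p in ("precision", "speed") else m[p]
                     for p in priorities)
    return min(models.items(), key=key)[0]

print(select_fm(models))  # -> FM2
```

Swapping the priority order changes the outcome: with precision first, the largest (and most expensive) model would win instead.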
1. Select FM
2. Prompt
Engineering
3. Test &
Test Prompt
lineage
(input & outputs)
Develop &
Deploy Web
Application
New Test Set
Generative AI Processes for LLM – Consumers
5. Input/
Output
Filtering &
Guardrails
6. Rating
Mechanisms
(thumbs up/down,
rating, text)
Backend Front-end
Test
Functionality
Generative AI End-
users
Use the web application and
rate the quality of the
output
Generative AI Developers & Prompt Engineers/Testers
LLM-based
Generative
AI
Solution
Input/ Output
& Rating
Interaction
DevOps/AppDevs
WebUI
4. Chain
Prompts &
Applications
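Step 5 (input/output filtering & guardrails) can be as simple as a blocklist check in front of and behind the LLM. A minimal sketch; real deployments would use managed guardrails or toxicity classifiers, and the blocklist terms here are purely illustrative:

```python
# Illustrative blocklist; production guardrails are ML-based, not keyword-based.
BLOCKLIST = {"password", "ssn"}

def guard(text):
    """Return (allowed, reason); block text containing blocklisted terms.
    Applied to both user prompts (input) and model completions (output)."""
    lowered = text.lower()
    for term in BLOCKLIST:
        if term in lowered:
            return False, f"blocked term: {term}"
    return True, "ok"
```

The same check runs on both sides of the model call: a blocked input never reaches the LLM, and a blocked output never reaches the end user.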
Develop &
Deploy Web
Application
New Test Set
Generative AI Processes for LLM – Consumers
Backend Front-end
Test
Functionality
Generative AI End-
users
Use the web application and
rate the quality of the
output
Generative AI Developers & Prompt Engineers/Testers
LLM-based
Generative
AI
Solution
Input/ Output
& Rating
Interaction
DevOps/AppDevs
WebUI
Create User
Profile & Share
Data
Fine-tune Personalized Models
1. Select FM
2. Prompt
Engineering
Test & Test
Prompt
lineage
(input & outputs)
4. Input/
Output
Filtering
5. Rating
Mechanisms
(thumbs up/down,
rating, text)
3. Chain
Prompts &
Applications
Fine-tune FM using APIs
1. Select FM
or Use Fine-
tuned Model
2. Prompt
Engineering
3. Test &
Test Prompt
lineage
(input & outputs)
5. Input/
Output
Filtering &
Guardrails
6. Rating
Mechanisms
(thumbs up/down,
rating, text)
4. Chain
Prompts &
Applications
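The rating mechanism in the last step (thumbs up/down) can be sketched as a small aggregator that flags low-rated prompts, which then feed the prompt catalog for re-engineering. Names and the threshold below are illustrative:

```python
from collections import defaultdict

# prompt_id -> list of 1 (thumbs up) / 0 (thumbs down) ratings
ratings = defaultdict(list)

def rate(prompt_id, thumbs_up):
    """Record one end-user rating for a prompt."""
    ratings[prompt_id].append(1 if thumbs_up else 0)

def low_rated(threshold=0.5):
    """Prompt ids whose approval rate falls below the threshold;
    candidates for prompt re-engineering or fine-tuning data."""
    return [p for p, r in ratings.items() if sum(r) / len(r) < threshold]
```

In the deck's flow, these flagged prompts are exactly what gets added to the "New Test Set" for the next prompt-engineering iteration.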
The Journey of Fine-tuners
Generative AI Processes - Fine-Tuners
Data Labeling
Human in the loop to label
hundreds or thousands of
data points
Fine-tune
Customization for
specific domains
Deployment &
Prompt Engineering
Trade-off between cost,
precision, & latency
Chain of models
Prompt designing and
engineering
Filtering input prompts and
results using embeddings
Monitor
Human in the loop
feedback/rating, result
similarities, toxicity rate,
new methods under
research
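As a sketch of the monitoring step, an alert can fire when the fraction of toxic outputs exceeds a rate threshold. The upstream scorer producing `toxicity_scores` (e.g. Detoxify) is assumed to exist; the thresholds are illustrative:

```python
def should_alert(toxicity_scores, rate_threshold=0.05, score_threshold=0.8):
    """Alert when the fraction of outputs whose toxicity score exceeds
    score_threshold is itself above rate_threshold."""
    if not toxicity_scores:
        return False
    toxic = sum(1 for s in toxicity_scores if s > score_threshold)
    return toxic / len(toxicity_scores) > rate_threshold
```

The same pattern applies to the other monitored signals (rating averages, result similarity): aggregate a window of scores, compare against a threshold, and trigger re-evaluation or re-fine-tuning when breached.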
Fine-Tuning, PEFT & Training
Thousands of example
input/outputs
Pre-trained FM
Tens of example
input/outputs
Parameter-Efficient Fine-
Tuning (PEFT - Training Job)
Requires low computation power (reduced cost, e.g. 1/10 of
fine-tuning) as it adds small new layers to the large FM
Lower accuracy
Fine-Tuning
(Training Job)
Requires high computation power (high cost, but lower than
pre-training) to recalculate all the weights of the large FM
Higher accuracy
Fine-Tuned FM
Prompt
(Inputs)
Completion
(Outputs)
Select the ‘right’ model
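A back-of-envelope calculation shows why PEFT is so much cheaper: a LoRA-style adapter trains two small low-rank matrices per adapted layer instead of every weight. The layer count and dimensions below are illustrative, not those of any specific FM:

```python
# Trainable-parameter comparison: full fine-tuning vs. a LoRA-style adapter.
def full_ft_params(layers, d_model):
    # full fine-tuning updates one d_model x d_model weight matrix per layer
    return layers * d_model * d_model

def lora_params(layers, d_model, rank):
    # LoRA trains two low-rank factors (d_model x rank, rank x d_model) per layer
    return layers * 2 * d_model * rank

full = full_ft_params(layers=32, d_model=4096)
peft = lora_params(layers=32, d_model=4096, rank=8)
print(f"trainable fraction with PEFT: {peft / full:.4%}")
```

With these (hypothetical) shapes the adapter trains well under 1% of the weights, which is where the reduced compute and cost of PEFT come from; the trade-off is the somewhat lower accuracy noted above.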
Generative AI SageMaker Capabilities
Amazon SageMaker
Gen AI for Builders
Data Scientists and ML Engineers
Pre-train, customize, and deploy
FMs with fine-grained controls for
generative AI use cases that have
stringent requirements on
accuracy, latency, and cost.
Deploy
Deploy the model
with SageMaker
Inference and run
inference
for your generative
AI use case
Browse
Browse 730+ public
and 20+ proprietary
FMs on SageMaker
JumpStart
Evaluate
Evaluate multiple FMs
with SageMaker Clarify
Model Evaluation to
select the right one for
your use case
Customize
Customize with
SageMaker Training
using your own
dataset.
Use SageMaker HIL
for data collection
EXPERIMENT
Iteratively customize and evaluate models with
SageMaker Experiments
Industrialize with MLOps and Monitoring
Automate model selection, evaluation and
deployment with SageMaker Pipelines
LLMOps Lifecycle
E V A L U A T I O N & F I N E - T U N I N G M L P I P E L I N E
Pre-
processing
Model
Comparison &
Selection
based on Cost,
Latency, etc.
Amazon SageMaker Pipeline
Evaluation LLM Pipeline
LLM 1
Evaluation
Evaluation
Fine-tuning
(Training Job)
Custom LLM 3
LLM 2
Evaluation
Model
Endpoint
Version
Approval &
Deployment
Monitoring
LLM fine-tuning, evaluation,
and selection at scale
LLM versioning,
lineage, approval
LLM serving,
monitoring, rating
Amazon EventBridge
Monitoring Alerts
Alert evaluation &
approval for re-fine-
tuning and evaluation
Input/Labelled
data
SageMaker Clarify
Model Evaluation
Model
Registration
V1 V2
Model Version
Model Registry
LLM artifacts
User
Interaction
via API
Generative AI
Application
Prompt prep.
Model triggering
Guardrails
Generative AI
User
Prompt Catalog
HIL Data
Labelers
& Prompt
Testers
LLM Output
Rating
Store user prompts to prompt
catalog for future automatic
evaluation (Improve testing
coverage)
Prompt
Translator
Generative AI app &
user interaction
Operationalize LLM Evaluation at Scale using
Amazon SageMaker Clarify and MLOps services
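The versioning and approval gate in the lifecycle above can be sketched as a tiny in-memory registry; production setups would use the SageMaker Model Registry, so this is only a sketch of the control flow:

```python
class ModelRegistry:
    """Minimal registry: register LLM versions, gate deployment on approval."""

    def __init__(self):
        self.versions = []  # dicts: {"version", "artifact", "approved"}

    def register(self, artifact):
        version = len(self.versions) + 1
        self.versions.append({"version": version, "artifact": artifact,
                              "approved": False})
        return version

    def approve(self, version):
        # the manual approval step before endpoint deployment
        self.versions[version - 1]["approved"] = True

    def latest_approved(self):
        """The version the serving endpoint should run, or None."""
        approved = [v for v in self.versions if v["approved"]]
        return approved[-1]["version"] if approved else None
```

A monitoring alert (via Amazon EventBridge in the diagram) would trigger re-fine-tuning, register a new version, and only move the endpoint forward once that version passes evaluation and approval.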
MLOps &
Generative AI
Technology –
Fine-tuner
T H R E E M A I N L A Y E R S A R E
I N T E R C O N N E C T E D
Generative
AI
Application
MLOps
Data
MLOps &
Generative AI
Technology –
Fine-tuner
T H R E E M A I N L A Y E R S A R E
I N T E R C O N N E C T E D
1
2
4
3
5
Amazon
Bedrock
FMOps Foundation People & Processes
S E P A R A T I O N O F C O N C E R N S I S K E Y F O R S U C C E S S
Experimentation
Data
ML Governance
Platform
Administration
Model
Approvers
Auditors &
Compliance
Product Owner
Lead Data Scientists
Data Scientists
Data Owners
MLOps
Engineers
System
Administrators
Security
Data Engineers Business Stakeholders
ML Consumers
Model Build Model Test
Model
Deployment
Data Scientists ML Engineers
ML
Production
Environment
Provision infrastructure
Provide user access
Provide data access
Ingest data
Prepare, combine and catalogue Data
Visualize data
Prove that ML can solve a business
problem, i.e. PoC
Automate model build/training
providing scaled data
Automate model testing and
guardrails
Serving and monitoring the model
testing
Centralized model and code artifact
storage/versioning/auditing
Application Generative AI
Developers
AppDev
Prompt
Engineers /Testers
Generative AI
End-users
Fine-Tuners
Fine-Tuners
Web UI
Data
Labelers / Editors
Web UI
The Journey of Providers
Generative AI Processes - Providers
P R E - T R A I N I N G A N F M I S N O T A T R I V I A L T A S K
Massive Data
Labeling
Human in the loop to label
trillions of data points
Weeks to Months
of Training
Distribute the training of the
foundation models
Deployment
Deploy proprietary FMs or
Open Source the model artifact
and code
Leaderboards
FMs are evaluated by
third-party organizations
Providers Productionize FM using MLOps
T R A I N M U L T I P L E F O U N D A T I O N M O D E L S
Building FM using Amazon SageMaker HyperPod
D E V E L O P A N D T R A I N F M S C O N T I N U O U S L Y F O R W E E K S A N D M O N T H S
Self-healing clusters reduce
training time by up to 20%
Resilient
environment
SageMaker distributed
training libraries improve
performance by up to 20%
Streamline
distributed training
Control over computing
environment and workload
scheduling
Optimized
resource utilization
Conclusion & Resources
Consumers
Providers
Fine-Tuners
Generative AI Stack on AWS
A L I G N E D T O Y O U R M A T U R I T Y
End-users
GPUs Inferentia
Trainium SageMaker
EC2 Capacity Blocks Neuron
UltraClusters EFA Nitro
Amazon Bedrock
Guardrails Agents Customization Capabilities
Amazon Q Amazon Q in
Amazon QuickSight
Amazon Q in
Amazon Connect
Amazon
CodeWhisperer
Providers
Fine-Tuners
Generative AI Stack on AWS
A L I G N E D T O Y O U R M A T U R I T Y
HyperPod
Consumers
End-users
Generative AI & Operations Resources
Generative AI for Builders on Amazon SageMaker Workshop
http://tinyurl.com/aws-genai-for-builders
FMOps/LLMOps: Operationalize generative AI and differences with MLOps
https://aws.amazon.com/blogs/machine-learning/fmops-llmops-operationalize-generative-ai-and-differences-with-mlops
Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services
https://aws.amazon.com/blogs/machine-learning/operationalize-llm-evaluation-at-scale-using-amazon-sagemaker-clarify-and-mlops-services
Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock
https://aws.amazon.com/blogs/machine-learning/build-an-internal-saas-service-with-cost-and-usage-tracking-for-foundation-models-on-amazon-bedrock/
Thank you!
Oleksii Ivanchenko
linkedin.com/in/oleksii-ivanchenko/

Oleksii Ivanchenko: FMOps/LLMOps: особливості підходу і чим відрізняється від MLOps (UA)

  • 1.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. FMOps/LLMOps: Operationalise Generative AI Oleksii Ivanchenko Solutions Architect from AWS
  • 2.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps Recap FMOps & LLMOps intro FMOps per Personas Agenda
  • 3.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. What is MLOps? MLOps Definition People Processes Technology The combination of people, processes, and technology to productionize ML solutions efficiently. MLOps Machine Learning & Operations
  • 4.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps Foundation Expected Outcomes S T A N D A R D I Z E O P E R A T I O N S A N D I N F R A S T R U C T U R E F O R Y O U R D A T A S C I E N C E Business Goal Technical Metric Before MLOps MLOps Expected Outcomes Business Value 1 Be more efficient in delivery Time to value (from idea to production) up to 12 months < 3 months Improve Speed-to-Value by 4x 2 Simplify route-to-live Time to productionize existing ML use cases 3-6 months < 2 weeks Reduce FTE overhead in average 8x 3 Standardize infrastructure, data, & code % Template driven development n/a > 85% Focus on innovation increasing re-usability by 85% 4 Standardize onboarding of new teams and ML use cases Time to instantiate a new MLOps infrastructure & ML projects 40 days < 1 hours Accelerate ML adoption across all business areas 5 Ensure high security standards Execute the ML solutions without internet access in a private cloud n/a No internet Your data is safe in your private cloud Reduce platform, people and operation costs Customer references building MLOps foundation and business benefits: • NatWest: https://aws.amazon.com/solutions/case-studies/natwest-group-case-study • BP: https://aws.amazon.com/solutions/case-studies/bp-machine-learning-case-study
  • 5.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps Foundation People & Processes S E P A R A T I O N O F C O N C E R N S I S K E Y F O R S U C C E S S Experimentation Data ML Governance Platform Administration Model Approvers Auditors & Compliance Product Owner Lead Data Scientists Data Scientists Data Owners MLOps Engineers System Administrators Security Data Engineers Business Stakeholders ML Consumers Model Build Model Test Model Deployment Data Scientists ML Engineers ML Production Environment Provision infrastructure Provide user access Provide data access Ingest data Prepare, combine and catalogue Data Visualize data Prove that ML can solve a business problem, i.e. PoC Automate model build/training providing scaled data Automate model testing and guardrails Serving and monitoring the model testing Centralized model and code artifact storage/versioning/auditing
  • 6.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOPs Scalable Phase M U L T I P L E T E A M S A N D M L U S E C A S E S A D O P T M L O P S
  • 7.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI & MLOps MLOps & FMOps/LLMOps Differentiators
  • 8.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. People Processes Technology Key Definitions MLOps FMOps Foundation Model Operations Productionize Generative AI Solutions (Text-Text/ Image/ Video/ Audio/ …) Machine Learning Operations Productionize ML solutions efficiently LLMOps Large Language Model Operations Productionize Large Language Model-based solutions
  • 9.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Differences between MLOps and FMOps People Processes Technology MLOps FMOps Processes & People Providers, fine-tuners, & consumers Select & Customize the FM on a Specific Context - Fine-tuning, parameter-efficient fine-tuning, prompt engineering - Proprietary, open source based on the application Evaluate & Monitor Fine-tuned Models Human feedback, prompt management, toxicity/bias… Data & Model Deployment Data privacy, multi-tenancy, & cost, latency, and precision Technology MLOps, data, & application layers
  • 10.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps & FMOps Differentiators People Processes Technology MLOps FMOps Processes & People Providers, fine-tuners, & consumers Select & Customize the FM on a Specific Context - Fine-tuning, parameter-efficient fine-tuning, prompt engineering - Proprietary, open source based on the application Evaluate & Monitor Fine-tuned Models Human feedback, prompt management, toxicity/bias… Data & Model Deployment Data privacy, multi-tenancy, & cost, latency, and precision Technology MLOps, data, & application layers
  • 11.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI User Types & Skills 11 Generative AI User Types Skills No ML expertise required. Focus on prompt engineering and retrieval augmented generation. Strong end-to-end ML expertise and domain knowledge for tuning including prompt engineering. Deep end-to-end ML, NLP expertise and data science, labeler “squad“. Providers Entities who pre-train FMs from scratch themselves and provide them as a product to fine-tuner and consumer. Fine-Tuners Customize (e.g. fine-tune) providers‘ pre-trained FMs with their own data and run inference, while provide access to consumers. Consumers Interact with Generative AI services from provider or fine- tuner by text prompting or visual interface to complete desired actions. Productionize large models leveraging ML & Operations (FMOps) Productionize applications leveraging Generative AI & Operations Can become Can become
  • 12.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. The Journey of Consumers
  • 13.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI Processes – Consumers 13 Select, evaluate, & use FM as a black-box & adapt context Using multiple chained models and prompt engineering techniques to achieve context adaptation (if necessary). Expose the solution to the end users Inputs/Outputs & Rating Interaction with the Generative AI Solutions. Aim to improve outcomes by penalizing or rewarding Generative AI solution outputs providing insights for prompt engineering
  • 14.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Select FM - Consumers 14 20 Step 1. Understand top proprietary and open source FM capabilities Step 2. Test & evaluate the top selected FMs (e.g. top 3) Step 3. Select the best FM based on your priorities >20 K available FMs 1 Quick short listing: Use a small set of test prompts based on the task Use case-based benchmarking: Evaluate the models based on predefined prompts and outputs (prompt catalog) Priority-based decision: Select based on business priorities cost, latency, precision 3
  • 15.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Step 1. Understand top FM capabilities 15 Proprietary or open-source FM Commercial License # of Parameters Quality metrics Speed Context Window Trained on (instructions, code, internet data) Fine-tunable Existing Customer Skills Main FM Capability Matrix
  • 16.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Step 2. Evaluate the top FMs 19 Does labelled data exist? Does the use case provide discrete outcomes/outputs? Accuracy metrics High evaluation precision but not many use cases Task-specific metrics Medium evaluation precision that might require human evaluation but many use cases Human in the Loop (HIL) High evaluation precision but costly and time consuming LLM Unknown evaluation precision (depend on the LLM) but automated Should we automate the test process? Yes Classic ML metrics e.g. precision, … Similarity: ROUGE, cosine similarity [0,1], … Prompt Stereotyping: Cross-entropy loss Toxicity: Detoxify, Toxigen Factual Knowledge: HELM, LAMA Benchmark Semantic Robustness: Word error rate No Manually human feedback based on predefined assessment rules e.g. using GroundTruth Feed the outputs to a ”reliable” LLM and instruct it to rate the outcome with an score and explanation Yes No Yes No
  • 17.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Step 3. Select the best FM based on priorities 21 Precision Speed Cost High speed, smaller model, lower precision, smaller cost Lower speed, larger model, higher precision, larger cost P0: lower cost No priority P1: Precision Model Evaluation Score HIL/LLM Feedback FM1 5/5 <Feedback summary> FM2 4/5 <Feedback summary> FM3 3/5 <Feedback summary> Model Cost FM1 $$$$ FM2 $ FM3 $$$ Model Speed FM1 ⚡⚡ FM2 ⚡ FM3 ⚡ Model Selection: FM2 EXAMPLE
  • 18.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. 1. Select FM 2. Prompt Engineering 3. Test & Test Prompt lineage (input & outputs) Develop & Deploy Web Application New Test Set Generative AI Processes for LLM – Consumers 22 5. Input/ Output Filtering & Guardrails 6. Rating Mechanisms (thumbs up/down, rating, text) Backend Front-end Test Functionality Generative AI End- users Use web application rate the quality of output Generative AI Developers & Prompt Engineers/Testers LLM-based Generative AI Solution Input/ Output & Rating Interaction DevOps/AppDevs WebUI 4. Chain Prompts & Applications
  • 19.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Develop & Deploy Web Application New Test Set Generative AI Processes for LLM – Consumers 23 Backend Front-end Test Functionality Generative AI End- users Use web application rate the quality of output Generative AI Developers & Prompt Engineers/Testers LLM-based Generative AI Solution Input/ Output & Rating Interaction DevOps/AppDevs WebUI Create User Profile & Share Data Fine-tune Personalized Models 1. Select FM 2. Prompt Engineering Test & Test Prompt lineage (input & outputs) 4. Input/ Output Filtering 5. Rating Mechanisms (thumbs up/down, rating, text) 3. Chain Prompts & Applications Fine-tune FM using APIs 1.Select FM or Use Fine- tuned Mode 2. Prompt Engineering 3. Test & Test Prompt lineage (input & outputs) 5. Input/ Output Filtering & Guardrails 6. Rating Mechanisms (thumbs up/down, rating, text) 4. Chain Prompts & Applications
  • 20.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. The Journey of Fine-tuners
  • 21.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI Processes - Fine-Tuners 27 Data Labeling Human in the loop to label thousands, or hundreds of data Fine-tune Customization for specific domains Deployment & Prompt Engineering Trade-off between cost, precision, & latency Chain of models Prompt designing and engineering Filtering input prompts and results using embeddings Monitor Human in the loop feedback/rating, result similarities, toxicity rate, new methods under research
  • 22.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Fine-Tuning, PEFT & Training Thousands of example input/outputs Pre-trained FM Tens of example input/outputs Parameter-Efficient Fine- Tuning (PEFT - Training Job) Requires low computation power (reduced cost e.g. 1/10 of fine-tuning) as it adds small new layers in the Large FM Lower accuracy Fine-Tuning (Training Job) Requires high computation power (high cost but lower than training) to calculate all the weights of Large FM Higher accuracy Fine-Tuned FM Prompt (Inputs) Completion (Outputs) Select the ‘right’ model
  • 23.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI SageMaker Capabilities 29 Amazon SageMaker Gen AI for Builders Data Scientists and ML Engineers Pre-train, customize, and deploy FMs with fine-grain controls for generative AI use cases that have stringent requirements on accuracy, latency, and cost. Deploy Deploy the model with SageMaker Inference and run inference for your generative AI use case Browse Browse 730+ public and 20+ proprietary FMs on SageMaker Jumpstart Evaluate Evaluate multiple FMs with SageMaker Clarify Model Evaluation to select the right one for your use case Customize Customize with SageMaker Training using your own dataset. Use SageMaker HIL for data collection EXPERIMENT Iteratively customize and evaluate models with SageMaker Experiments Industrialize with MLOps and Monitoring Automate model selection, evaluation and deployment with SageMaker Pipelines
  • 24.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. LLMOps Lifecycle E V A L U A T I O N & F I N E - T U N I N G M L P I P E L I N E Pre- processing Model Comparison & Selection based on Cost, Latency, etc. Amazon SageMaker Pipeline Evaluation LLM Pipeline LLM 1 Evaluation Evaluation Fine-tuning (Training Job) Custom LLM 3 LLM 2 Evaluation Model Endpoint Version Approval & Deployment Monitoring LLM fine-tuning, evaluation, and selection at scale LLM versioning, lineage, approval LLM serving, monitoring, rating Amazon EventBridge Monitoring Alerts Alert evaluation & approval for re-fine- tuning and evaluation Input/Labelled data SageMaker Clarify Model Evaluation Model Registration V1 V2 Model Version Model Registry LLM artifacts User Interaction via API Generative AI Application Prompt prep. Model triggering Guardrails Generative AI User Prompt Catalog HIL Data Labelers & Prompt Testers LLM Output Rating Store user prompts to prompt catalog for future automatic evaluation (Improve testing coverage) Prompt Translator Generative AI app & user interaction Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services
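The "Model Comparison & Selection based on Cost, Latency, etc." step in the pipeline above can be made concrete with a weighted score. A minimal sketch, assuming illustrative metric names and weights (a real pipeline would take these from Clarify evaluation output and business requirements):

```python
# Toy selection step: score each fine-tuned LLM's evaluation results and pick
# the version that would be registered in the Model Registry for approval.
def select_model(evaluations, weights):
    """Pick the candidate with the best weighted score.

    Higher is better for 'accuracy'; lower is better for 'cost' and
    'latency_ms', so those terms contribute negatively.
    """
    def score(metrics):
        return (weights["accuracy"] * metrics["accuracy"]
                - weights["cost"] * metrics["cost"]
                - weights["latency_ms"] * metrics["latency_ms"])
    return max(evaluations, key=lambda e: score(e["metrics"]))

# Hypothetical evaluation results for the three candidates on the slide:
evaluations = [
    {"model": "llm-1",        "metrics": {"accuracy": 0.78, "cost": 1.0, "latency_ms": 120}},
    {"model": "llm-2",        "metrics": {"accuracy": 0.81, "cost": 3.0, "latency_ms": 300}},
    {"model": "custom-llm-3", "metrics": {"accuracy": 0.80, "cost": 0.5, "latency_ms": 90}},
]
weights = {"accuracy": 10.0, "cost": 0.2, "latency_ms": 0.005}
winner = select_model(evaluations, weights)
print(winner["model"])  # -> custom-llm-3: slightly lower accuracy, much cheaper and faster
```

Note how the weights encode the trade-off: here the marginally less accurate custom model wins because of its cost and latency advantage, which is exactly the kind of decision the selection step automates.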
MLOps & Generative AI Technology – Fine-Tuner
T H R E E  M A I N  L A Y E R S  A R E  I N T E R C O N N E C T E D
The three interconnected layers: Data, MLOps, and the Generative AI Application.
MLOps & Generative AI Technology – Fine-Tuner
T H R E E  M A I N  L A Y E R S  A R E  I N T E R C O N N E C T E D
The same three layers realized with AWS services, including Amazon Bedrock (diagram steps 1–5).
FMOps Foundation People & Processes
S E P A R A T I O N  O F  C O N C E R N S  I S  K E Y  F O R  S U C C E S S
• Platform Administration (System Administrators, Security, MLOps Engineers): provision infrastructure, provide user access, provide data access
• Data (Data Owners, Data Engineers): ingest data; prepare, combine, and catalogue data; visualize data
• Experimentation (Product Owner, Business Stakeholders, Lead Data Scientists, Data Scientists): prove that ML can solve a business problem, i.e. a PoC
• Model Build & Model Test (Data Scientists, ML Engineers): automate model build/training with scaled data; automate model testing and guardrails
• Model Deployment & ML Production Environment (ML Engineers, ML Consumers): serve and monitor the model
• ML Governance (Model Approvers, Auditors & Compliance): centralized model and code artifact storage/versioning/auditing
FMOps Foundation People & Processes
S E P A R A T I O N  O F  C O N C E R N S  I S  K E Y  F O R  S U C C E S S
The same personas and stages as the MLOps foundation (platform administration, data, experimentation, model build/test/deployment, ML production, ML governance), extended for generative AI:
• Fine-Tuners take over model build, test, and deployment for FMs
• Data Labelers / Editors contribute labelled data through a Web UI
• A new Application stage adds Generative AI Developers (AppDev) and Prompt Engineers/Testers
• Generative AI End-users interact with the application through a Web UI
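What the Application stage does at runtime (prompt preparation, guardrails, model triggering) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the prompt catalog, the banned-terms guardrail, and `fake_llm` are stand-ins; a real application would call a deployed model endpoint and a proper guardrail service.

```python
# Toy application-layer flow: prompt prep from a catalog template, a guardrail
# check, then model triggering. All names here are illustrative placeholders.
PROMPT_CATALOG = {
    "summarize": "Summarize the following text in one sentence:\n{text}",
}
BANNED_TERMS = {"password", "ssn"}  # stand-in for a real guardrail policy

def prepare_prompt(template_id, **kwargs):
    """Fill a catalog template with request parameters."""
    return PROMPT_CATALOG[template_id].format(**kwargs)

def guardrail_ok(prompt):
    """Block prompts containing banned terms (crude placeholder check)."""
    return not any(term in prompt.lower() for term in BANNED_TERMS)

def handle_request(template_id, llm, **kwargs):
    prompt = prepare_prompt(template_id, **kwargs)
    if not guardrail_ok(prompt):
        return "Request blocked by guardrail."
    return llm(prompt)  # model triggering; `llm` would call an endpoint

fake_llm = lambda prompt: f"[model output for {len(prompt)} chars of prompt]"
print(handle_request("summarize", fake_llm, text="FMOps extends MLOps."))
print(handle_request("summarize", fake_llm, text="my password is hunter2"))
```

The separation of concerns mirrors the personas above: Prompt Engineers/Testers own the catalog and guardrail rules, while Generative AI Developers own the request-handling code.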
The Journey of Providers
Generative AI Processes – Providers
P R E - T R A I N I N G  A N  F M  I S  N O T  A  T R I V I A L  T A S K
• Massive data labeling: human in the loop to label trillions of data points
• Weeks to months of training: distribute the training of the foundation models
• Deployment: deploy proprietary FMs, or open-source the model artifact and code
• Leaderboards: FMs are evaluated by third-party organizations
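The "distribute the training" bullet above hides the most basic building block: every worker must see a disjoint slice of the corpus. A framework-free sketch of round-robin data sharding (real pre-training would use the SageMaker distributed training libraries or a framework like PyTorch DDP; `doc-i` names are illustrative):

```python
# Minimal data-parallel sharding: worker `rank` out of `num_workers` owns
# every num_workers-th sample, so no sample is trained on twice per epoch.
def shard(dataset, num_workers, rank):
    """Return the round-robin shard of `dataset` owned by worker `rank`."""
    return dataset[rank::num_workers]

docs = [f"doc-{i}" for i in range(10)]
shards = [shard(docs, 4, r) for r in range(4)]
for r, s in enumerate(shards):
    print(f"worker {r}: {s}")

# Every document lands on exactly one worker:
assert sorted(sum(shards, [])) == sorted(docs)
```

The same partitioning idea, scaled to trillions of tokens and thousands of accelerators, is why pre-training runs for weeks to months rather than years.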
Providers Productionize FMs Using MLOps
T R A I N  M U L T I P L E  F O U N D A T I O N  M O D E L S
Building FMs Using Amazon SageMaker HyperPod
D E V E L O P  A N D  T R A I N  F M S  C O N T I N U O U S L Y  F O R  W E E K S  A N D  M O N T H S
• Resilient environment: self-healing clusters reduce training time by up to 20%
• Streamlined distributed training: SageMaker distributed training libraries improve performance by up to 20%
• Optimized resource utilization: control over the computing environment and workload scheduling
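To make the HyperPod slide a little more concrete, here is a hedged sketch that assembles a cluster-creation request as plain data (no AWS call is made). The field names follow the SageMaker CreateCluster API as I understand it, but the role ARN, S3 URI, instance type, and sizes are placeholders; verify the exact request shape against the current SageMaker HyperPod documentation before use.

```python
# Assemble a (hypothetical) CreateCluster request body for a HyperPod cluster.
def hyperpod_request(name, role_arn, lifecycle_s3, worker_count):
    return {
        "ClusterName": name,
        "InstanceGroups": [
            {
                "InstanceGroupName": "worker-group",
                "InstanceType": "ml.p5.48xlarge",  # GPU training instances (example)
                "InstanceCount": worker_count,
                "ExecutionRole": role_arn,
                "LifeCycleConfig": {
                    # Lifecycle scripts that set up each node on boot
                    "SourceS3Uri": lifecycle_s3,
                    "OnCreate": "on_create.sh",
                },
            }
        ],
    }

req = hyperpod_request(
    "fm-pretraining",
    "arn:aws:iam::123456789012:role/ExampleHyperPodRole",  # placeholder
    "s3://example-bucket/lifecycle/",                      # placeholder
    worker_count=16,
)
# With credentials configured, this dict would be passed to boto3, e.g.:
# boto3.client("sagemaker").create_cluster(**req)
print(req["ClusterName"])
```

The self-healing behavior on the slide is a property of the managed cluster itself: failed nodes in an instance group are replaced automatically, which is what keeps week-long training runs from restarting from scratch.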
Conclusion & Resources
Generative AI Stack on AWS
A L I G N E D  T O  Y O U R  M A T U R I T Y
Personas across the stack: Providers, Fine-Tuners, Consumers, and End-users.
Generative AI Stack on AWS
A L I G N E D  T O  Y O U R  M A T U R I T Y
• Infrastructure for FM training and inference: GPUs, Inferentia, Trainium, SageMaker (with HyperPod), EC2 Capacity Blocks, Neuron, UltraClusters, EFA, Nitro
• Tools to build with FMs: Amazon Bedrock, with Guardrails, Agents, and customization capabilities
• Applications: Amazon Q, Amazon Q in Amazon QuickSight, Amazon Q in Amazon Connect, Amazon CodeWhisperer
Personas: Providers, Fine-Tuners, Consumers, and End-users.
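For the middle layer of the stack, a hedged sketch of what calling Amazon Bedrock involves: assembling a JSON request body for the InvokeModel API. The body below follows Anthropic's messages format on Bedrock as documented at the time of writing; the model id and parameters are illustrative, and no AWS call is made here.

```python
import json

def bedrock_body(prompt, max_tokens=256):
    """Build an InvokeModel request body in Anthropic's messages format."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = bedrock_body("Explain FMOps in one sentence.")
# With credentials configured, this would be sent as, for example:
# boto3.client("bedrock-runtime").invoke_model(
#     modelId="anthropic.claude-3-sonnet-20240229-v1:0", body=body)
print(json.loads(body)["messages"][0]["role"])  # -> user
```

Because Bedrock is a serverless API, this request shape is all a Consumer-maturity team needs; Fine-Tuners would additionally use Bedrock's customization capabilities, and Providers would drop down to the infrastructure layer.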
Generative AI & Operations Resources
• Generative AI for Builders on Amazon SageMaker Workshop: http://tinyurl.com/aws-genai-for-builders
• FMOps/LLMOps: Operationalize generative AI and differences with MLOps: https://aws.amazon.com/blogs/machine-learning/fmops-llmops-operationalize-generative-ai-and-differences-with-mlops
• Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services: https://aws.amazon.com/blogs/machine-learning/operationalize-llm-evaluation-at-scale-using-amazon-sagemaker-clarify-and-mlops-services
• Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock: https://aws.amazon.com/blogs/machine-learning/build-an-internal-saas-service-with-cost-and-usage-tracking-for-foundation-models-on-amazon-bedrock/
Thank you!
Oleksii Ivanchenko
linkedin.com/in/oleksii-ivanchenko/