© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
FMOps/LLMOps: Operationalise Generative AI
Oleksii Ivanchenko
Solutions Architect, AWS
MLOps Recap
FMOps & LLMOps intro
FMOps per Personas
Agenda
What is MLOps?
MLOps Definition
People
Processes
Technology
The combination of people,
processes, and technology to
productionize ML solutions
efficiently.
MLOps
Machine Learning
& Operations
MLOps Foundation Expected Outcomes
S T A N D A R D I Z E O P E R A T I O N S A N D I N F R A S T R U C T U R E F O R Y O U R D A T A S C I E N C E
1. Be more efficient in delivery. Time to value (from idea to production): up to 12 months before MLOps, < 3 months after. Business value: improve speed-to-value by 4x.
2. Simplify route-to-live. Time to productionize existing ML use cases: 3-6 months before, < 2 weeks after. Business value: reduce FTE overhead by 8x on average.
3. Standardize infrastructure, data, & code. % of template-driven development: n/a before, > 85% after. Business value: focus on innovation, increasing re-usability by 85%.
4. Standardize onboarding of new teams and ML use cases. Time to instantiate a new MLOps infrastructure & ML projects: 40 days before, < 1 hour after. Business value: accelerate ML adoption across all business areas.
5. Ensure high security standards. Execute the ML solutions without internet access in a private cloud. Business value: your data is safe in your private cloud.
Overall: reduce platform, people, and operation costs.
Customer references building MLOps foundation and business benefits:
• NatWest: https://aws.amazon.com/solutions/case-studies/natwest-group-case-study
• BP: https://aws.amazon.com/solutions/case-studies/bp-machine-learning-case-study
MLOps Foundation People & Processes
S E P A R A T I O N O F C O N C E R N S I S K E Y F O R S U C C E S S
Experimentation
Data
ML Governance
Platform
Administration
Model
Approvers
Auditors &
Compliance
Product Owner
Lead Data Scientists
Data Scientists
Data Owners
MLOps
Engineers
System
Administrators
Security
Data Engineers Business Stakeholders
ML Consumers
Model Build Model Test
Model
Deployment
Data Scientists ML Engineers
ML
Production
Environment
Provision infrastructure
Provide user access
Provide data access
Ingest data
Prepare, combine and catalogue Data
Visualize data
Prove that ML can solve a business
problem, i.e. PoC
Automate model build/training
providing scaled data
Automate model testing and
guardrails
Serving and monitoring the model
testing
Centralized model and code artifact
storage/versioning/auditing
MLOps Scalable Phase
M U L T I P L E T E A M S A N D M L U S E C A S E S A D O P T M L O P S
Generative AI & MLOps
MLOps & FMOps/LLMOps Differentiators
People
Processes
Technology
Key Definitions
MLOps FMOps
Foundation Model Operations
Productionize Generative AI
Solutions (Text-Text/ Image/
Video/ Audio/ …)
Machine Learning Operations
Productionize ML solutions
efficiently
LLMOps
Large Language Model Operations
Productionize Large Language
Model-based solutions
Differences between MLOps and FMOps
People
Processes
Technology
MLOps FMOps
Processes & People
Providers, fine-tuners, & consumers
Select & Customize the FM on a Specific Context
- Fine-tuning, parameter-efficient fine-tuning, prompt engineering
- Proprietary, open source based on the application
Evaluate & Monitor Fine-tuned Models
Human feedback, prompt management, toxicity/bias…
Data & Model Deployment
Data privacy, multi-tenancy, & cost, latency, and precision
Technology
MLOps, data, & application layers
Generative AI User Types & Skills
Generative AI
User Types
Skills
No ML expertise required. Focus
on prompt engineering and
retrieval augmented generation.
Strong end-to-end ML expertise and
domain knowledge for tuning
including prompt engineering.
Deep end-to-end ML, NLP
expertise and data science,
labeler “squad”.
Providers
Entities who pre-train FMs
from scratch themselves and
provide them as a product to
fine-tuners and consumers.
Fine-Tuners
Customize (e.g. fine-tune)
providers' pre-trained FMs
with their own data and run
inference, while providing
access to consumers.
Consumers
Interact with Generative AI
services from provider or fine-
tuner by text prompting or
visual interface to complete
desired actions.
Productionize large models
leveraging ML & Operations (FMOps)
Productionize applications
leveraging Generative AI &
Operations
Can become Can become
The Journey of Consumers
Generative AI Processes – Consumers
Select, evaluate, & use FM as
a black-box & adapt context
Using multiple chained models and
prompt engineering techniques to achieve
context adaptation (if necessary). Expose
the solution to the end users
Inputs/Outputs & Rating
Interaction with the Generative AI
solution. Aim to improve outcomes
by penalizing or rewarding Generative
AI solution outputs, providing insights
for prompt engineering
Select FM - Consumers
Step 1. Understand
top proprietary and
open source FM
capabilities
Step 2. Test & evaluate the
top selected FMs (e.g. top 3)
Step 3. Select the best
FM based on your
priorities
>20 K
available
FMs
Quick short listing:
Use a small set of test
prompts based on the
task
Use case-based benchmarking:
Evaluate the models based on
predefined prompts and outputs
(prompt catalog)
Priority-based decision:
Select based on business
priorities: cost, latency, precision
Step 1. Understand top FM capabilities
Proprietary or open-source FM
Commercial License
# of Parameters
Quality metrics
Speed Context Window
Trained on
(instructions, code, internet data)
Fine-tunable
Existing Customer Skills
Main FM Capability Matrix
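As an illustration, such a capability matrix can be represented as structured data and used to shortlist candidates against hard requirements. The model entries and field values below are hypothetical, not real benchmark data:

```python
# Hypothetical FM capability matrix (illustrative entries only).
catalog = [
    {"name": "fm-a", "open_source": False, "params_b": 175,
     "fine_tunable": True, "context_window": 8192, "speed": "medium"},
    {"name": "fm-b", "open_source": True, "params_b": 7,
     "fine_tunable": True, "context_window": 4096, "speed": "fast"},
    {"name": "fm-c", "open_source": True, "params_b": 70,
     "fine_tunable": False, "context_window": 4096, "speed": "slow"},
]

def shortlist(catalog, require_fine_tunable=True, min_context=4096):
    """Keep only the models that meet the hard requirements of the matrix."""
    return [m["name"] for m in catalog
            if (m["fine_tunable"] or not require_fine_tunable)
            and m["context_window"] >= min_context]

print(shortlist(catalog))  # -> ['fm-a', 'fm-b']
```

Soft criteria (quality metrics, license terms, existing customer skills) then rank the shortlisted candidates in Step 2.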
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Step 2. Evaluate the top FMs
Does labelled data exist?
- Yes: does the use case provide discrete outcomes/outputs?
  - Yes: Accuracy metrics. High evaluation precision, but covers few use cases. Classic ML metrics, e.g. precision, …
  - No: Task-specific metrics. Medium evaluation precision that might require human evaluation, but covers many use cases.
    - Similarity: ROUGE, cosine similarity [0,1], …
    - Prompt stereotyping: cross-entropy loss
    - Toxicity: Detoxify, ToxiGen
    - Factual knowledge: HELM, LAMA Benchmark
    - Semantic robustness: word error rate
- No: should we automate the test process?
  - No: Human in the Loop (HIL). High evaluation precision, but costly and time-consuming. Manual human feedback based on predefined assessment rules, e.g. using SageMaker Ground Truth.
  - Yes: LLM-based evaluation. Unknown evaluation precision (depends on the LLM), but automated. Feed the outputs to a "reliable" LLM and instruct it to rate the outcome with a score and an explanation.
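The decision tree above reduces to a small amount of branching logic; a minimal sketch:

```python
def choose_eval_method(labelled_data, discrete_outputs, automate):
    """Return the evaluation approach suggested by the decision tree:
    labelled + discrete -> accuracy metrics; labelled + free-form ->
    task-specific metrics; unlabelled -> HIL or an LLM judge."""
    if labelled_data:
        return "accuracy metrics" if discrete_outputs else "task-specific metrics"
    return "LLM-based evaluation" if automate else "human in the loop (HIL)"
```

For example, a classification use case with labelled data maps to classic accuracy metrics, while an unlabelled summarization use case that must scale maps to LLM-based evaluation.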
Step 3. Select the best FM based on priorities
Precision
Speed
Cost
High speed, smaller model,
lower precision, smaller cost
Lower speed, larger model,
higher precision, larger cost
P0: lower cost
No priority
P1: Precision
Model Evaluation Score HIL/LLM Feedback
FM1 5/5 <Feedback summary>
FM2 4/5 <Feedback summary>
FM3 3/5 <Feedback summary>
Model Cost
FM1 $$$$
FM2 $
FM3 $$$
Model Speed
FM1 ⚡⚡
FM2 ⚡
FM3 ⚡
Model Selection:
FM2
EXAMPLE
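The priority-based decision can be sketched as a ranking over per-model scores. The numbers below are made up to mirror the example tables (with the cost priority first, FM2 wins, as in the slide):

```python
# Illustrative scores: evaluation precision out of 5; cost and speed on a
# relative 1-4 scale (1 = cheapest / slowest tier). Not real model data.
models = {
    "FM1": {"precision": 5, "cost": 4, "speed": 2},
    "FM2": {"precision": 4, "cost": 1, "speed": 1},
    "FM3": {"precision": 3, "cost": 3, "speed": 1},
}

def select_fm(models, priorities=("cost", "precision")):
    """Rank models by the ordered business priorities: lower cost is
    better, higher precision/speed is better; ties fall through to the
    next priority in the tuple."""
    def key(item):
        _, m = item
        return tuple(-m[p] if p in ("precision", "speed") else m[p]
                     for p in priorities)
    return min(models.items(), key=key)[0]

print(select_fm(models))  # -> FM2
```

Swapping the priority order changes the outcome: with precision first, the largest (and most expensive) model would win instead.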
1. Select FM
2. Prompt
Engineering
3. Test &
Test Prompt
lineage
(input & outputs)
Develop &
Deploy Web
Application
New Test Set
Generative AI Processes for LLM – Consumers
5. Input/
Output
Filtering &
Guardrails
6. Rating
Mechanisms
(thumbs up/down,
rating, text)
Backend Front-end
Test
Functionality
Generative AI End-
users
Use the web application and
rate the quality of the
output
Generative AI Developers & Prompt Engineers/Testers
LLM-based
Generative
AI
Solution
Input/ Output
& Rating
Interaction
DevOps/AppDevs
WebUI
4. Chain
Prompts &
Applications
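Step 5 (input/output filtering & guardrails) can be as simple as a blocklist check in front of and behind the LLM. A minimal sketch; real deployments would use managed guardrails or toxicity classifiers, and the blocklist terms here are purely illustrative:

```python
# Illustrative blocklist; production guardrails are ML-based, not keyword-based.
BLOCKLIST = {"password", "ssn"}

def guard(text):
    """Return (allowed, reason); block text containing blocklisted terms.
    Applied to both user prompts (input) and model completions (output)."""
    lowered = text.lower()
    for term in BLOCKLIST:
        if term in lowered:
            return False, f"blocked term: {term}"
    return True, "ok"
```

The same check runs on both sides of the model call: a blocked input never reaches the LLM, and a blocked output never reaches the end user.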
Develop &
Deploy Web
Application
New Test Set
Generative AI Processes for LLM – Consumers
Backend Front-end
Test
Functionality
Generative AI End-
users
Use the web application and
rate the quality of the
output
Generative AI Developers & Prompt Engineers/Testers
LLM-based
Generative
AI
Solution
Input/ Output
& Rating
Interaction
DevOps/AppDevs
WebUI
Create User
Profile & Share
Data
Fine-tune Personalized Models
1. Select FM
2. Prompt
Engineering
Test & Test
Prompt
lineage
(input & outputs)
4. Input/
Output
Filtering
5. Rating
Mechanisms
(thumbs up/down,
rating, text)
3. Chain
Prompts &
Applications
Fine-tune FM using APIs
1. Select FM
or Use Fine-
tuned Model
2. Prompt
Engineering
3. Test &
Test Prompt
lineage
(input & outputs)
5. Input/
Output
Filtering &
Guardrails
6. Rating
Mechanisms
(thumbs up/down,
rating, text)
4. Chain
Prompts &
Applications
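The rating mechanism in the last step (thumbs up/down) can be sketched as a small aggregator that flags low-rated prompts, which then feed the prompt catalog for re-engineering. Names and the threshold below are illustrative:

```python
from collections import defaultdict

# prompt_id -> list of 1 (thumbs up) / 0 (thumbs down) ratings
ratings = defaultdict(list)

def rate(prompt_id, thumbs_up):
    """Record one end-user rating for a prompt."""
    ratings[prompt_id].append(1 if thumbs_up else 0)

def low_rated(threshold=0.5):
    """Prompt ids whose approval rate falls below the threshold;
    candidates for prompt re-engineering or fine-tuning data."""
    return [p for p, r in ratings.items() if sum(r) / len(r) < threshold]
```

In the deck's flow, these flagged prompts are exactly what gets added to the "New Test Set" for the next prompt-engineering iteration.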
The Journey of Fine-tuners
Generative AI Processes - Fine-Tuners
Data Labeling
Human in the loop to label
hundreds or thousands of
data points
Fine-tune
Customization for
specific domains
Deployment &
Prompt Engineering
Trade-off between cost,
precision, & latency
Chain of models
Prompt designing and
engineering
Filtering input prompts and
results using embeddings
Monitor
Human in the loop
feedback/rating, result
similarities, toxicity rate,
new methods under
research
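As a sketch of the monitoring step, an alert can fire when the fraction of toxic outputs exceeds a rate threshold. The upstream scorer producing `toxicity_scores` (e.g. Detoxify) is assumed to exist; the thresholds are illustrative:

```python
def should_alert(toxicity_scores, rate_threshold=0.05, score_threshold=0.8):
    """Alert when the fraction of outputs whose toxicity score exceeds
    score_threshold is itself above rate_threshold."""
    if not toxicity_scores:
        return False
    toxic = sum(1 for s in toxicity_scores if s > score_threshold)
    return toxic / len(toxicity_scores) > rate_threshold
```

The same pattern applies to the other monitored signals (rating averages, result similarity): aggregate a window of scores, compare against a threshold, and trigger re-evaluation or re-fine-tuning when breached.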
Fine-Tuning, PEFT & Training
Thousands of example
input/outputs
Pre-trained FM
Tens of example
input/outputs
Parameter-Efficient Fine-
Tuning (PEFT - Training Job)
Requires low computation power (reduced cost, e.g. 1/10 of
fine-tuning) as it adds small new layers to the large FM
Lower accuracy
Fine-Tuning
(Training Job)
Requires high computation power (high cost, but lower than
pre-training) to recalculate all the weights of the large FM
Higher accuracy
Fine-Tuned FM
Prompt
(Inputs)
Completion
(Outputs)
Select the ‘right’ model
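A back-of-envelope calculation shows why PEFT is so much cheaper: a LoRA-style adapter trains two small low-rank matrices per adapted layer instead of every weight. The layer count and dimensions below are illustrative, not those of any specific FM:

```python
# Trainable-parameter comparison: full fine-tuning vs. a LoRA-style adapter.
def full_ft_params(layers, d_model):
    # full fine-tuning updates one d_model x d_model weight matrix per layer
    return layers * d_model * d_model

def lora_params(layers, d_model, rank):
    # LoRA trains two low-rank factors (d_model x rank, rank x d_model) per layer
    return layers * 2 * d_model * rank

full = full_ft_params(layers=32, d_model=4096)
peft = lora_params(layers=32, d_model=4096, rank=8)
print(f"trainable fraction with PEFT: {peft / full:.4%}")
```

With these (hypothetical) shapes the adapter trains well under 1% of the weights, which is where the reduced compute and cost of PEFT come from; the trade-off is the somewhat lower accuracy noted above.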
Generative AI SageMaker Capabilities
Amazon SageMaker
Gen AI for Builders
Data Scientists and ML Engineers
Pre-train, customize, and deploy
FMs with fine-grained controls for
generative AI use cases that have
stringent requirements on
accuracy, latency, and cost.
Deploy
Deploy the model
with SageMaker
Inference and run
inference
for your generative
AI use case
Browse
Browse 730+ public
and 20+ proprietary
FMs on SageMaker
JumpStart
Evaluate
Evaluate multiple FMs
with SageMaker Clarify
Model Evaluation to
select the right one for
your use case
Customize
Customize with
SageMaker Training
using your own
dataset.
Use SageMaker HIL
for data collection
EXPERIMENT
Iteratively customize and evaluate models with
SageMaker Experiments
Industrialize with MLOps and Monitoring
Automate model selection, evaluation and
deployment with SageMaker Pipelines
LLMOps Lifecycle
E V A L U A T I O N & F I N E - T U N I N G M L P I P E L I N E
Pre-
processing
Model
Comparison &
Selection
based on Cost,
Latency, etc.
Amazon SageMaker Pipeline
Evaluation LLM Pipeline
LLM 1
Evaluation
Evaluation
Fine-tuning
(Training Job)
Custom LLM 3
LLM 2
Evaluation
Model
Endpoint
Version
Approval &
Deployment
Monitoring
LLM fine-tuning, evaluation,
and selection at scale
LLM versioning,
lineage, approval
LLM serving,
monitoring, rating
Amazon EventBridge
Monitoring Alerts
Alert evaluation &
approval for re-fine-
tuning and evaluation
Input/Labelled
data
SageMaker Clarify
Model Evaluation
Model
Registration
V1 V2
Model Version
Model Registry
LLM artifacts
User
Interaction
via API
Generative AI
Application
Prompt prep.
Model triggering
Guardrails
Generative AI
User
Prompt Catalog
HIL Data
Labelers
& Prompt
Testers
LLM Output
Rating
Store user prompts to prompt
catalog for future automatic
evaluation (Improve testing
coverage)
Prompt
Translator
Generative AI app &
user interaction
Operationalize LLM Evaluation at Scale using
Amazon SageMaker Clarify and MLOps services
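The versioning and approval gate in the lifecycle above can be sketched as a tiny in-memory registry; production setups would use the SageMaker Model Registry, so this is only a sketch of the control flow:

```python
class ModelRegistry:
    """Minimal registry: register LLM versions, gate deployment on approval."""

    def __init__(self):
        self.versions = []  # dicts: {"version", "artifact", "approved"}

    def register(self, artifact):
        version = len(self.versions) + 1
        self.versions.append({"version": version, "artifact": artifact,
                              "approved": False})
        return version

    def approve(self, version):
        # the manual approval step before endpoint deployment
        self.versions[version - 1]["approved"] = True

    def latest_approved(self):
        """The version the serving endpoint should run, or None."""
        approved = [v for v in self.versions if v["approved"]]
        return approved[-1]["version"] if approved else None
```

A monitoring alert (via Amazon EventBridge in the diagram) would trigger re-fine-tuning, register a new version, and only move the endpoint forward once that version passes evaluation and approval.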
MLOps &
Generative AI
Technology –
Fine-tuner
T H R E E M A I N L A Y E R S A R E
I N T E R C O N N E C T E D
Generative
AI
Application
MLOps
Data
MLOps &
Generative AI
Technology –
Fine-tuner
T H R E E M A I N L A Y E R S A R E
I N T E R C O N N E C T E D
1
2
4
3
5
Amazon
Bedrock
FMOps Foundation People & Processes
S E P A R A T I O N O F C O N C E R N S I S K E Y F O R S U C C E S S
Experimentation
Data
ML Governance
Platform
Administration
Model
Approvers
Auditors &
Compliance
Product Owner
Lead Data Scientists
Data Scientists
Data Owners
MLOps
Engineers
System
Administrators
Security
Data Engineers Business Stakeholders
ML Consumers
Model Build Model Test
Model
Deployment
Data Scientists ML Engineers
ML
Production
Environment
Provision infrastructure
Provide user access
Provide data access
Ingest data
Prepare, combine and catalogue Data
Visualize data
Prove that ML can solve a business
problem, i.e. PoC
Automate model build/training
providing scaled data
Automate model testing and
guardrails
Serving and monitoring the model
testing
Centralized model and code artifact
storage/versioning/auditing
Application Generative AI
Developers
AppDev
Prompt
Engineers /Testers
Generative AI
End-users
Fine-Tuners
Fine-Tuners
Web UI
Data
Labelers / Editors
Web UI
The Journey of Providers
Generative AI Processes - Providers
P R E - T R A I N I N G A N F M I S N O T A T R I V I A L T A S K
Massive Data
Labeling
Human in the loop to label
trillions of data points
Weeks to Months
of Training
Distribute the training of the
foundation models
Deployment
Deploy proprietary FMs or
Open Source the model artifact
and code
Leaderboards
FMs are evaluated by
third-party organizations
Providers Productionize FM using MLOps
T R A I N M U L T I P L E F O U N D A T I O N M O D E L S
Building FM using Amazon SageMaker HyperPod
D E V E L O P A N D T R A I N F M S C O N T I N U O U S L Y F O R W E E K S A N D M O N T H S
Self-healing clusters reduce
training time by up to 20%
Resilient
environment
SageMaker distributed
training libraries improve
performance by up to 20%
Streamline
distributed training
Control over computing
environment and workload
scheduling
Optimized
resource utilization
Conclusion & Resources
Consumers
Providers
Fine-Tuners
Generative AI Stack on AWS
A L I G N E D T O Y O U R M A T U R I T Y
End-users
GPUs Inferentia
Trainium SageMaker
EC2 Capacity Blocks Neuron
UltraClusters EFA Nitro
Amazon Bedrock
Guardrails Agents Customization Capabilities
Amazon Q Amazon Q in
Amazon QuickSight
Amazon Q in
Amazon Connect
Amazon
CodeWhisperer
Providers
Fine-Tuners
Generative AI Stack on AWS
A L I G N E D T O Y O U R M A T U R I T Y
HyperPod
Consumers
End-users
Generative AI & Operations Resources
Generative AI for Builders on Amazon SageMaker Workshop
http://tinyurl.com/aws-genai-for-builders
FMOps/LLMOps: Operationalize generative AI and differences with MLOps
https://aws.amazon.com/blogs/machine-learning/fmops-llmops-operationalize-generative-ai-and-differences-with-mlops
Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services
https://aws.amazon.com/blogs/machine-learning/operationalize-llm-evaluation-at-scale-using-amazon-sagemaker-clarify-and-mlops-services
Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock
https://aws.amazon.com/blogs/machine-learning/build-an-internal-saas-service-with-cost-and-usage-tracking-for-foundation-models-on-amazon-bedrock/
Thank you!
Oleksii Ivanchenko
linkedin.com/in/oleksii-ivanchenko/

Oleksii Ivanchenko: FMOps/LLMOps: особливості підходу і чим відрізняється від MLOps (UA)

  • 1.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. FMOps/LLMOps: Operationalise Generative AI Oleksii Ivanchenko Solutions Architect from AWS
  • 2.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps Recap FMOps & LLMOps intro FMOps per Personas Agenda
  • 3.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. What is MLOps? MLOps Definition People Processes Technology The combination of people, processes, and technology to productionize ML solutions efficiently. MLOps Machine Learning & Operations
  • 4.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps Foundation Expected Outcomes S T A N D A R D I Z E O P E R A T I O N S A N D I N F R A S T R U C T U R E F O R Y O U R D A T A S C I E N C E Business Goal Technical Metric Before MLOps MLOps Expected Outcomes Business Value 1 Be more efficient in delivery Time to value (from idea to production) up to 12 months < 3 months Improve Speed-to-Value by 4x 2 Simplify route-to-live Time to productionize existing ML use cases 3-6 months < 2 weeks Reduce FTE overhead in average 8x 3 Standardize infrastructure, data, & code % Template driven development n/a > 85% Focus on innovation increasing re-usability by 85% 4 Standardize onboarding of new teams and ML use cases Time to instantiate a new MLOps infrastructure & ML projects 40 days < 1 hours Accelerate ML adoption across all business areas 5 Ensure high security standards Execute the ML solutions without internet access in a private cloud n/a No internet Your data is safe in your private cloud Reduce platform, people and operation costs Customer references building MLOps foundation and business benefits: • NatWest: https://aws.amazon.com/solutions/case-studies/natwest-group-case-study • BP: https://aws.amazon.com/solutions/case-studies/bp-machine-learning-case-study
  • 5.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps Foundation People & Processes S E P A R A T I O N O F C O N C E R N S I S K E Y F O R S U C C E S S Experimentation Data ML Governance Platform Administration Model Approvers Auditors & Compliance Product Owner Lead Data Scientists Data Scientists Data Owners MLOps Engineers System Administrators Security Data Engineers Business Stakeholders ML Consumers Model Build Model Test Model Deployment Data Scientists ML Engineers ML Production Environment Provision infrastructure Provide user access Provide data access Ingest data Prepare, combine and catalogue Data Visualize data Prove that ML can solve a business problem, i.e. PoC Automate model build/training providing scaled data Automate model testing and guardrails Serving and monitoring the model testing Centralized model and code artifact storage/versioning/auditing
  • 6.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOPs Scalable Phase M U L T I P L E T E A M S A N D M L U S E C A S E S A D O P T M L O P S
  • 7.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI & MLOps MLOps & FMOps/LLMOps Differentiators
  • 8.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. People Processes Technology Key Definitions MLOps FMOps Foundation Model Operations Productionize Generative AI Solutions (Text-Text/ Image/ Video/ Audio/ …) Machine Learning Operations Productionize ML solutions efficiently LLMOps Large Language Model Operations Productionize Large Language Model-based solutions
  • 9.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Differences between MLOps and FMOps People Processes Technology MLOps FMOps Processes & People Providers, fine-tuners, & consumers Select & Customize the FM on a Specific Context - Fine-tuning, parameter-efficient fine-tuning, prompt engineering - Proprietary, open source based on the application Evaluate & Monitor Fine-tuned Models Human feedback, prompt management, toxicity/bias… Data & Model Deployment Data privacy, multi-tenancy, & cost, latency, and precision Technology MLOps, data, & application layers
  • 10.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. MLOps & FMOps Differentiators People Processes Technology MLOps FMOps Processes & People Providers, fine-tuners, & consumers Select & Customize the FM on a Specific Context - Fine-tuning, parameter-efficient fine-tuning, prompt engineering - Proprietary, open source based on the application Evaluate & Monitor Fine-tuned Models Human feedback, prompt management, toxicity/bias… Data & Model Deployment Data privacy, multi-tenancy, & cost, latency, and precision Technology MLOps, data, & application layers
  • 11.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI User Types & Skills 11 Generative AI User Types Skills No ML expertise required. Focus on prompt engineering and retrieval augmented generation. Strong end-to-end ML expertise and domain knowledge for tuning including prompt engineering. Deep end-to-end ML, NLP expertise and data science, labeler “squad“. Providers Entities who pre-train FMs from scratch themselves and provide them as a product to fine-tuner and consumer. Fine-Tuners Customize (e.g. fine-tune) providers‘ pre-trained FMs with their own data and run inference, while provide access to consumers. Consumers Interact with Generative AI services from provider or fine- tuner by text prompting or visual interface to complete desired actions. Productionize large models leveraging ML & Operations (FMOps) Productionize applications leveraging Generative AI & Operations Can become Can become
  • 12.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. The Journey of Consumers
  • 13.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI Processes – Consumers 13 Select, evaluate, & use FM as a black-box & adapt context Using multiple chained models and prompt engineering techniques to achieve context adaptation (if necessary). Expose the solution to the end users Inputs/Outputs & Rating Interaction with the Generative AI Solutions. Aim to improve outcomes by penalizing or rewarding Generative AI solution outputs providing insights for prompt engineering
  • 14.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Select FM - Consumers 14 20 Step 1. Understand top proprietary and open source FM capabilities Step 2. Test & evaluate the top selected FMs (e.g. top 3) Step 3. Select the best FM based on your priorities >20 K available FMs 1 Quick short listing: Use a small set of test prompts based on the task Use case-based benchmarking: Evaluate the models based on predefined prompts and outputs (prompt catalog) Priority-based decision: Select based on business priorities cost, latency, precision 3
  • 15.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Step 1. Understand top FM capabilities 15 Proprietary or open-source FM Commercial License # of Parameters Quality metrics Speed Context Window Trained on (instructions, code, internet data) Fine-tunable Existing Customer Skills Main FM Capability Matrix
  • 16.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Step 2. Evaluate the top FMs 19 Does labelled data exist? Does the use case provide discrete outcomes/outputs? Accuracy metrics High evaluation precision but not many use cases Task-specific metrics Medium evaluation precision that might require human evaluation but many use cases Human in the Loop (HIL) High evaluation precision but costly and time consuming LLM Unknown evaluation precision (depend on the LLM) but automated Should we automate the test process? Yes Classic ML metrics e.g. precision, … Similarity: ROUGE, cosine similarity [0,1], … Prompt Stereotyping: Cross-entropy loss Toxicity: Detoxify, Toxigen Factual Knowledge: HELM, LAMA Benchmark Semantic Robustness: Word error rate No Manually human feedback based on predefined assessment rules e.g. using GroundTruth Feed the outputs to a ”reliable” LLM and instruct it to rate the outcome with an score and explanation Yes No Yes No
  • 17.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Step 3. Select the best FM based on priorities 21 Precision Speed Cost High speed, smaller model, lower precision, smaller cost Lower speed, larger model, higher precision, larger cost P0: lower cost No priority P1: Precision Model Evaluation Score HIL/LLM Feedback FM1 5/5 <Feedback summary> FM2 4/5 <Feedback summary> FM3 3/5 <Feedback summary> Model Cost FM1 $$$$ FM2 $ FM3 $$$ Model Speed FM1 ⚡⚡ FM2 ⚡ FM3 ⚡ Model Selection: FM2 EXAMPLE
  • 18.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. 1. Select FM 2. Prompt Engineering 3. Test & Test Prompt lineage (input & outputs) Develop & Deploy Web Application New Test Set Generative AI Processes for LLM – Consumers 22 5. Input/ Output Filtering & Guardrails 6. Rating Mechanisms (thumbs up/down, rating, text) Backend Front-end Test Functionality Generative AI End- users Use web application rate the quality of output Generative AI Developers & Prompt Engineers/Testers LLM-based Generative AI Solution Input/ Output & Rating Interaction DevOps/AppDevs WebUI 4. Chain Prompts & Applications
  • 19.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Develop & Deploy Web Application New Test Set Generative AI Processes for LLM – Consumers 23 Backend Front-end Test Functionality Generative AI End- users Use web application rate the quality of output Generative AI Developers & Prompt Engineers/Testers LLM-based Generative AI Solution Input/ Output & Rating Interaction DevOps/AppDevs WebUI Create User Profile & Share Data Fine-tune Personalized Models 1. Select FM 2. Prompt Engineering Test & Test Prompt lineage (input & outputs) 4. Input/ Output Filtering 5. Rating Mechanisms (thumbs up/down, rating, text) 3. Chain Prompts & Applications Fine-tune FM using APIs 1.Select FM or Use Fine- tuned Mode 2. Prompt Engineering 3. Test & Test Prompt lineage (input & outputs) 5. Input/ Output Filtering & Guardrails 6. Rating Mechanisms (thumbs up/down, rating, text) 4. Chain Prompts & Applications
  • 20.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. The Journey of Fine-tuners
  • 21.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI Processes - Fine-Tuners 27 Data Labeling Human in the loop to label thousands, or hundreds of data Fine-tune Customization for specific domains Deployment & Prompt Engineering Trade-off between cost, precision, & latency Chain of models Prompt designing and engineering Filtering input prompts and results using embeddings Monitor Human in the loop feedback/rating, result similarities, toxicity rate, new methods under research
  • 22.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Fine-Tuning, PEFT & Training Thousands of example input/outputs Pre-trained FM Tens of example input/outputs Parameter-Efficient Fine- Tuning (PEFT - Training Job) Requires low computation power (reduced cost e.g. 1/10 of fine-tuning) as it adds small new layers in the Large FM Lower accuracy Fine-Tuning (Training Job) Requires high computation power (high cost but lower than training) to calculate all the weights of Large FM Higher accuracy Fine-Tuned FM Prompt (Inputs) Completion (Outputs) Select the ‘right’ model
  • 23.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. Generative AI SageMaker Capabilities 29 Amazon SageMaker Gen AI for Builders Data Scientists and ML Engineers Pre-train, customize, and deploy FMs with fine-grain controls for generative AI use cases that have stringent requirements on accuracy, latency, and cost. Deploy Deploy the model with SageMaker Inference and run inference for your generative AI use case Browse Browse 730+ public and 20+ proprietary FMs on SageMaker Jumpstart Evaluate Evaluate multiple FMs with SageMaker Clarify Model Evaluation to select the right one for your use case Customize Customize with SageMaker Training using your own dataset. Use SageMaker HIL for data collection EXPERIMENT Iteratively customize and evaluate models with SageMaker Experiments Industrialize with MLOps and Monitoring Automate model selection, evaluation and deployment with SageMaker Pipelines
  • 24.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. LLMOps Lifecycle E V A L U A T I O N & F I N E - T U N I N G M L P I P E L I N E Pre- processing Model Comparison & Selection based on Cost, Latency, etc. Amazon SageMaker Pipeline Evaluation LLM Pipeline LLM 1 Evaluation Evaluation Fine-tuning (Training Job) Custom LLM 3 LLM 2 Evaluation Model Endpoint Version Approval & Deployment Monitoring LLM fine-tuning, evaluation, and selection at scale LLM versioning, lineage, approval LLM serving, monitoring, rating Amazon EventBridge Monitoring Alerts Alert evaluation & approval for re-fine- tuning and evaluation Input/Labelled data SageMaker Clarify Model Evaluation Model Registration V1 V2 Model Version Model Registry LLM artifacts User Interaction via API Generative AI Application Prompt prep. Model triggering Guardrails Generative AI User Prompt Catalog HIL Data Labelers & Prompt Testers LLM Output Rating Store user prompts to prompt catalog for future automatic evaluation (Improve testing coverage) Prompt Translator Generative AI app & user interaction Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services
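The "Model Comparison & Selection based on Cost, Latency, etc." step in the pipeline above can be made concrete with a weighted score. A minimal sketch, assuming illustrative metric names and weights (a real pipeline would take these from Clarify evaluation output and business requirements):

```python
# Toy selection step: score each fine-tuned LLM's evaluation results and pick
# the version that would be registered in the Model Registry for approval.
def select_model(evaluations, weights):
    """Pick the candidate with the best weighted score.

    Higher is better for 'accuracy'; lower is better for 'cost' and
    'latency_ms', so those terms contribute negatively.
    """
    def score(metrics):
        return (weights["accuracy"] * metrics["accuracy"]
                - weights["cost"] * metrics["cost"]
                - weights["latency_ms"] * metrics["latency_ms"])
    return max(evaluations, key=lambda e: score(e["metrics"]))

# Hypothetical evaluation results for the three candidates on the slide:
evaluations = [
    {"model": "llm-1",        "metrics": {"accuracy": 0.78, "cost": 1.0, "latency_ms": 120}},
    {"model": "llm-2",        "metrics": {"accuracy": 0.81, "cost": 3.0, "latency_ms": 300}},
    {"model": "custom-llm-3", "metrics": {"accuracy": 0.80, "cost": 0.5, "latency_ms": 90}},
]
weights = {"accuracy": 10.0, "cost": 0.2, "latency_ms": 0.005}
winner = select_model(evaluations, weights)
print(winner["model"])  # -> custom-llm-3: slightly lower accuracy, much cheaper and faster
```

Note how the weights encode the trade-off: here the marginally less accurate custom model wins because of its cost and latency advantage, which is exactly the kind of decision the selection step automates.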
MLOps & Generative AI Technology – Fine-Tuner
T H R E E  M A I N  L A Y E R S  A R E  I N T E R C O N N E C T E D
The three interconnected layers: Data, MLOps, and the Generative AI Application.
MLOps & Generative AI Technology – Fine-Tuner
T H R E E  M A I N  L A Y E R S  A R E  I N T E R C O N N E C T E D
The same three layers realized with AWS services, including Amazon Bedrock (diagram steps 1–5).
FMOps Foundation People & Processes
S E P A R A T I O N  O F  C O N C E R N S  I S  K E Y  F O R  S U C C E S S
• Platform Administration (System Administrators, Security, MLOps Engineers): provision infrastructure, provide user access, provide data access
• Data (Data Owners, Data Engineers): ingest data; prepare, combine, and catalogue data; visualize data
• Experimentation (Product Owner, Business Stakeholders, Lead Data Scientists, Data Scientists): prove that ML can solve a business problem, i.e. a PoC
• Model Build & Model Test (Data Scientists, ML Engineers): automate model build/training with scaled data; automate model testing and guardrails
• Model Deployment & ML Production Environment (ML Engineers, ML Consumers): serve and monitor the model
• ML Governance (Model Approvers, Auditors & Compliance): centralized model and code artifact storage/versioning/auditing
FMOps Foundation People & Processes
S E P A R A T I O N  O F  C O N C E R N S  I S  K E Y  F O R  S U C C E S S
The same personas and stages as the MLOps foundation (platform administration, data, experimentation, model build/test/deployment, ML production, ML governance), extended for generative AI:
• Fine-Tuners take over model build, test, and deployment for FMs
• Data Labelers / Editors contribute labelled data through a Web UI
• A new Application stage adds Generative AI Developers (AppDev) and Prompt Engineers/Testers
• Generative AI End-users interact with the application through a Web UI
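What the Application stage does at runtime (prompt preparation, guardrails, model triggering) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the prompt catalog, the banned-terms guardrail, and `fake_llm` are stand-ins; a real application would call a deployed model endpoint and a proper guardrail service.

```python
# Toy application-layer flow: prompt prep from a catalog template, a guardrail
# check, then model triggering. All names here are illustrative placeholders.
PROMPT_CATALOG = {
    "summarize": "Summarize the following text in one sentence:\n{text}",
}
BANNED_TERMS = {"password", "ssn"}  # stand-in for a real guardrail policy

def prepare_prompt(template_id, **kwargs):
    """Fill a catalog template with request parameters."""
    return PROMPT_CATALOG[template_id].format(**kwargs)

def guardrail_ok(prompt):
    """Block prompts containing banned terms (crude placeholder check)."""
    return not any(term in prompt.lower() for term in BANNED_TERMS)

def handle_request(template_id, llm, **kwargs):
    prompt = prepare_prompt(template_id, **kwargs)
    if not guardrail_ok(prompt):
        return "Request blocked by guardrail."
    return llm(prompt)  # model triggering; `llm` would call an endpoint

fake_llm = lambda prompt: f"[model output for {len(prompt)} chars of prompt]"
print(handle_request("summarize", fake_llm, text="FMOps extends MLOps."))
print(handle_request("summarize", fake_llm, text="my password is hunter2"))
```

The separation of concerns mirrors the personas above: Prompt Engineers/Testers own the catalog and guardrail rules, while Generative AI Developers own the request-handling code.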
The Journey of Providers
Generative AI Processes – Providers
P R E - T R A I N I N G  A N  F M  I S  N O T  A  T R I V I A L  T A S K
• Massive data labeling: human in the loop to label trillions of data points
• Weeks to months of training: distribute the training of the foundation models
• Deployment: deploy proprietary FMs, or open-source the model artifact and code
• Leaderboards: FMs are evaluated by third-party organizations
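The "distribute the training" bullet above hides the most basic building block: every worker must see a disjoint slice of the corpus. A framework-free sketch of round-robin data sharding (real pre-training would use the SageMaker distributed training libraries or a framework like PyTorch DDP; `doc-i` names are illustrative):

```python
# Minimal data-parallel sharding: worker `rank` out of `num_workers` owns
# every num_workers-th sample, so no sample is trained on twice per epoch.
def shard(dataset, num_workers, rank):
    """Return the round-robin shard of `dataset` owned by worker `rank`."""
    return dataset[rank::num_workers]

docs = [f"doc-{i}" for i in range(10)]
shards = [shard(docs, 4, r) for r in range(4)]
for r, s in enumerate(shards):
    print(f"worker {r}: {s}")

# Every document lands on exactly one worker:
assert sorted(sum(shards, [])) == sorted(docs)
```

The same partitioning idea, scaled to trillions of tokens and thousands of accelerators, is why pre-training runs for weeks to months rather than years.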
Providers Productionize FMs Using MLOps
T R A I N  M U L T I P L E  F O U N D A T I O N  M O D E L S
Building FMs Using Amazon SageMaker HyperPod
D E V E L O P  A N D  T R A I N  F M S  C O N T I N U O U S L Y  F O R  W E E K S  A N D  M O N T H S
• Resilient environment: self-healing clusters reduce training time by up to 20%
• Streamlined distributed training: SageMaker distributed training libraries improve performance by up to 20%
• Optimized resource utilization: control over the computing environment and workload scheduling
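To make the HyperPod slide a little more concrete, here is a hedged sketch that assembles a cluster-creation request as plain data (no AWS call is made). The field names follow the SageMaker CreateCluster API as I understand it, but the role ARN, S3 URI, instance type, and sizes are placeholders; verify the exact request shape against the current SageMaker HyperPod documentation before use.

```python
# Assemble a (hypothetical) CreateCluster request body for a HyperPod cluster.
def hyperpod_request(name, role_arn, lifecycle_s3, worker_count):
    return {
        "ClusterName": name,
        "InstanceGroups": [
            {
                "InstanceGroupName": "worker-group",
                "InstanceType": "ml.p5.48xlarge",  # GPU training instances (example)
                "InstanceCount": worker_count,
                "ExecutionRole": role_arn,
                "LifeCycleConfig": {
                    # Lifecycle scripts that set up each node on boot
                    "SourceS3Uri": lifecycle_s3,
                    "OnCreate": "on_create.sh",
                },
            }
        ],
    }

req = hyperpod_request(
    "fm-pretraining",
    "arn:aws:iam::123456789012:role/ExampleHyperPodRole",  # placeholder
    "s3://example-bucket/lifecycle/",                      # placeholder
    worker_count=16,
)
# With credentials configured, this dict would be passed to boto3, e.g.:
# boto3.client("sagemaker").create_cluster(**req)
print(req["ClusterName"])
```

The self-healing behavior on the slide is a property of the managed cluster itself: failed nodes in an instance group are replaced automatically, which is what keeps week-long training runs from restarting from scratch.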
Conclusion & Resources
Generative AI Stack on AWS
A L I G N E D  T O  Y O U R  M A T U R I T Y
Personas across the stack: Providers, Fine-Tuners, Consumers, and End-users.
Generative AI Stack on AWS
A L I G N E D  T O  Y O U R  M A T U R I T Y
• Infrastructure for FM training and inference: GPUs, Inferentia, Trainium, SageMaker (with HyperPod), EC2 Capacity Blocks, Neuron, UltraClusters, EFA, Nitro
• Tools to build with FMs: Amazon Bedrock, with Guardrails, Agents, and customization capabilities
• Applications: Amazon Q, Amazon Q in Amazon QuickSight, Amazon Q in Amazon Connect, Amazon CodeWhisperer
Personas: Providers, Fine-Tuners, Consumers, and End-users.
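For the middle layer of the stack, a hedged sketch of what calling Amazon Bedrock involves: assembling a JSON request body for the InvokeModel API. The body below follows Anthropic's messages format on Bedrock as documented at the time of writing; the model id and parameters are illustrative, and no AWS call is made here.

```python
import json

def bedrock_body(prompt, max_tokens=256):
    """Build an InvokeModel request body in Anthropic's messages format."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = bedrock_body("Explain FMOps in one sentence.")
# With credentials configured, this would be sent as, for example:
# boto3.client("bedrock-runtime").invoke_model(
#     modelId="anthropic.claude-3-sonnet-20240229-v1:0", body=body)
print(json.loads(body)["messages"][0]["role"])  # -> user
```

Because Bedrock is a serverless API, this request shape is all a Consumer-maturity team needs; Fine-Tuners would additionally use Bedrock's customization capabilities, and Providers would drop down to the infrastructure layer.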
Generative AI & Operations Resources
• Generative AI for Builders on Amazon SageMaker Workshop: http://tinyurl.com/aws-genai-for-builders
• FMOps/LLMOps: Operationalize generative AI and differences with MLOps: https://aws.amazon.com/blogs/machine-learning/fmops-llmops-operationalize-generative-ai-and-differences-with-mlops
• Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services: https://aws.amazon.com/blogs/machine-learning/operationalize-llm-evaluation-at-scale-using-amazon-sagemaker-clarify-and-mlops-services
• Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock: https://aws.amazon.com/blogs/machine-learning/build-an-internal-saas-service-with-cost-and-usage-tracking-for-foundation-models-on-amazon-bedrock/
Thank you!
Oleksii Ivanchenko
linkedin.com/in/oleksii-ivanchenko/