Fine-tuning Large Language Models
for data-to-text generation
Dmitry Balabka
• Lead ML Engineer @ Ecentria Group / Intexsys
• I'm working on a large US e-commerce project
• M.Sc. in Computer Science @ TSI
• 10+ years of experience in software engineering
Agenda
• LLM fine-tuning introduction
• Data-to-text generation for specific business case
• Conclusions
• Q&A
Introduction to LLM fine-tuning
Zero/One/Few-shot learning
• Zero-shot learning
  • Means prompting the model to perform a task without giving it any examples of that task
  • Example: asking the model open questions
• One/Few-shot learning
  • Provide one or a few examples as part of the prompt
  • Example: asking the model to format a text and providing a few examples of the desired format
• Prompt engineering
Q: What is the title of this section?
A: Introduction to LLM fine-tuning
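The only difference between zero-shot and one/few-shot prompting is whether worked examples are placed in the prompt before the actual query. A minimal sketch, assuming a simple "Input:/Output:" text template; the `build_prompt` helper and the template are hypothetical and not tied to any particular model or API:

```python
def build_prompt(instruction, query, examples=()):
    """Build a text prompt.

    Zero-shot: `examples` is empty, so the model sees only the instruction
    and the query. One/few-shot: each (input, output) pair is shown first.
    """
    parts = [instruction]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block leaves "Output:" open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Format the date as YYYY-MM-DD.", "February 28, 2024")
few_shot = build_prompt(
    "Format the date as YYYY-MM-DD.",
    "February 28, 2024",
    examples=[("March 1, 2023", "2023-03-01"), ("July 4, 2022", "2022-07-04")],
)
print(few_shot)
```

No model weights change in either case; the examples only steer the model at inference time, which is what separates prompting from fine-tuning.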
What is fine-tuning?
In deep learning, fine-tuning is an approach to transfer learning in
which the weights of a pre-trained model are trained on new data.
https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)
When do you need to fine-tune the model?
• Prompt engineering did not work out.
• Retrieval-augmented generation (RAG) did not work out.
• High-quality training data is available.
• Cost is not a problem.
• It is clear how to measure the result.
• Read more:
  • https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/fine-tuning-considerations
  • https://platform.openai.com/docs/guides/fine-tuning/when-to-use-fine-tuning
Difference between pretraining and fine-tuning 1/2
| | Pre-training | Fine-tuning |
|---|---|---|
| Training time | Weeks | Hours |
| Compute | Thousands of GPUs | One or a few GPUs |
| Dataset | Terabytes (e.g., C4, Pile) | 100-1000 MB |
| Budget | $ Millions | $ Hundreds |
Difference between pretraining and fine-tuning 2/2
• Pretraining: the LLM is trained on huge datasets (terabytes of input data) with a lot of compute, producing a pre-trained LLM. Unsupervised learning.
• Fine-tuning: the pre-trained LLM is trained further on a small dataset (thousands of examples) using one or a few GPUs. Supervised learning.
Source: https://blog.research.google/2020/02/exploring-transfer-learning-with-t5.html
Fine-tuning methods
• Full fine-tuning continues the initial training of the model from the existing checkpoint.
• PEFT (Parameter-Efficient Fine-Tuning) methods fine-tune only a small number of (extra) model parameters, significantly decreasing computational and storage costs while yielding performance comparable to a fully fine-tuned model.
  • LoRA trains large models efficiently by inserting smaller trainable (low-rank) matrices, typically in the attention blocks, that are learned during fine-tuning while the original weights stay frozen.
  • Prompt-based methods (p-tuning, prefix tuning, prompt tuning): instead of manually creating hard (text) prompts, these soft-prompting methods add learnable parameters to the input embeddings (or, for prefix tuning, to the hidden states of every layer). These parameters are optimized for a specific task while the pre-trained model's parameters stay frozen.
Read more: https://huggingface.co/docs/peft/index
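To make the LoRA bullet concrete: the frozen weight W is left untouched, and a low-rank product B·A, scaled by alpha/r, is added to the layer's output. The sketch below is a minimal pure-Python illustration that follows the initialization described in the LoRA paper (A random, B zero, so the adapted layer starts out identical to the pre-trained one). It is not the PEFT library's implementation, and the class and helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A (rank x d_in) starts small and random, B (d_out x rank) starts at zero,
        # so B @ A == 0 and the layer initially behaves like the pre-trained one.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def forward(self, x):
        """x: batch of row vectors (n x d_in). Returns x W^T + scaling * x A^T B^T."""
        base = matmul(x, transpose(self.weight))
        update = matmul(matmul(x, transpose(self.lora_a)), transpose(self.lora_b))
        return [[b + self.scaling * u for b, u in zip(rb, ru)] for rb, ru in zip(base, update)]

    def trainable_params(self):
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return sum(len(row) for row in self.lora_a) + sum(len(row) for row in self.lora_b)


layer = LoRALinear([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], rank=1, alpha=2)
print(layer.forward([[1.0, 0.0, -1.0]]))  # [[-2.0, -2.0]], same as the frozen layer at init
print(layer.trainable_params())           # 1 * (3 + 2) = 5
```

The savings grow with layer size: with d_in = d_out = 4096 and r = 8, the adapter has 8 × (4096 + 4096) = 65,536 trainable parameters, compared with 16,777,216 in the full weight matrix.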
Pre-trained open-source LLMs
• Consider licenses that allow commercial use.
• A larger LLM has greater capabilities, but it also requires more computing resources.
• A larger context window allows adding more information into the context.
• The most attractive models:
  • Mistral with 7B parameters, 4096 tokens and a 16K sliding window, Apache License 2.0
  • Gemma with 7B parameters, 8192 tokens, Google's Gemma Terms of Use
Read more: https://github.com/eugeneyan/open-llms
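The point that a larger LLM needs more computing resources can be quantified with a common rule of thumb: about 2 bytes per parameter to hold 16-bit weights for inference, and about 16 bytes per parameter for full fine-tuning with mixed-precision Adam (16-bit weights and gradients plus 32-bit master weights and two 32-bit optimizer moments, as accounted in the ZeRO paper). The sketch below ignores activations, batch size, and framework overhead, so treat its output as a lower bound; the constants are assumptions, not measurements from this project.

```python
# Rough bytes-per-parameter rules of thumb (assumptions; activations excluded).
BYTES_PER_PARAM = {
    "inference_16bit": 2,      # fp16/bf16 weights only
    "full_finetune_adam": 16,  # 2 (weights) + 2 (grads) + 4 (fp32 master) + 8 (Adam m, v)
}


def estimate_memory_gb(num_params, mode):
    """Lower-bound memory estimate in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[mode] / 1e9


for name, params in [("T5 3B", 3e9), ("Mistral 7B", 7e9)]:
    print(
        name,
        estimate_memory_gb(params, "inference_16bit"), "GB to serve,",
        estimate_memory_gb(params, "full_finetune_adam"), "GB to fully fine-tune",
    )
```

Under these assumptions a 3B model needs roughly 48 GB for full fine-tuning, which fits within the 128 GB of a TPU v3-8, while a 7B model needs roughly 112 GB before activations are counted.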
Libraries for fine-tuning
| Library name | Company | Popularity (GitHub ⭐) | PEFT | DL framework | Supported LLM models | Links |
|---|---|---|---|---|---|---|
| DeepSpeed | Microsoft | 31.5k | ✅ | PyTorch | Many | docs, github |
| PEFT | HuggingFace🤗 | 12.7k | ✅ | PyTorch | LLaMA, Mistral, T5, GPT, others | blog, github, docs |
| Accelerate | HuggingFace🤗 | 6.6k | ✖️ | PyTorch | Many | github, docs |
| NeMo | Nvidia | 9.4k | ✅ | PyTorch | LLaMA, Falcon, T5, GPT, others | docs, github |
| T5X | Google | 2.3k | ❔ | JAX | T5 and some others, PaLM* | paper, github, docs |
| Paxml | Google | 0.3k | ❔ | JAX | PaLM-2* | docs, github |
Supervised fine-tuning in clouds
| Cloud | LLM models |
|---|---|
| Azure | GPT, Llama |
| AWS Bedrock | Amazon Titan, Anthropic Claude, Cohere Command, Meta Llama |
| GCP Vertex AI* | PaLM 🌴, Gemma, T5, Gemini**, Llama |
| OpenAI Platform | GPT |
| Anthropic | Claude |
| Cohere | Command |
| MosaicML | MPT |
* - supports RLHF
** - coming soon
Hardware for industrial needs
• Nvidia GPU:
  • H100 with up to 80 GB of memory
  • Supports any framework
  • Available in any cloud
  • Requires NVLink/NVSwitch for efficient data/model parallelism
  • On-prem deployment is possible
• Google TPU:
  • More cost-efficient
  • v3-8 with up to 128 GB of memory
  • Supports XLA-based frameworks only: JAX, PyTorch/XLA, TensorFlow
  • GCP lock-in
  • Supports data/model parallelism out of the box
Read more: https://khairy2011.medium.com/tpu-vs-gpu-vs-cerebras-vs-graphcore-a-fair-comparison-between-ml-hardware-3f5a19d89e38
Other important topics
• Inference performance optimization by reducing the memory footprint and improving parallelizability:
  • Efficient attention using lower-level, hardware-aware optimizations (e.g., FlashAttention)
  • Quantization, which reduces the numerical precision of weights and activations
  • Mixture of Experts, which decreases inference time by activating only some of the experts for each input
  • Others
• Solutions for hallucinations:
  • Retrieval-augmented generation (RAG)
• Solutions for misleading behavior:
  • Reinforcement Learning from Human Feedback (RLHF)
Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R. and McHardy, R., 2023. Challenges and applications of large language models. arXiv preprint arXiv:2307.10169.
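As an illustration of the quantization bullet, here is a minimal sketch of symmetric per-tensor int8 quantization: values are divided by a single scale so that the largest magnitude maps to 127, rounded to integers, and multiplied back by the scale at use time. Production schemes (per-channel scales, activation quantization, calibration data) are more involved; the function names here are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: q = round(x / scale), |q| <= 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored weight is within scale / 2 of the original (rounding error only).
print(q, max(abs(a - b) for a, b in zip(weights, restored)))
```

Storing one byte per weight instead of two (16-bit) halves the weight memory, at the cost of a bounded rounding error per value.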
Data-to-Text generation
{
  "city": "San Francisco",
  "date": "2024-02-28",
  "temperature": {
    "high": 68,
    "low": 51
  },
  "conditions": "Sunny",
  "wind_speed": 5,
  "humidity": 72
}
→ LLM →
Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%. It's a beautiful day to be outdoors!
Our setup
• Model: T5 3B (encoder-decoder architecture)
• Library: T5X
• Hardware: a single TPU v3-8 with 128 GB (enough to fit T5 3B on a single TPU unit)
• Cloud: GCP (model training can be migrated to GPUs)
• Dataset: 300,000 examples for training and 50,000 for A/B testing
• Method: supervised full fine-tuning
Why T5 in 2021/2022?
• GPT-2/J/Neo vs T5.
• T5X, a mature fine-tuning library.
• Large models, up to 11B parameters.
• Cost-efficient training using TPUs and JAX.
• Apache 2.0 license.
• Llama and most other popular open models were released later (they are widely available today, in 2024).
[Figure: GPT-2 vs T5]
Training task definition
City: San Francisco
Date: 2024-02-28
Temperature: 68, 51
Conditions: Sunny
Wind speed: 5
Humidity: 72
→ T5 3B →
Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%. It's a beautiful day to be outdoors!
300,000 training pairs of input and output
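The input side of each training pair is the JSON record flattened into "Key: value" lines, as in the training task definition. A minimal sketch of that step, assuming the weather schema from the earlier example and the `inputs`/`targets` field names commonly used for T5-style sequence-to-sequence data; `linearize` and `to_training_example` are hypothetical helpers, not the project's code:

```python
import json


def linearize(record):
    """Flatten a weather record into the "Key: value" input format."""
    temperature = record["temperature"]
    lines = [
        f"City: {record['city']}",
        f"Date: {record['date']}",
        f"Temperature: {temperature['high']}, {temperature['low']}",
        f"Conditions: {record['conditions']}",
        f"Wind speed: {record['wind_speed']}",
        f"Humidity: {record['humidity']}",
    ]
    return "\n".join(lines)


def to_training_example(record, reference_text):
    """One supervised pair: linearized data in, human-written text out."""
    return {"inputs": linearize(record), "targets": reference_text}


record = json.loads("""{
  "city": "San Francisco", "date": "2024-02-28",
  "temperature": {"high": 68, "low": 51},
  "conditions": "Sunny", "wind_speed": 5, "humidity": 72
}""")
example = to_training_example(
    record,
    "Today's weather report for San Francisco on February 28, 2024, indicates "
    "a sunny day with a high of 68°F and a low of 51°F.",
)
# One JSON object per line (JSONL) is a convenient storage format for many pairs.
print(json.dumps(example))
```

Keeping the key order and labels fixed across all examples lets the model learn which field maps to which part of the sentence.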
Problems with smaller models
• Problems observed with both the smallest GPT and the smallest T5 (60M parameters):
  • Not natural-sounding enough
  • Too short
  • Texts contain "errors" (e.g., wrong numeric values)
  • The model used "forbidden phrases"
  • "Messy" text: e.g., the same phrase repeated twice within a text, sometimes back to back; incomplete sentences or brief expressions without context at the end of the text, etc.
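Repetition of the kind described under "messy" text can be flagged automatically by counting word n-grams that occur more than once. A minimal sketch; the n-gram length and the tokenization (lowercased runs of word characters) are assumptions to tune per dataset:

```python
import re
from collections import Counter


def repeated_ngrams(text, n=3):
    """Return each word n-gram that occurs more than once, with its count."""
    words = re.findall(r"\w+", text.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(gram): count for gram, count in grams.items() if count > 1}


messy = "Expect mild wind. Expect mild wind speeds around 5 mph."
print(repeated_ngrams(messy))  # {'expect mild wind': 2}
```

A non-empty result can be used to filter such texts out of the training data or to count them as an evaluation metric.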
Solution
• Use a larger model: T5 3B.
• Additional processing of the input data:
  • Cleaned up the text by removing irrelevant information.
  • Supplied "forbidden phrases" as inputs so that the model does not use them liberally.
  • Provided explicit labels in the input prompt:
    • "Low quality" helps the model distinguish text quality.
    • "Too short" helps the model distinguish between short and long descriptions.
  • Clustered texts using locality-sensitive hashing (LSH) to find similar texts.
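The slide does not say which LSH family was used; the sketch below assumes MinHash over word shingles, a common choice for near-duplicate text detection. Each text gets a signature of minimum hash values, the signature is cut into bands, and texts that agree on every row of any band become candidate pairs. The parameters (3-word shingles, 64 hashes, 16 bands of 4 rows) and all names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word shingles (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """For each of num_hashes seeded hash functions, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "big"
            )
            for item in items
        ))
    return signature


def candidate_pairs(texts, num_hashes=64, bands=16):
    """LSH banding: texts whose signatures agree on all rows of any band become candidates."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in texts.items():
        signature = minhash_signature(shingles(text), num_hashes)
        for band in range(bands):
            key = (band, tuple(signature[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


texts = {
    "a": "Sunny day with a high of 68 and a low of 51 in San Francisco",
    "b": "Sunny day with a high of 68 and a low of 51 in San Francisco today",
    "c": "Heavy rain and strong wind expected across the bay area tonight",
}
print(candidate_pairs(texts))  # near-duplicates "a" and "b" become a candidate pair
```

With r rows per band and b bands, two texts with Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b, so near-duplicates are found without comparing every pair. Candidates can then be verified with an exact similarity check before clustering.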
Regularize the model via prompting
Training:
City: San Francisco
Date: 2024-02-28
Temperature: 68, 51
Conditions: Sunny
Wind speed: 5
Humidity: 72
Less formal
→ LLM →
Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%. It's a beautiful day to be outdoors!
Generation:
City: San Francisco
Date: 2024-02-28
Temperature: 68, 51
Conditions: Sunny
Wind speed: 5
Humidity: 72
→ LLM →
Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%.
The model learns the relationship between the input prompt and the required generated text.
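Labels such as "Low quality", "Too short", and "Less formal" act as control signals. At training time, a label is appended to the input when the paired target text has that property. At generation time, the labels are left out to request the default behavior. A minimal sketch; the detection heuristics, the 40-word threshold, and the marker phrases are hypothetical placeholders, not the project's actual rules:

```python
MIN_WORDS = 40                                 # hypothetical "Too short" threshold
INFORMAL_MARKERS = ("beautiful day", "enjoy")  # hypothetical "Less formal" cues


def derive_labels(target_text, low_quality=False):
    """Labels describing a training target; used only when building training inputs."""
    labels = []
    if low_quality:  # e.g., flagged by reviewers or a quality filter
        labels.append("Low quality")
    if len(target_text.split()) < MIN_WORDS:
        labels.append("Too short")
    if any(marker in target_text.lower() for marker in INFORMAL_MARKERS):
        labels.append("Less formal")
    return labels


def build_input(linearized_record, labels=()):
    """Append control labels after the linearized data, one per line."""
    return "\n".join([linearized_record, *labels])


record_text = "City: San Francisco\nDate: 2024-02-28\nConditions: Sunny"
target = "Sunny and 68°F. It's a beautiful day to be outdoors!"
# Training: the input describes the target it is paired with.
train_input = build_input(record_text, derive_labels(target))
# Generation: no labels, so the model is asked for its default (formal, full-length) text.
generation_input = build_input(record_text)
print(train_input)
```

Because low-quality, short, or informal texts stay in the training set but are marked, the model can still learn from them while generation steers away from those styles.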
Model evaluation by size
[Figure, two panels comparing T5 Small and T5 XL: (1) share of texts without "forbidden words" (higher is better), y-axis 70%-95%; (2) length distribution of generated texts (number of examples per 10-word bin, from 0 to 210+ words).]
Results evaluation
• Custom metrics.
  • For example, the number of occurrences in the generated text of:
    • forbidden words
    • unseen numbers (numbers that do not appear in the input data)
• Expert verification.
  • Experts reviewed the generated texts.
  • A set of known examples was monitored.
• A/B testing validation.
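Both custom metrics can be computed with plain string matching. A minimal sketch, assuming that "unseen numbers" means numeric tokens in the output that are absent from the input data, and that forbidden phrases are matched case-insensitively as substrings; both readings and the function names are assumptions:

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def count_forbidden(text, forbidden_phrases):
    """Total case-insensitive occurrences of any forbidden phrase."""
    lowered = text.lower()
    return sum(lowered.count(phrase.lower()) for phrase in forbidden_phrases)


def unseen_numbers(generated, source):
    """Numbers in the generated text that never appear in the source data."""
    allowed = set(NUMBER_RE.findall(source))
    return [n for n in NUMBER_RE.findall(generated) if n not in allowed]


source = "City: San Francisco\nDate: 2024-02-28\nTemperature: 68, 51\nWind speed: 5\nHumidity: 72"
good = "On February 28, 2024, expect a high of 68°F, a low of 51°F, wind around 5 mph and 72% humidity."
bad = "Expect a high of 70°F and a beautiful day!"
print(unseen_numbers(good, source), unseen_numbers(bad, source))  # [] ['70']
print(count_forbidden(bad, ["beautiful day"]))                    # 1
```

This naive regex treats "02" and "2" as different numbers, so dates and units may need normalization before the check is reliable enough to report.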
Conclusions
• Although getting into the LLM field involves a very steep learning curve, collecting datasets is the most important and time-consuming part.
• Larger models provide better results (e.g., 1B vs 3B). However, larger models require more ML engineering effort.
• Fine-tuning requires very high-quality data.
Thank you!
Model size
[Figure: LLM sizes, with this project's model marked "We are here"]
Anthropic's RLHF dataset example
Source: https://huggingface.co/datasets/Anthropic/hh-rlhf?row=0
Different usages of the models
• Prediction using zero-shot learning
  • No additional training needed
  • Very large models are required (≥100 billion parameters)
  • Less precise predictions
  • Simple: only prompt engineering is required
  • It is almost impossible to influence the quality of the generated text
  • Few examples are required
• Prediction using a fine-tuned model
  • Requires additional training of the model (from hours to days)
  • Smaller models are sufficient (<11 billion parameters)
  • More precise predictions
  • Advanced knowledge is required to fine-tune the model
  • Generated text quality can be adjusted by improving the fine-tuning dataset
  • ≥10,000 examples are required
Using fine-tuned models
• Choose a cloud or on-prem pre-trained model → Fine-tuning → Use the model for prediction
  • Challenging
  • Better generated text quality
Using models with zero-shot learning
• Choose a cloud or on-prem pre-trained model → Prompt engineering → Use the model for prediction
  • Piece of cake
  • Less precise generated text
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Fine-tuning Large Language Models by Dmitry Balabka

  • 1. Fine-tuning a Large Language Model for data-to-text generation
  • 2. Lead ML Engineer @ Ecentria Group / Intexsys • I'm working on a large US e-commerce project • M.Sc. in Computer Science @ TSI • 10+ years of experience in software engineering
  • 3. Agenda
      • LLM fine-tuning introduction
      • Data-to-text generation for a specific business case
      • Conclusions
      • Q&A
  • 4. Introduction to LLM fine-tuning
  • 5. Zero/One/Few-shot learning
      • Zero-shot learning
          • Means prompting the model for a task without giving any examples (the task need not be part of the training data)
          • Example: asking the model open questions
      • One/Few-shot learning
          • Provide one or a few examples as part of the prompt
          • Example: asking the model to format a text and providing a few formatted examples
      • Prompt engineering
      • Example on the slide: Q: What is the title of this section? A: Introduction to LLM fine-tuning
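A minimal sketch of how a one/few-shot prompt like the one on this slide could be assembled programmatically. The `Q:`/`A:` layout mirrors the slide's example; the function name `build_few_shot_prompt` is hypothetical, and passing an empty example list turns it into a zero-shot prompt.

```python
def build_few_shot_prompt(examples, question):
    """Build a Q/A prompt: worked examples first, then the new question left open."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The model is expected to continue the text after the final "A:".
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# One-shot: a single worked example taken from the slide.
one_shot = build_few_shot_prompt(
    [("What is the title of this section?", "Introduction to LLM fine-tuning")],
    "What is the title of the next section?",
)
# Zero-shot: no examples, just the question.
zero_shot = build_few_shot_prompt([], "What is the title of the next section?")
print(one_shot)
print(zero_shot)
```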
  • 6. What is fine-tuning? In deep learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained on new data. Source: https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)
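To make the definition concrete, here is a toy illustration (a one-variable linear model standing in for an LLM, purely an assumption for clarity): the model is first "pre-trained" from zero-initialized weights, then fine-tuned by continuing gradient descent from those learned weights on a small new dataset. The helper `train` and both datasets are hypothetical.

```python
def train(w, b, data, lr=0.05, steps=500):
    """Gradient descent on mean squared error for the model y = w * x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# "Pre-training": learn y = 2x + 1 starting from zero weights.
pretrain_data = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]
w_pre, b_pre = train(0.0, 0.0, pretrain_data)

# "Fine-tuning": start from the pre-trained weights and train briefly on a small
# new dataset whose target (y = 2x + 1.5) differs slightly.
finetune_data = [(i / 10, 2 * (i / 10) + 1.5) for i in range(-5, 6)]
w_ft, b_ft = train(w_pre, b_pre, finetune_data, steps=100)
print(f"pre-trained: w={w_pre:.2f} b={b_pre:.2f}; fine-tuned: w={w_ft:.2f} b={b_ft:.2f}")
```

Because fine-tuning starts from the pre-trained weights, only the offset has to move, so 100 steps on 11 points suffice; continuing training from an LLM checkpoint follows the same idea at a vastly larger scale.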
  • 7. When do you need to fine-tune the model?
      • Prompt engineering did not work out.
      • Retrieval-augmented generation (RAG) did not work out.
      • High-quality training data is available.
      • Cost is not a problem.
      • It is clear how to measure the result.
      • Read more:
          • https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/fine-tuning-considerations
          • https://platform.openai.com/docs/guides/fine-tuning/when-to-use-fine-tuning
  • 8. Difference between pre-training and fine-tuning (1/2)
      |               | Pre-training               | Fine-tuning     |
      | Training time | Weeks                      | Hours           |
      | Compute       | Thousands of GPUs          | One or few GPUs |
      | Dataset       | Terabytes (e.g., C4, Pile) | 100-1000 MB     |
      | Budget        | $ Millions                 | $ Hundreds      |
      Diagram: LLM → Pre-training (huge datasets and a lot of compute) → Pre-trained LLM → Fine-tuning (small dataset and one or few GPUs)
  • 9. Difference between pre-training and fine-tuning (2/2)
      • Pre-training: terabytes of input data, unsupervised learning
      • Fine-tuning: thousands of examples, supervised learning
      • Source: https://blog.research.google/2020/02/exploring-transfer-learning-with-t5.html
  • 10. Fine-tuning methods
      • Full fine-tuning continues training all of the model's weights, starting from the existing pre-trained checkpoint.
      • PEFT (Parameter-Efficient Fine-Tuning) methods fine-tune only a small number of (extra) model parameters, significantly decreasing computational and storage costs while yielding performance comparable to a fully fine-tuned model.
          • LoRA trains large models efficiently by inserting smaller trainable low-rank matrices (typically in the attention blocks) that are learned during fine-tuning while the original weights stay frozen.
          • Prompt-based methods (p-tuning, prefix tuning, prompt tuning): instead of manually writing hard (text) prompts, soft prompting methods add learnable parameters to the input embeddings; these are optimized for a specific task while the pre-trained model's parameters stay frozen.
      • Read more: https://huggingface.co/docs/peft/index
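A minimal NumPy sketch of the LoRA idea for a single linear layer, assuming the formulation from the LoRA paper: the frozen weight `W` is augmented with a low-rank product `B @ A` scaled by `alpha / rank`, and `B` starts at zero so training begins from the pre-trained behavior. The class name and the values rank 8 and alpha 16 are illustrative assumptions; libraries such as HuggingFace PEFT wrap real framework layers instead.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                         # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # h = W x + (alpha / r) * B (A x); with B = 0 this equals the frozen layer.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        # Only A and B are updated during fine-tuning.
        return self.A.size + self.B.size


d = 1024
W = np.random.default_rng(1).normal(size=(d, d))
layer = LoRALinear(W, rank=8)
x = np.ones(d)
print(np.allclose(layer(x), W @ x))       # True: identical output before any training
print(layer.trainable_params(), W.size)   # 16384 1048576
```

For a 1024×1024 layer, rank 8 trains 16,384 parameters instead of 1,048,576 (about 1.6%), which is where the computational and storage savings come from.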
  • 11. Pre-trained open-source LLMs
      • Prefer licenses that allow commercial use.
      • A larger LLM has greater capabilities, but it also requires more computing resources.
      • A larger context window allows adding more information to the context.
      • The most attractive models:
          • Mistral: 7B params, 4096 tokens and 16K sliding window, Apache License 2.0
          • Gemma: 7B params, 8192 tokens, Google's Gemma Terms of Use
      • Read more: https://github.com/eugeneyan/open-llms
  • 12. Libraries for fine-tuning
      | Library    | Company        | Popularity ⭐ | PEFT | DL framework | Supported LLM models            | Links               |
      | DeepSpeed  | Microsoft      | 31.5k         | ✅   | PyTorch      | A lot                           | docs, github        |
      | PEFT       | HuggingFace 🤗 | 12.7k         | ✅   | PyTorch      | LLaMA, Mistral, T5, GPT, others | blog, github, docs  |
      | Accelerate | HuggingFace 🤗 | 6.6k          | ✖️   | PyTorch      | A lot                           | github, docs        |
      | NeMo       | Nvidia         | 9.4k          | ✅   | PyTorch      | LLaMA, Falcon, T5, GPT, others  | docs, github        |
      | T5X        | Google         | 2.3k          | ❔   | JAX          | T5 and some others, PaLM*       | paper, github, docs |
      | Paxml      | Google         | 0.3k          | ❔   | JAX          | PaLM-2*                         | docs, github        |
  • 13. Supervised fine-tuning in clouds
      | Cloud           | LLM models                                                 |
      | Azure           | GPT, Llama                                                 |
      | AWS Bedrock     | Amazon Titan, Anthropic Claude, Cohere Command, Meta Llama |
      | GCP Vertex AI*  | PaLM 🌴, Gemma, T5, Gemini**, Llama                        |
      | OpenAI Platform | GPT                                                        |
      | Anthropic       | Claude                                                     |
      | Cohere          | Command                                                    |
      | MosaicML        | MPT                                                        |
      * supports RLHF; ** coming soon
  • 14. Hardware for industrial needs
      • Nvidia GPU:
          • H100 with up to 80 GB of memory
          • Supports any framework
          • Available in any cloud
          • Requires NVLink/NVSwitch for efficient data/model parallelism
          • On-prem option
      • Google TPU:
          • More cost-efficient
          • v3-8 with up to 128 GB of memory
          • Supports XLA-based frameworks only: JAX, PyTorch/XLA, TensorFlow
          • GCP lock-in
          • Supports data/model parallelism out of the box
      • Read more: https://khairy2011.medium.com/tpu-vs-gpu-vs-cerebras-vs-graphcore-a-fair-comparison-between-ml-hardware-3f5a19d89e38
  • 15. Other important topics
      • Inference performance optimization by reducing the memory footprint and improving parallelizability:
          • Efficient attention using lower-level, hardware-aware optimizations (e.g., FlashAttention)
          • Quantization: reducing the numerical precision of weights and activations
          • Mixture of Experts: decreasing inference cost by activating only some of the experts for each input
          • and others
      • Hallucination mitigation: retrieval-augmented generation (RAG)
      • Mitigating misleading behavior: Reinforcement Learning from Human Feedback (RLHF)
      • Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R. and McHardy, R., 2023. Challenges and applications of large language models. arXiv preprint arXiv:2307.10169.
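As a sketch of what reducing the precision of weights means in practice, the snippet below applies symmetric per-tensor int8 quantization. This is one common scheme, chosen here as an assumption (production toolkits typically use finer-grained variants): each weight is stored as an integer in [-127, 127] plus one shared float scale. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the shared scale."""
    return [scale * v for v in q]


w = [0.12, -0.5, 0.33, 0.0, 0.49]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q)        # [30, -127, 84, 0, 124]
print(max_err)  # never more than scale / 2
```

Each int8 value takes 1 byte instead of 4 for float32, and the rounding error per weight is at most half the scale.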
  • 17. Data-to-text generation
      • Input (JSON): { "city": "San Francisco", "date": "2024-02-28", "temperature": { "high": 68, "low": 51 }, "conditions": "Sunny", "wind_speed": 5, "humidity": 72 }
      • → LLM →
      • Output: Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%. It's a beautiful day to be outdoors!
  • 18. Our setup
      • Model: T5 3B (encoder-decoder architecture)
      • Library: T5X
      • Hardware: a single TPU v3-8 with 128 GB (enough to fit T5 3B into a single TPU unit)
      • Cloud: GCP (model training can be migrated to GPUs)
      • Dataset: 300,000 examples for training and 50,000 for A/B testing
      • Supervised full fine-tuning
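A back-of-the-envelope check on whether T5 3B fits into 128 GB, assuming 3 billion parameters and a plain Adam-style setup in float32 (4 bytes each for weights and gradients, plus 8 bytes for two optimizer moments); activations are ignored. T5 models are commonly trained with Adafactor, which keeps factored second-moment statistics and therefore needs less optimizer memory than this estimate. The helper name is hypothetical.

```python
def training_memory_gb(n_params, bytes_weights=4, bytes_grads=4, bytes_optimizer=8):
    """Rough memory for weights, gradients and optimizer state; activations are excluded."""
    total_bytes = n_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / 1e9


n = 3e9  # "T5 3B", rounded
full_adam_fp32 = training_memory_gb(n)   # fp32 weights + grads + two Adam moments
weights_only_bf16 = n * 2 / 1e9          # just storing the weights in bfloat16
print(full_adam_fp32, weights_only_bf16) # 48.0 6.0
```

Note that a TPU v3-8's 128 GB is spread over 8 cores with 16 GB each, so the model and optimizer state have to be partitioned across cores rather than placed on one; T5X supports this kind of partitioning.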
  • 19. Why T5 in 2021/2022?
      • GPT-2, GPT-J, GPT-Neo vs T5
      • T5X: a mature fine-tuning library
      • Large models, up to 11B parameters
      • Cost-efficient training using TPUs and JAX
      • Apache 2.0 license
      • Llama and most of today's (2024) popular open models were released later
  • 20. Training task definition
      • Input: City: San Francisco Date: 2024-02-28 Temperature: 68, 51 Conditions: Sunny Wind speed: 5 Humidity: 72
      • Model: T5 3B
      • Output: Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%. It's a beautiful day to be outdoors!
      • 300,000 training pairs of input and output
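A minimal sketch of turning the JSON record from the data-to-text example into the flat "Key: value" input shown on this slide. Nested values such as the high/low temperature are joined with a comma, reproducing "Temperature: 68, 51". The function name and the `inputs`/`targets` field names (a convention commonly used with T5 data pipelines) are assumptions.

```python
import json


def linearize(record):
    """Flatten a JSON record into 'Key: value' pairs separated by spaces."""
    parts = []
    for key, value in record.items():
        label = key.replace("_", " ").capitalize()  # "wind_speed" -> "Wind speed"
        if isinstance(value, dict):
            # Nested values keep their original order, e.g. high then low temperature.
            value = ", ".join(str(v) for v in value.values())
        parts.append(f"{label}: {value}")
    return " ".join(parts)


record = json.loads(
    '{"city": "San Francisco", "date": "2024-02-28", '
    '"temperature": {"high": 68, "low": 51}, "conditions": "Sunny", '
    '"wind_speed": 5, "humidity": 72}'
)
example = {
    "inputs": linearize(record),
    "targets": "Today's weather report for San Francisco on February 28, 2024, indicates a sunny day "
               "with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a "
               "humidity level of 72%. It's a beautiful day to be outdoors!",
}
print(example["inputs"])
# City: San Francisco Date: 2024-02-28 Temperature: 68, 51 Conditions: Sunny Wind speed: 5 Humidity: 72
```

Applying the same function to every record and pairing it with its reference text yields the input/output training pairs.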
  • 21. Problems with smaller models
      • Problems (in both the smallest GPT and the smallest T5, with 60M params):
          • Not natural-sounding enough
          • Too short
          • Texts contain “errors” (e.g., wrong numeric values)
          • The model used “forbidden phrases”
          • “Messy” text: e.g., the same phrase repeated twice within a text, sometimes in a row; incomplete sentences or brief expressions without context at the end of the text, etc.
  • 22. Solution
      • Use the larger T5 3B model
      • Additional manipulations of the input data:
          • Cleaned up the texts by removing irrelevant information.
          • Supplied “forbidden phrases” as inputs to discourage the model from using them freely.
          • Provided explicit labels in the input prompt:
              • “Low quality” helps the model distinguish text quality.
              • “Too short” helps the model distinguish between short and long descriptions.
          • Clustered texts using locality-sensitive hashing (LSH) to find similar texts.
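A minimal sketch of finding similar texts with LSH. The talk does not say which LSH variant or parameters were used, so MinHash with banding over word 3-shingles, 32 hash functions and 8 bands are assumptions, and `hashlib.md5` salted with a seed stands in for independent hash functions. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of word k-grams; texts shorter than k words become a single shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=32):
    """For each seeded hash function, keep the minimum hash over the set's items."""
    return [
        min(int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16) for item in items)
        for seed in range(num_hashes)
    ]


def candidate_pairs(texts, num_hashes=32, bands=8):
    """Texts whose signatures agree on every row of at least one band become candidate pairs."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, text in enumerate(texts):
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


texts = [
    "Sunny day with a high of 68 and a low of 51 in San Francisco",
    "sunny day with a high of 68 and a low of 51 in san francisco",
    "Heavy rain expected tomorrow across the bay area with strong winds",
]
print(candidate_pairs(texts))  # {(0, 1)}
```

Candidate pairs can then be merged into clusters (for example with union-find) and inspected or de-duplicated.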
  • 23. Regularize the model via prompting
      • Training: City: San Francisco Date: 2024-02-28 Temperature: 68, 51 Conditions: Sunny Wind speed: 5 Humidity: 72 Less formal → LLM → Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%. It's a beautiful day to be outdoors!
      • Generation: City: San Francisco Date: 2024-02-28 Temperature: 68, 51 Conditions: Sunny Wind speed: 5 Humidity: 72 → LLM → Today's weather report for San Francisco on February 28, 2024, indicates a sunny day with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a humidity level of 72%.
      • The model learns the relationship between the input prompt and the required generated text: without the “Less formal” label, it omits the informal closing sentence.
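A minimal sketch of how such control labels could be attached during training and dropped at generation time. The label strings come from the slides; the rules that assign them (a 30-word threshold for "Too short", a marker phrase for "Less formal") and both function names are hypothetical.

```python
def control_labels(target, min_words=30, informal_markers=("beautiful day",)):
    """Derive control labels from a training target (thresholds and markers are hypothetical)."""
    labels = []
    if len(target.split()) < min_words:
        labels.append("Too short")
    if any(marker in target.lower() for marker in informal_markers):
        labels.append("Less formal")
    return labels


def build_input(source, labels=()):
    """Append control labels after the linearized record, as on the slide."""
    return " ".join([source, *labels])


source = "City: San Francisco Date: 2024-02-28 Temperature: 68, 51 Conditions: Sunny Wind speed: 5 Humidity: 72"
target = ("Today's weather report for San Francisco on February 28, 2024, indicates a sunny day "
          "with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a "
          "humidity level of 72%. It's a beautiful day to be outdoors!")

train_input = build_input(source, control_labels(target))  # ends with "Less formal"
generation_input = build_input(source)                      # no label at generation time
print(train_input)
print(generation_input)
```

Because the informal sentence only ever appears together with the label during training, leaving the label out at generation time steers the model toward the more formal variant.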
  • 24. Model evaluation by size
      [Charts: share of texts without “forbidden words” (higher is better) for T5 Small vs T5 XL, on a 70% to 95% scale; length distribution of generated texts in words (10-word bins from 0 to 210+) vs number of examples, T5 small vs T5 XL]
  • 25. Results evaluation
      • Custom metrics, e.g., the number of occurrences in the generated text of:
          • forbidden words
          • unseen numbers
      • Expert verification:
          • Experts reviewed generated texts.
          • A set of known examples was monitored.
      • A/B testing validation.
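A minimal sketch of the two custom metrics named above: counting occurrences of forbidden phrases and finding numbers in the generated text that never appear in the input. The regex-based number matching, the helper names and the example forbidden phrase are assumptions; the share of texts without forbidden phrases corresponds to the "higher is better" chart on the Model evaluation by size slide.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unseen_numbers(source, generated):
    """Numbers in the generated text that do not occur anywhere in the input record."""
    seen = {float(n) for n in NUMBER.findall(source)}
    return [n for n in NUMBER.findall(generated) if float(n) not in seen]


def forbidden_count(generated, forbidden_phrases):
    """Case-insensitive count of forbidden phrase occurrences."""
    text = generated.lower()
    return sum(text.count(p.lower()) for p in forbidden_phrases)


def share_without_forbidden(texts, forbidden_phrases):
    """Fraction of texts with zero forbidden phrases (higher is better)."""
    clean = sum(1 for t in texts if forbidden_count(t, forbidden_phrases) == 0)
    return clean / len(texts)


source = "City: San Francisco Date: 2024-02-28 Temperature: 68, 51 Conditions: Sunny Wind speed: 5 Humidity: 72"
generated = ("Today's weather report for San Francisco on February 28, 2024, indicates a sunny day "
             "with a high of 68°F and a low of 51°F. Expect mild wind speeds around 5 mph and a "
             "humidity level of 72%. It's a beautiful day to be outdoors!")
bad = generated.replace("68°F", "86°F")  # simulated hallucinated value

print(unseen_numbers(source, generated))  # []
print(unseen_numbers(source, bad))        # ['86']
```

Numbers are compared as floats so that "02" in the date matches a "2" in the text; a stricter check could also compare units or positions.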
  • 26. Conclusions
      • Getting into the LLM field involves a very steep learning curve, yet collecting datasets is the most important and time-consuming part.
      • Larger models provide better results (e.g., 1B vs 3B). However, larger models require more ML engineering effort.
      • Fine-tuning requires very high-quality data.
  • 29. Anthropic’s RLHF dataset example. Source: https://huggingface.co/datasets/Anthropic/hh-rlhf?row=0
  • 30. Different usages of the models
      • Prediction using zero-shot learning:
          • No additional training needed
          • Very large models are required (>=100 billion parameters)
          • Less precise predictions
          • Simple: only prompt engineering is required
          • It is almost impossible to influence the quality of the generated text
          • Only a few examples are required
      • Prediction using a fine-tuned model:
          • Requires additional training of the model (from hours to days)
          • Smaller models suffice (<11 billion parameters)
          • More precise predictions
          • Advanced knowledge is required to fine-tune the model
          • Generated text quality can be adjusted by improving the fine-tuning dataset
          • >=10,000 examples are required
  • 31. Using fine-tuned models vs. using models with zero-shot learning
      • Fine-tuned model: choose a cloud or on-prem pre-trained model → fine-tuning → use the model for prediction. Challenging, but gives better generated text quality.
      • Zero-shot learning: choose a cloud or on-prem pre-trained model → prompt engineering → use the model for prediction. A piece of cake, but the generated text is less precise.