SlideShare a Scribd company logo
1 of 31
Download to read offline
Running LLM in Kubernetes
Volodymyr Tsap
CTO @ SHALB
What is LLM?
A large language model (LLM) is a language model notable for its ability to achieve general-purpose language
generation and understanding.
LLMs acquire these abilities by learning statistical relationships from text documents during a computationally
intensive self-supervised and semi-supervised training process.
LLMs are artificial neural networks, the largest and most capable of which are built with a transformer-based
architecture.
Wikipedia
What is Transformers?
! Transformers are a type of deep learning model that have revolutionized the way natural
language processing tasks are approached.
! Transformers utilize a unique architecture that relies on self-attention mechanisms to weigh
the significance of different words in a sentence. This allows the model to capture the context of
each word more effectively than previous models, leading to better understanding and
generation of text.
Building LLM. Data Collection and Preparation.
! Collect a large and diverse dataset from various sources such as books, websites, and other
texts.
! Clean and preprocess the data to remove irrelevant content, normalize text (e.g., lowercasing,
removing special characters), and ensure data quality.
Building LLM. Tokenization and Vocabulary Building.
! Tokenize the text data into smaller units (tokens) such as words, subwords, or characters. This
step may involve choosing a specific tokenization algorithm (e.g., BPE, WordPiece).
! Create a vocabulary of unique tokens and possibly generate embeddings for them. This could
involve pre-training embeddings or using embeddings from an existing model.
Building LLM. Model Architecture Design.
! Choose a transformer architecture (e.g., GPT, BERT) that suits the goals of your LLM. This
involves deciding on the number of layers, attention heads, and other hyperparameters.
! Implement or adapt an existing transformer model framework using deep learning libraries such
as TensorFlow or PyTorch.
Building LLM. Model Architecture Design.
Building LLM. Training.
! Split the data into training, validation, and test sets.
! Pre-train the model on the collected data, which involves running it through the computation of
weights over multiple epochs. This step is computationally intensive and can take from hours to
weeks depending on the model size and hardware capabilities.
! Use techniques such as gradient clipping, learning rate scheduling, and regularization to
improve training efficiency and model performance.
Building LLM. Fine-Tuning (Optional).
! Fine-tune the pre-trained model on a smaller, task-specific dataset if the LLM will be used for
specific applications (e.g., question answering, sentiment analysis).
! Adjust hyperparameters and training settings to optimize performance for the target task.
Building LLM. Evaluation and Testing.
! Evaluate the model on a test set to measure its performance using appropriate metrics (e.g.,
accuracy, F1 score, perplexity).
! Perform error analysis and adjust the training process as necessary to improve model quality.
Building LLM. Saving and Deployment.
! Save the trained model weights and configuration to files.
! Deploy the model for inference, which can involve setting up a serving infrastructure capable of
handling requests in real-time or batch processing.
TLDR. Watch Andrej Karpathy Explanation.
Hugging Face - GitHub for LLM’s
LLM Files
LLM Files
How to run? Using Google Colab with T4 gpu
How to run? Using laptop and llama.cpp. Quantization.
Using Managed Cloud Services.
! Amazon SageMaker
! Google Cloud AI Platform & Vertex AI
! Microsoft Azure Machine Learning
! NVIDIA AI Enterprise
! Hugging Face Endpoints
! AnyScale Endpoints
Why to run them in Kubernetes?
1. We already know him :)
2. Scalability. Resource efficiency, HPA, auto-scaling, API Limits, etc..
3. Price. Managed service 20-40% overhead. Reserved instances.
4. GPU sharing.
5. ML ecosystem - pipelines, artifacts. (KubeFlow, Ray Framework).
6. No vendor lock. Transportable.
LLM Serving Frameworks
Options to run LLM on K8s.
1. KServe from Kubeflow.
2. Ray Serve from Ray Framework.
3. Flux AI controller.
4. Own Kubernetes wrapper on top of Frameworks.
We choose TGI and made it Kubernetes ready.
We have Docker, Lets adapt it to Kubernetes
Demo Time!
Let’s bootstrap infrastructure from cluster.dev template
Then add model configuration
Apply and check the model is running
Changing models and infrastructure
Enabling HF chat-ui
Deploy Monitoring and Metrics with DCGM Exporter
Thank you! Now Questions?

More Related Content

Similar to "Running Open-Source LLM models on Kubernetes", Volodymyr Tsap

DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx0567Padma
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudMárton Kodok
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureAgile Impact Conference
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureAgile Impact
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle Databricks
 
MY NEWEST RESUME
MY NEWEST RESUMEMY NEWEST RESUME
MY NEWEST RESUMEHan Yan
 
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021Sandesh Rao
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning DevelopmentMatei Zaharia
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1Bill Liu
 
Adopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAdopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAnswerModules
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)Senturus
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsMárton Kodok
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
An introduction to the MDA
An introduction to the MDAAn introduction to the MDA
An introduction to the MDALai Ha
 
Deep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsDeep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsBill Liu
 
openGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilitiesopenGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilitieswot chin
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfvitm11
 

Similar to "Running Open-Source LLM models on Kubernetes", Volodymyr Tsap (20)

DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application Architecture
 
Norman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application ArchitectureNorman Sasono - Incorporating AI/ML into Your Application Architecture
Norman Sasono - Incorporating AI/ML into Your Application Architecture
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
MY NEWEST RESUME
MY NEWEST RESUMEMY NEWEST RESUME
MY NEWEST RESUME
 
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning Development
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
 
Adopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAdopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuite
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
An introduction to the MDA
An introduction to the MDAAn introduction to the MDA
An introduction to the MDA
 
Deep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsDeep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps Workflows
 
openGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilitiesopenGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilities
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
 

More from Fwdays

"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...Fwdays
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...Fwdays
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...Fwdays
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...Fwdays
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...Fwdays
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...Fwdays
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...Fwdays
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra MyronovaFwdays
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...Fwdays
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...Fwdays
 
"Leadership, Soft Skills, and Personality Types for IT teams", Sergiy Tytenko
"Leadership, Soft Skills, and Personality Types for IT teams",  Sergiy Tytenko"Leadership, Soft Skills, and Personality Types for IT teams",  Sergiy Tytenko
"Leadership, Soft Skills, and Personality Types for IT teams", Sergiy TytenkoFwdays
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...Fwdays
 
"Mastering cross-cultural communication", Anna Gandrabura
"Mastering cross-cultural communication", Anna Gandrabura"Mastering cross-cultural communication", Anna Gandrabura
"Mastering cross-cultural communication", Anna GandraburaFwdays
 
"Incremental rollouts and rollbacks with business metrics control at every st...
"Incremental rollouts and rollbacks with business metrics control at every st..."Incremental rollouts and rollbacks with business metrics control at every st...
"Incremental rollouts and rollbacks with business metrics control at every st...Fwdays
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys VasylievFwdays
 
"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura RochniakFwdays
 
"How to mentor (future) devops engineers", Vsevolod Polyakov
"How to mentor (future) devops engineers",  Vsevolod Polyakov"How to mentor (future) devops engineers",  Vsevolod Polyakov
"How to mentor (future) devops engineers", Vsevolod PolyakovFwdays
 
"Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O...
"Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O..."Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O...
"Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O...Fwdays
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro KozhevinFwdays
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor FesenkoFwdays
 

More from Fwdays (20)

"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
 
"Leadership, Soft Skills, and Personality Types for IT teams", Sergiy Tytenko
"Leadership, Soft Skills, and Personality Types for IT teams",  Sergiy Tytenko"Leadership, Soft Skills, and Personality Types for IT teams",  Sergiy Tytenko
"Leadership, Soft Skills, and Personality Types for IT teams", Sergiy Tytenko
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
 
"Mastering cross-cultural communication", Anna Gandrabura
"Mastering cross-cultural communication", Anna Gandrabura"Mastering cross-cultural communication", Anna Gandrabura
"Mastering cross-cultural communication", Anna Gandrabura
 
"Incremental rollouts and rollbacks with business metrics control at every st...
"Incremental rollouts and rollbacks with business metrics control at every st..."Incremental rollouts and rollbacks with business metrics control at every st...
"Incremental rollouts and rollbacks with business metrics control at every st...
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev
 
"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak
 
"How to mentor (future) devops engineers", Vsevolod Polyakov
"How to mentor (future) devops engineers",  Vsevolod Polyakov"How to mentor (future) devops engineers",  Vsevolod Polyakov
"How to mentor (future) devops engineers", Vsevolod Polyakov
 
"Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O...
"Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O..."Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O...
"Crisis to Calm: Incident Management’s Role in Business Stability", Oleksii O...
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko
 

Recently uploaded

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Recently uploaded (20)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap

  • 1. Running LLM in Kubernetes Volodymyr Tsap CTO @ SHALB
  • 2. What is LLM? A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and understanding. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs are artificial neural networks, the largest and most capable of which are built with a transformer-based architecture. Wikipedia
  • 3. What is Transformers? ! Transformers are a type of deep learning model that have revolutionized the way natural language processing tasks are approached. ! Transformers utilize a unique architecture that relies on self-attention mechanisms to weigh the significance of different words in a sentence. This allows the model to capture the context of each word more effectively than previous models, leading to better understanding and generation of text.
  • 4. Building LLM. Data Collection and Preparation. ! Collect a large and diverse dataset from various sources such as books, websites, and other texts. ! Clean and preprocess the data to remove irrelevant content, normalize text (e.g., lowercasing, removing special characters), and ensure data quality.
  • 5. Building LLM. Tokenization and Vocabulary Building. ! Tokenize the text data into smaller units (tokens) such as words, subwords, or characters. This step may involve choosing a specific tokenization algorithm (e.g., BPE, WordPiece). ! Create a vocabulary of unique tokens and possibly generate embeddings for them. This could involve pre-training embeddings or using embeddings from an existing model.
  • 6. Building LLM. Model Architecture Design. ! Choose a transformer architecture (e.g., GPT, BERT) that suits the goals of your LLM. This involves deciding on the number of layers, attention heads, and other hyperparameters. ! Implement or adapt an existing transformer model framework using deep learning libraries such as TensorFlow or PyTorch.
  • 7. Building LLM. Model Architecture Design.
  • 8. Building LLM. Training. ! Split the data into training, validation, and test sets. ! Pre-train the model on the collected data, which involves running it through the computation of weights over multiple epochs. This step is computationally intensive and can take from hours to weeks depending on the model size and hardware capabilities. ! Use techniques such as gradient clipping, learning rate scheduling, and regularization to improve training efficiency and model performance.
  • 9. Building LLM. Fine-Tuning (Optional). ! Fine-tune the pre-trained model on a smaller, task-specific dataset if the LLM will be used for specific applications (e.g., question answering, sentiment analysis). ! Adjust hyperparameters and training settings to optimize performance for the target task.
  • 10. Building LLM. Evaluation and Testing. ! Evaluate the model on a test set to measure its performance using appropriate metrics (e.g., accuracy, F1 score, perplexity). ! Perform error analysis and adjust the training process as necessary to improve model quality.
  • 11. Building LLM. Saving and Deployment. ! Save the trained model weights and configuration to files. ! Deploy the model for inference, which can involve setting up a serving infrastructure capable of handling requests in real-time or batch processing.
  • 12. TLDR. Watch Andrej Karpathy Explanation.
  • 13. Hugging Face - GitHub for LLM’s
  • 16. How to run? Using Google Colab with T4 gpu
  • 17. How to run? Using laptop and llama.cpp. Quantization.
  • 18. Using Managed Cloud Services. ! Amazon SageMaker ! Google Cloud AI Platform & Vertex AI ! Microsoft Azure Machine Learning ! NVIDIA AI Enterprise ! Hugging Face Endpoints ! AnyScale Endpoints
  • 19. Why to run them in Kubernetes? 1. We already know him :) 2. Scalability. Resource efficiency, HPA, auto-scaling, API Limits, etc.. 3. Price. Managed service 20-40% overhead. Reserved instances. 4. GPU sharing. 5. ML ecosystem - pipelines, artifacts. (KubeFlow, Ray Framework). 6. No vendor lock. Transportable.
  • 21. Options to run LLM on K8s. 1. KServe from Kubeflow. 2. Ray Serve from Ray Framework. 3. Flux AI controller. 4. Own Kubernetes wrapper on top of Frameworks.
  • 22. We choose TGI and made it Kubernetes ready.
  • 23. We have Docker, Lets adapt it to Kubernetes
  • 25. Let’s bootstrap infrastructure from cluster.dev template
  • 26. Then add model configuration
  • 27. Apply and check the model is running
  • 28. Changing models and infrastructure
  • 30. Deploy Monitoring and Metrics with DCGM Exporter
  • 31. Thank you! Now Questions?