In this workshop we covered: an introduction to Generative AI and Large Language Models (LLMs); AWS Foundation Models and their role in providing pre-trained LLMs; the benefits of leveraging LLMs in enterprises; deploying LLMs on AWS infrastructure, including infrastructure requirements and the available AWS services and tools; and demos showcasing text-to-image generation and text summarization using Foundation Models, as well as using Retrieval Augmented Generation and LangChain with AWS tools for enterprise use cases.
Connect with me for interesting sessions in the future:
https://www.linkedin.com/in/jayyanar/
1. Generative AI for Everyone
Hands on Workshop
Venue: Amazon Development Center, Aquila Bangalore
Date: 22 July, Time: 10:00 AM
2. Agenda
01 - GENERATIVE AI 101: Path to Gen AI, Embedding 101, Quick View of the Transformer Architecture, Terminology (Context, Temperature), Stable Diffusion 101
02 - AWS FOUNDATION MODELS: AWS Foundation Models Overview, Advantages of using SageMaker
03 - PROMPTING: Zero-shot vs Few-shot, Chain-of-Thought
04 - LLM FOR ENTERPRISE: Challenges in Adopting LLMs in the Enterprise, LangChain Overview, Reference Architecture - GenAI Stack
05 - LAB / DEMO: LLM Demo, Stable Diffusion Demo, CodeWhisperer Demo
06 - DEMO / USE CASES - LLM: Useful Use Cases and Patterns for the Enterprise, Friendly Use Case - GenAI-Quiz, Enterprise Use Case Demo - RAG with Kendra
AWS UG Bangalore Co-Organizer - Ayyanar Jeyakrishnan
3. Feedback and Questions
- 18 yrs of experience - worked in the EU and NA
- 50+ & 10X AWS Certifications
- AWS Community Builder - ML
- Continuous learning is my passion
- Worked as Lead Engineer - DevOps, MLOps, ML and Data Platform
- Speaker at 20+ public events on ML, AI, GenAI, MLOps and DevOps
- VP / Principal Engineer in a financial organization
- Started as a hardware technician
4. AWS GenAI Labs Workshop - 1 Hr 30 Minutes
Name:            Time:
Lab 0 - Embedding: Word, Sentence
Lab 1 - LLM AWS FM - 10 Min
Lab 2 - ImageGen AWS FM - 10 Min
Lab 3 - Use Cohere API for LLM - 10 Min
Lab 4 - Use DreamStudio API for Image Generation - 10 Min
Lab 5 - CodeWhisperer ***
Lab 6 - LLM with RAG using Kendra - 45 Min
Workshop: bit.ly/genai101 | bit.ly/awsaj | Feedback: bit.ly/feedback-awscommunity
Thank you! Ayyanar Jeyakrishnan
5. Path to Gen AI
Descriptive Analytics: charts and graphs - using BI
Predictive Analytics: Linear Regression, Logistic Regression, Clustering
Predictive AI & Descriptive AI: Word Embeddings, Word2Vec, RNN, LSTM, GRU, CNN, YOLO
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
6. Path to Gen AI - Generative AI
Descriptive Analytics: charts and graphs - using BI
Predictive Analytics: Linear Regression, Logistic Regression, Clustering
Predictive AI & Descriptive AI: Word Embeddings, Word2Vec, RNN, LSTM, GRU, CNN, YOLO, GAN (generator and discriminator)
Generative AI: Transformer Models - Falcon, MPT, Dolly, GPT-3.5, Bloom, FLAN-T5, BART, BERT, LLaMA; Stable Diffusion
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
7. Let us Understand the Basics
Tokenization: words (Dog, Cat, Fruit, Banana) map to token IDs (3, 5, 18, 19)
Embedding: Word2Vec turns tokens into vectors
Semantic Search: cosine_similarity (Dog ~ Cat, Banana ~ Fruit)
Clustering: KMeans groups related words (Dog, Cat | Fruit, Banana)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
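A minimal sketch of these four ideas, assuming scikit-learn and toy 2-D vectors in place of real Word2Vec embeddings (which have hundreds of dimensions); the token IDs mirror the ones in the diagram:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

words = ["Dog", "Cat", "Fruit", "Banana"]
token_ids = {"Dog": 3, "Cat": 5, "Fruit": 18, "Banana": 19}   # tokenization
print(token_ids)
embeddings = np.array([
    [0.90, 0.10],   # Dog    - "animal" direction
    [0.85, 0.20],   # Cat    - close to Dog
    [0.10, 0.90],   # Fruit  - "food" direction
    [0.15, 0.85],   # Banana - close to Fruit
])

# Semantic search: cosine similarity pairs Dog with Cat and Banana with Fruit.
print(cosine_similarity(embeddings).round(2))

# Clustering: KMeans recovers the two semantic groups.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(dict(zip(words, labels)))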
8. Let us Review via Code
corpus = [
"I love playing football",
"Football is my favorite sport",
"I enjoy watching football matches",
"Soccer is popular worldwide"
]
https://github.com/jayyanar/gen-ai-labs-demos/blob/main/lab0-embedding/embedding_demo.ipynb
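A hedged sketch of what the linked notebook demonstrates on this corpus, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (the actual notebook may use a different library or model):

from sentence_transformers import SentenceTransformer, util

corpus = [
    "I love playing football",
    "Football is my favorite sport",
    "I enjoy watching football matches",
    "Soccer is popular worldwide",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# Semantic search: the query matches football sentences despite sharing no words.
query_emb = model.encode("Which game do I like?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, corpus_emb)[0]
for sentence, score in sorted(zip(corpus, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {sentence}")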
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
9. Feed-Forward Networks and Sequential Models - RNN
- Slow to train.
- Long sequences lead to vanishing gradients, i.e. the problem of long-term dependencies.
- In simple terms, its memory is not strong when it comes to remembering old connections.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
10. Transformers - "Attention Is All You Need" - 2017
Advantages of the Transformer model architecture:
- "Multi-Head Attention" enabled models to scale their understanding of relationships between words and helps provide context.
- Efficient use of parallel computing.
- Transfer learning and adaptability.
- Scalability.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
11. Transformer Encoder - Intuitive Understanding
E.g. 1) I liked the cricket bat purchased from Amazon.in
E.g. 2) I appreciate the prompt delivery by Amazon and the product description was good, but the product did not live up to my expectations
E.g. 3) Magnificent six struck into the crowd
Pipeline: positional encoding + Word2Vec embedding -> Multi-Head Attention -> Feed-Forward -> Decoder Network -> SoftMax
Multi-Head Attention: imagine multi-head attention as a multi-tasker. Given a word, the Transformer deep learning model needs to learn the next word. Multi-head attention performs different parallel computations for the same word to achieve different results, which are then fed through SoftMax to output the most suitable word.
Feed-Forward: a feed-forward NN is the next step. We apply a simple feed-forward neural network to each attention vector to transform it into a form that the next encoder or decoder layer will accept easily.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
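To make the intuition concrete, here is a minimal NumPy sketch of one attention head plus the per-token feed-forward step, with toy dimensions and random weights standing in for learned ones:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 4, 8                  # 4 tokens, 8-dim embeddings
x = np.random.randn(seq_len, d_model)    # positional + word embeddings

# Learned projection matrices (random here) for queries, keys, values.
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Scaled dot-product attention: each token attends to every other token,
# which is how "bat" picks up cricket vs. animal context from its neighbors.
weights = softmax(Q @ K.T / np.sqrt(d_model))
attended = weights @ V                   # shape (seq_len, d_model)

# The per-token feed-forward layer then transforms the attention output
# into the form the next encoder/decoder layer expects.
W1, W2 = np.random.randn(d_model, 32), np.random.randn(32, d_model)
out = np.maximum(0, attended @ W1) @ W2
print(out.shape)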
13. Rise of the Machines
Chart: subjective performance vs. parameter count (model size) - BERT (345 million), GPT-2 (1.5 billion), GPT-3 (175 billion), BLOOM (176 billion), ?
* Not to scale, for illustrative purposes only
20. Stable Diffusion
Diffusion models add noise to training images and learn to work backwards to the original image.
The trained model then works from random noise to generate an image.
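A minimal sketch of the forward (noising) process, assuming a simple linear beta schedule; training teaches a network to reverse it:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # noise added per step
alphas_cumprod = np.cumprod(1.0 - betas)    # total signal kept after t steps

def add_noise(image, t, rng=np.random.default_rng()):
    """Jump straight to step t: x_t = sqrt(a_t)*x_0 + sqrt(1-a_t)*noise."""
    a_t = alphas_cumprod[t]
    noise = rng.standard_normal(image.shape)
    return np.sqrt(a_t) * image + np.sqrt(1.0 - a_t) * noise, noise

x0 = np.zeros((64, 64, 3))                  # placeholder "image"
x_t, eps = add_noise(x0, t=500)
# The model is trained to predict eps from x_t; at generation time it starts
# from pure noise (t = T-1) and denoises step by step back to an image.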
25. Quick Quiz
Encoder-Only Model | Decoder-Only Model | Encoder-Decoder Model
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
26. Quick Quiz
Encoder-Only - e.g. BERT: sentiment analysis, question answering, and named entity recognition
Decoder-Only - e.g. GPT: text summarization, question answering, classification
Encoder-Decoder - e.g. AlexaTM, T5: translation and summarization
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
27. Connect with a Large Language Model Easily - Prompting
1. Context Window
2. Maximum Output Tokens
3. Temperature: a higher value (e.g., 1.5) leads to more diverse and creative text, while a lower value (e.g., 0.5) results in more focused and deterministic text.
4. Top-p: if the generated text is too narrow in scope and lacks diversity, consider increasing the probability threshold (p); if it is too diverse and includes irrelevant words, consider decreasing it.
SageMaker Foundation Model - Playground Console
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
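A minimal sketch of how temperature and top-p reshape next-token sampling, using made-up logits for five candidate tokens:

import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, rng=np.random.default_rng(0)):
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # tokens, most probable first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1   # keep tokens up to mass top_p
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=p))

logits = np.array([3.0, 2.5, 1.0, 0.5, 0.1])
# Low temperature -> focused/deterministic; high temperature -> diverse.
print([sample(logits, temperature=0.5) for _ in range(5)])
print([sample(logits, temperature=1.5) for _ in range(5)])
# Lower top_p narrows the candidate set; higher top_p widens it.
print([sample(logits, top_p=0.5) for _ in range(5)])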
28. LLM - Prompting
Your choice of LLM. Common prompting tasks, with example prompts:
1. Text Generation - "Write a blog about Indian cinema"
2. Summarization - "Summarize the article in 2 paragraphs: <input>"
3. Translation - "Translate <input> to French"
4. Code Generation - "Convert <input> to Java"
5. Question and Answering - "Create a question and answer: <input sample>"
Trade-offs: quality, cost, latency, customization.
Major con: limited to the context window.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
29. LLM - Prompting
Zero-shot: task described, but demonstrations not given.
Few-shot: task described and random demonstrations provided.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
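A minimal sketch of the difference, using a hypothetical sentiment task; only the demonstrations change between the two prompts:

task = "Classify the sentiment of the review as Positive or Negative."

zero_shot = f"""{task}
Review: The cricket bat broke within a week.
Sentiment:"""

few_shot = f"""{task}
Review: Prompt delivery and great packaging.
Sentiment: Positive
Review: The product did not live up to my expectations.
Sentiment: Negative
Review: The cricket bat broke within a week.
Sentiment:"""

# Few-shot demonstrations usually steer the model far more reliably,
# at the cost of consuming context-window tokens.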
30. Chain-of-Thought Prompting
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
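A minimal sketch of a chain-of-thought prompt: the demonstration shows worked reasoning, nudging the model to reason step by step before answering:

cot_prompt = """Q: A shop sells bats for 500 rupees and balls for 100 rupees.
What do 2 bats and 3 balls cost?
A: 2 bats cost 2 * 500 = 1000. 3 balls cost 3 * 100 = 300.
1000 + 300 = 1300. The answer is 1300 rupees.

Q: What do 4 bats and 2 balls cost?
A:"""
# Appending "Let's think step by step." is the common zero-shot CoT variant.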
31. Connect with a Large Language Model via an API
- Persona: Developers - Prompting
SageMaker Foundation Models - Connect via API
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
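A hedged sketch of calling a deployed SageMaker foundation-model endpoint from code with boto3; the endpoint name is a placeholder, and the payload schema varies by model (this shape follows JumpStart text-model conventions):

import boto3
import json

client = boto3.client("sagemaker-runtime")

payload = {
    "text_inputs": "Summarize the article in 2 paragraphs: <input>",
    "max_length": 200,
    "temperature": 0.7,
}

response = client.invoke_endpoint(
    EndpointName="my-fm-endpoint",       # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))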
32. LLM Using AWS Foundation Models
- Use Stable Diffusion AWS FM
- Use the Cohere API for the demo
- Use the DreamStudio API for the demo
- CodeWhisperer
- Prompting
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
35. Adoption Journey - AWS Foundation Models
AWS Comprehend for Named Entity Recognition - November 2018
-> Integration with SageMaker Pipelines for batch processing
-> Increase in use cases
-> Spike in cost - demanded refactoring
-> Deployed an AWS FM (BERT) on a SageMaker endpoint and integrated it with the pipeline - Sep 2022
-> 50% cost reduction
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
36. FM - Large Language Model - Adoption Journey
Corpus from the Internet -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2) -> Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify)
Major cons:
- Missing enterprise data context or live data
- Hallucination
- Not able to build an LLM application for end users
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
37. Challenges in Adapting LLMs in the Enterprise
1. Pretraining a model from scratch
2. Data privacy, security and ethics
3. Model interpretability and hallucination
4. Fine-tuning and customization
5. Continual model improvement and integration
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
38. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2) or Domain-Specific Models - pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
39. LLM - Pre-Training on SageMaker - Example (BloombergGPT)
- 64 p4d.24xlarge instances - 50K per day.
- Each p4d.24xlarge instance has 8 NVIDIA 40GB A100 GPUs with NVIDIA NVSwitch intra-node connections (600 GB/s).
- NVIDIA GPUDirect using AWS Elastic Fabric Adapter (EFA) inter-node connections (400 Gb/s).
- Amazon FSx for Lustre, which supports up to 1000 MB/s read and write throughput per TiB storage unit.
- SageMaker Model Parallelism (SMP) library from AWS, which enables automatic distribution of large models across multiple GPU devices and instances.
Dataset: 363 billion tokens from the Bloomberg dataset (FinPile): Web 298B, News 38B, Filings 14B, Press 9B, Bloomberg 9B; plus 345 billion tokens from general-purpose datasets: The Pile 184B, C4 138B, Wikipedia 24B.
Ref: https://arxiv.org/pdf/2303.17564.pdf
Major cons: compute cost, need for a quality dataset, time to train.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
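As a hedged sketch (not BloombergGPT's actual configuration), launching a multi-node SageMaker training job with the SMP model-parallel distribution might look like this; the entry point, role, versions, and SMP parameters are illustrative assumptions:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",             # your training script (hypothetical)
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_type="ml.p4d.24xlarge",    # 8x A100 40GB per instance
    instance_count=64,
    framework_version="1.13",
    py_version="py39",
    distribution={
        "smdistributed": {"modelparallel": {"enabled": True,
                                            "parameters": {"partitions": 8}}},
        "mpi": {"enabled": True, "processes_per_host": 8},
    },
)
estimator.fit({"train": "s3://my-bucket/tokenized-corpus/"})  # hypothetical path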
40. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
41. LLM - Full Fine-Tuning
Pretrained LLM (e.g. BERT, GPT, LLaMA-2) + your dataset -> fine-tuned LLM (e.g. legal, medicine)
Major cons:
- Still missing enterprise data
- Need a quality dataset
- Time to train, and cost
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
42. LLM - Fine-Tuning - PEFT
Pretrained LLM (e.g. BERT, GPT, LLaMA-2 - not LLaMA-1) + your dataset -> fine-tuned LLM using SageMaker
The original weights are frozen; fine-tuning adds new parameters, and only this small set of new parameters is trained.
- Low-Rank Adaptation of LLMs (LoRA)
- Quantized LLMs with Low-Rank Adapters (QLoRA)
Major cons:
- Still missing enterprise data
- Need a quality dataset
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
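A minimal sketch of PEFT LoRA with the Hugging Face peft library; the base model and target_modules are assumptions that differ per architecture:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                   # rank of the low-rank update matrices
    lora_alpha=32,         # scaling factor
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # BLOOM's attention projection
)

model = get_peft_model(base, config)      # original weights stay frozen
model.print_trainable_parameters()        # only the small adapter set trains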
43. FM - Large Language Model - Fine-Tune - LoRA on SageMaker
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
44. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
LangChain (indexing and chains) / LlamaIndex; VectorDB - AWS OpenSearch for embeddings
Applications: 1) Retrieval-Augmented Generation, 2) Chatbot
Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify, Lex)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
45. LLM - LangChain
LLMs and Prompts: prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
Chains: go beyond a single LLM call and involve sequences of calls (LLM + utility). LangChain provides a standard interface for chains, many integrations with other tools, and end-to-end chains for common applications.
Data Augmented Generation: specific types of chains that first interact with an external data source to fetch data for use in the generation step, e.g. summarization and QA.
Agents: an LLM makes decisions about which actions to take, takes that action, sees an observation, and repeats until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
Memory: persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
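A minimal sketch of the "LLMs and Prompts" and "Chains" pieces, using the LangChain API as of mid-2023; the Cohere API key is a placeholder and any supported LLM would work:

from langchain import PromptTemplate, LLMChain
from langchain.llms import Cohere

prompt = PromptTemplate(
    input_variables=["article"],
    template="Summarize the following article in 2 paragraphs:\n{article}",
)
llm = Cohere(cohere_api_key="YOUR_KEY")    # placeholder key
chain = LLMChain(llm=llm, prompt=prompt)   # chains can sequence several calls

print(chain.run(article="<paste article text>"))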
46. LLM - RAG Introduction
Foundation models are trained offline and lack adaptability to new data. Retrieval Augmented Generation (RAG) addresses this by retrieving external data from various sources and incorporating it into prompts.
RAG models convert documents and queries into numerical representations, compare query embeddings with a knowledge library, append relevant context to the prompt, and send it to the foundation model.
Knowledge libraries can be updated asynchronously, allowing for continuous improvement.
https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/
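A minimal sketch of that flow, assuming sentence-transformers for the embeddings; in the AWS setup the encoder and the generation step would be SageMaker endpoints:

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
documents = ["Our VPN portal is vpn.example.com.",
             "Employees get 24 days of leave per year."]
doc_emb = encoder.encode(documents, normalize_embeddings=True)  # knowledge library

def build_rag_prompt(question, top_k=1):
    q = encoder.encode([question], normalize_embeddings=True)[0]
    scores = doc_emb @ q                     # cosine similarity (vectors normalized)
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])
    # The augmented prompt is what gets sent to the foundation model.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_rag_prompt("How many leave days do I get?"))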
47. LLM in the Enterprise using AWS - use case 1
Providing enterprise Confluence documentation via a custom UI built with Streamlit
Ref: https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/
Major con: needs to be integrated with a chat tool.
48. LLM in the Enterprise using AWS - use case 2
Providing conversational AI using Lex and an LLM
https://aws.amazon.com/blogs/machine-learning/enhance-amazon-lex-with-conversational-faq-features-using-llms/
Major con: "I need to connect to databases."
49. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
LangChain (indexing and chains); VectorDB - AWS OpenSearch for embeddings; Databases - S3/Athena, Aurora/Redshift/Snowflake
Applications: Retrieval-Augmented Generation; chatbot; chat with enterprise structured databases
Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify, Lex)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
50. LLM in the Enterprise using AWS - use case 3
Connect to enterprise data
https://aws.amazon.com/blogs/machine-learning/reinventing-the-data-experience-use-generative-ai-and-modern-data-architecture-to-unlock-insights/
51. LLM in the Enterprise using AWS - use case 4
Connect to enterprise search data
https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/
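A hedged sketch of the Kendra + LangChain pattern from the linked blog, using the mid-2023 LangChain API; the index ID and API key are placeholders, and the blog itself wraps Falcon-40B-Instruct on a SageMaker endpoint rather than Cohere:

from langchain.retrievers import AmazonKendraRetriever
from langchain.chains import RetrievalQA
from langchain.llms import Cohere

retriever = AmazonKendraRetriever(index_id="kendra-index-id")   # hypothetical index
llm = Cohere(cohere_api_key="YOUR_KEY")   # stand-in for a SageMaker-hosted LLM

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",    # stuff the retrieved Kendra passages into the prompt
    retriever=retriever,
)
print(qa.run("What is our travel reimbursement policy?"))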
52. Advantages of using AWS
Flexibility: AWS provides the option of building your own DL container from scratch (PyTorch 2.0); you can also use AWS Deep Learning AMIs or Deep Learning Containers.
Secure: fine-tune the model keeping your data in S3, with KMS encryption and PrivateLink connections ensuring your data stays within customer infrastructure; Responsible AI; AWS guarantees that data fine-tuned by the customer will not be used by AWS, and the trained model is stored in the customer's S3.
Customization: for LLMs and diffusion models; a wide variety of foundation models; the Bedrock API with underlying FMs.
Cost Effective: AWS Trainium - cost-efficient, high-performance training; AWS Inferentia 1/2 - high performance at the lowest cost per inference.
Infrastructure: AWS provides a large variety of compute instances - Graviton, NVIDIA GPUs, Habana Gaudi from Intel.
Integration: AWS has over 200 fully featured services; SageMaker provides robust MLOps capabilities to productionize your GenAI model; integration with Marketplace, partner services, S3, Elasticsearch as a vector database, Kendra for RAG.
Gen AI Powered Solutions: Amazon-built foundation models - Amazon Titan (text and embeddings) as part of the Bedrock offerings.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
53. FM - Large Language Model - Reference Stack
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
LangChain (indexing and chains) / LlamaIndex; VectorDB - AWS OpenSearch for embeddings; Databases - S3/Athena, Aurora/Redshift/Snowflake
Data governance: validation, tracking, logging - LangSmith
Use cases: Retrieval-Augmented Generation; chatbot; chat with structured data; connect to PDFs via the LangChain library
LLMOps: SageMaker Pipelines
Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify, Lex)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
54. Where to from here?
- Models will keep getting bigger and more sophisticated.
- Many developers will use foundation models and prompt engineering.
- Open models and domain-specific models will get more traction.
- Sustainability will come into focus.
- Algorithm optimization - e.g. small models doing better, domain-specific models.
- Model pruning and other optimizations - e.g. student-teacher models, model distillation.
- Dedicated hardware designed for LLMs, such as AWS Trainium and AWS Inferentia, will reduce costs.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
55. Sample Use Case - Fun Project
https://dev.to/jayyanar/triviagenai-m98
56. Enterprise use case 4
Retrieval-Augmented Generation (RAG) pattern, Amazon Kendra enterprise search service and the Falcon-40B-Instruct language model
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
57. AWS GenAI Labs Workshop - 1 Hr 30 Minutes
Name:            Time:
Lab 0 - Embedding: Word, Sentence
Lab 1 - LLM AWS FM - 10 Min
Lab 2 - ImageGen AWS FM - 10 Min
Lab 3 - Use Cohere API for LLM - 10 Min
Lab 4 - Use DreamStudio API for Image Generation - 10 Min
Lab 5 - CodeWhisperer ***
Lab 6 - LLM with RAG using Kendra - 45 Min
Workshop: bit.ly/genai101 | bit.ly/awsaj | Feedback: bit.ly/feedback-awscommunity
Thank you! Ayyanar Jeyakrishnan