In this workshop we covered: an introduction to Generative AI and Large Language Models (LLMs); AWS Foundation Models and their role in providing pre-trained LLMs; the benefits of leveraging LLMs in enterprises; deploying LLMs on AWS infrastructure, including infrastructure requirements and the available AWS services and tools; and demos showcasing text-to-image generation and text summarization using Foundation Models, as well as using Retrieval Augmented Generation and LangChain with AWS tools for enterprise use cases.
Connect with me for interesting sessions in the future:
https://www.linkedin.com/in/jayyanar/
1. Generative AI for Everyone
Hands on Workshop
Venue: Amazon Development Center, Aquila Bangalore
Date: 22 July, Time: 10:00 AM
2. Agenda
01 - GENERATIVE AI 101: Path to Gen AI, Embedding 101, Quick View of the Transformer Architecture, Terminology (Context, Temperature), Stable Diffusion 101
02 - AWS FOUNDATION MODELS: AWS Foundation Models Overview, Advantages of using SageMaker
03 - PROMPTING: Zero-shot vs Few-shot, Chain-of-Thought
04 - LLM FOR ENTERPRISE: Challenges in Adopting LLMs in the Enterprise, LangChain Overview, Reference Architecture - GenAI Stack
05 - LAB / DEMO: LLM Demo, Stable Diffusion Demo, CodeWhisperer Demo
06 - DEMO / USE CASES - LLM: Useful Use Cases and Patterns for the Enterprise, Friendly Use Case - GenAI-Quiz, Enterprise Use Case Demo - RAG with Kendra
AWS UG Bangalore Co-Organizer - Ayyanar Jeyakrishnan
3. Feedback and Questions
- 18 yrs of experience - worked in the EU and NA
- 50+ & 10X AWS Certifications
- AWS Community Builder - ML
- Continuous learning is my passion
- Worked as Lead Engineer - DevOps, MLOps, ML and Data Platform
- Speaker at 20+ public events on ML, AI, GenAI, MLOps and DevOps
- VP / Principal Engineer in a financial organization
- Started as a hardware technician
4. AWS GenAI Labs Workshop - 1 Hr 30 Minutes
Name:            Time:
Lab 0 - Embedding: Word, Sentence
Lab 1 - LLM AWS FM - 10 Min
Lab 2 - ImageGen AWS FM - 10 Min
Lab 3 - Use Cohere API for LLM - 10 Min
Lab 4 - Use DreamStudio API for Image Generation - 10 Min
Lab 5 - CodeWhisperer ***
Lab 6 - LLM with RAG using Kendra - 45 Min
Workshop: bit.ly/genai101 | bit.ly/awsaj | Feedback: bit.ly/feedback-awscommunity
Thank you! Ayyanar Jeyakrishnan
5. Path to Gen AI
Descriptive Analytics: charts and graphs - using BI
Predictive Analytics: Linear Regression, Logistic Regression, Clustering
Predictive AI & Descriptive AI: Word Embeddings, Word2Vec, RNN, LSTM, GRU, CNN, YOLO
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
6. Path to Gen AI - Generative AI
Descriptive Analytics: charts and graphs - using BI
Predictive Analytics: Linear Regression, Logistic Regression, Clustering
Predictive AI & Descriptive AI: Word Embeddings, Word2Vec, RNN, LSTM, GRU, CNN, YOLO, GAN (generator and discriminator)
Generative AI: Transformer Models - Falcon, MPT, Dolly, GPT-3.5, Bloom, FLAN-T5, BART, BERT, LLaMA; Stable Diffusion
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
7. Let us Understand the Basics
Tokenization: words (Dog, Cat, Fruit, Banana) map to token IDs (3, 5, 18, 19)
Embedding: Word2Vec turns tokens into vectors
Semantic Search: cosine_similarity (Dog ~ Cat, Banana ~ Fruit)
Clustering: KMeans groups related words (Dog, Cat | Fruit, Banana)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
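A minimal sketch of these four ideas, assuming scikit-learn and toy 2-D vectors in place of real Word2Vec embeddings (which have hundreds of dimensions); the token IDs mirror the ones in the diagram:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

words = ["Dog", "Cat", "Fruit", "Banana"]
token_ids = {"Dog": 3, "Cat": 5, "Fruit": 18, "Banana": 19}   # tokenization
print(token_ids)
embeddings = np.array([
    [0.90, 0.10],   # Dog    - "animal" direction
    [0.85, 0.20],   # Cat    - close to Dog
    [0.10, 0.90],   # Fruit  - "food" direction
    [0.15, 0.85],   # Banana - close to Fruit
])

# Semantic search: cosine similarity pairs Dog with Cat and Banana with Fruit.
print(cosine_similarity(embeddings).round(2))

# Clustering: KMeans recovers the two semantic groups.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(dict(zip(words, labels)))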
8. Let us Review via Code
corpus = [
"I love playing football",
"Football is my favorite sport",
"I enjoy watching football matches",
"Soccer is popular worldwide"
]
https://github.com/jayyanar/gen-ai-labs-demos/blob/main/lab0-embedding/embedding_demo.ipynb
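A hedged sketch of what the linked notebook demonstrates on this corpus, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (the actual notebook may use a different library or model):

from sentence_transformers import SentenceTransformer, util

corpus = [
    "I love playing football",
    "Football is my favorite sport",
    "I enjoy watching football matches",
    "Soccer is popular worldwide",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# Semantic search: the query matches football sentences despite sharing no words.
query_emb = model.encode("Which game do I like?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, corpus_emb)[0]
for sentence, score in sorted(zip(corpus, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {sentence}")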
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
9. Feed-Forward Networks and Sequential Models - RNN
- Slow to train.
- Long sequences lead to vanishing gradients, i.e. the problem of long-term dependencies.
- In simple terms, its memory is not strong when it comes to remembering old connections.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
10. Transformers - "Attention Is All You Need" - 2017
Advantages of the Transformer model architecture:
- "Multi-Head Attention" enabled models to scale their understanding of relationships between words and helps provide context.
- Efficient use of parallel computing.
- Transfer learning and adaptability.
- Scalability.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
11. Transformer Encoder - Intuitive Understanding
E.g. 1) I liked the cricket bat purchased from Amazon.in
E.g. 2) I appreciate the prompt delivery by Amazon and the product description was good, but the product did not live up to my expectations
E.g. 3) Magnificent six struck into the crowd
Pipeline: positional encoding + Word2Vec embedding -> Multi-Head Attention -> Feed-Forward -> Decoder Network -> SoftMax
Multi-Head Attention: imagine multi-head attention as a multi-tasker. Given a word, the Transformer deep learning model needs to learn the next word. Multi-head attention performs different parallel computations for the same word to achieve different results, which are then fed through SoftMax to output the most suitable word.
Feed-Forward: a feed-forward NN is the next step. We apply a simple feed-forward neural network to each attention vector to transform it into a form that the next encoder or decoder layer will accept easily.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
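To make the intuition concrete, here is a minimal NumPy sketch of one attention head plus the per-token feed-forward step, with toy dimensions and random weights standing in for learned ones:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 4, 8                  # 4 tokens, 8-dim embeddings
x = np.random.randn(seq_len, d_model)    # positional + word embeddings

# Learned projection matrices (random here) for queries, keys, values.
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Scaled dot-product attention: each token attends to every other token,
# which is how "bat" picks up cricket vs. animal context from its neighbors.
weights = softmax(Q @ K.T / np.sqrt(d_model))
attended = weights @ V                   # shape (seq_len, d_model)

# The per-token feed-forward layer then transforms the attention output
# into the form the next encoder/decoder layer expects.
W1, W2 = np.random.randn(d_model, 32), np.random.randn(32, d_model)
out = np.maximum(0, attended @ W1) @ W2
print(out.shape)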
13. Rise of the Machines
Chart: subjective performance vs. parameter count (model size) - BERT (345 million), GPT-2 (1.5 billion), GPT-3 (175 billion), BLOOM (176 billion), ?
* Not to scale, for illustrative purposes only
20. Stable Diffusion
Diffusion models add noise to training images and learn to work backwards to the original image.
The trained model then works from random noise to generate an image.
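A minimal sketch of the forward (noising) process, assuming a simple linear beta schedule; training teaches a network to reverse it:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # noise added per step
alphas_cumprod = np.cumprod(1.0 - betas)    # total signal kept after t steps

def add_noise(image, t, rng=np.random.default_rng()):
    """Jump straight to step t: x_t = sqrt(a_t)*x_0 + sqrt(1-a_t)*noise."""
    a_t = alphas_cumprod[t]
    noise = rng.standard_normal(image.shape)
    return np.sqrt(a_t) * image + np.sqrt(1.0 - a_t) * noise, noise

x0 = np.zeros((64, 64, 3))                  # placeholder "image"
x_t, eps = add_noise(x0, t=500)
# The model is trained to predict eps from x_t; at generation time it starts
# from pure noise (t = T-1) and denoises step by step back to an image.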
25. Quick Quiz
Encoder-Only Model | Decoder-Only Model | Encoder-Decoder Model
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
26. Quick Quiz
Encoder-Only - e.g. BERT: sentiment analysis, question answering, and named entity recognition
Decoder-Only - e.g. GPT: text summarization, question answering, classification
Encoder-Decoder - e.g. AlexaTM, T5: translation and summarization
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
27. Connect with a Large Language Model Easily - Prompting
1. Context Window
2. Maximum Output Tokens
3. Temperature: a higher value (e.g., 1.5) leads to more diverse and creative text, while a lower value (e.g., 0.5) results in more focused and deterministic text.
4. Top-p: if the generated text is too narrow in scope and lacks diversity, consider increasing the probability threshold (p); if it is too diverse and includes irrelevant words, consider decreasing it.
SageMaker Foundation Model - Playground Console
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
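A minimal sketch of how temperature and top-p reshape next-token sampling, using made-up logits for five candidate tokens:

import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, rng=np.random.default_rng(0)):
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # tokens, most probable first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1   # keep tokens up to mass top_p
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=p))

logits = np.array([3.0, 2.5, 1.0, 0.5, 0.1])
# Low temperature -> focused/deterministic; high temperature -> diverse.
print([sample(logits, temperature=0.5) for _ in range(5)])
print([sample(logits, temperature=1.5) for _ in range(5)])
# Lower top_p narrows the candidate set; higher top_p widens it.
print([sample(logits, top_p=0.5) for _ in range(5)])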
28. LLM - Prompting
Your choice of LLM. Common prompting tasks, with example prompts:
1. Text Generation - "Write a blog about Indian cinema"
2. Summarization - "Summarize the article in 2 paragraphs: <input>"
3. Translation - "Translate <input> to French"
4. Code Generation - "Convert <input> to Java"
5. Question and Answering - "Create a question and answer: <input sample>"
Trade-offs: quality, cost, latency, customization.
Major con: limited to the context window.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
29. LLM - Prompting
Zero-shot: task described, but demonstrations not given.
Few-shot: task described and random demonstrations provided.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
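A minimal sketch of the difference, using a hypothetical sentiment task; only the demonstrations change between the two prompts:

task = "Classify the sentiment of the review as Positive or Negative."

zero_shot = f"""{task}
Review: The cricket bat broke within a week.
Sentiment:"""

few_shot = f"""{task}
Review: Prompt delivery and great packaging.
Sentiment: Positive
Review: The product did not live up to my expectations.
Sentiment: Negative
Review: The cricket bat broke within a week.
Sentiment:"""

# Few-shot demonstrations usually steer the model far more reliably,
# at the cost of consuming context-window tokens.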
30. Chain-of-Thought Prompting
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
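A minimal sketch of a chain-of-thought prompt: the demonstration shows worked reasoning, nudging the model to reason step by step before answering:

cot_prompt = """Q: A shop sells bats for 500 rupees and balls for 100 rupees.
What do 2 bats and 3 balls cost?
A: 2 bats cost 2 * 500 = 1000. 3 balls cost 3 * 100 = 300.
1000 + 300 = 1300. The answer is 1300 rupees.

Q: What do 4 bats and 2 balls cost?
A:"""
# Appending "Let's think step by step." is the common zero-shot CoT variant.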
31. Connect with a Large Language Model via an API
- Persona: Developers - Prompting
SageMaker Foundation Models - Connect via API
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
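A hedged sketch of calling a deployed SageMaker foundation-model endpoint from code with boto3; the endpoint name is a placeholder, and the payload schema varies by model (this shape follows JumpStart text-model conventions):

import boto3
import json

client = boto3.client("sagemaker-runtime")

payload = {
    "text_inputs": "Summarize the article in 2 paragraphs: <input>",
    "max_length": 200,
    "temperature": 0.7,
}

response = client.invoke_endpoint(
    EndpointName="my-fm-endpoint",       # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))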
32. LLM Using AWS Foundation Models
- Use Stable Diffusion AWS FM
- Use the Cohere API for the demo
- Use the DreamStudio API for the demo
- CodeWhisperer
- Prompting
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
35. Adoption Journey - AWS Foundation Models
AWS Comprehend for Named Entity Recognition - November 2018
-> Integration with SageMaker Pipelines for batch processing
-> Increase in use cases
-> Spike in cost - demanded refactoring
-> Deployed an AWS FM (BERT) on a SageMaker endpoint and integrated it with the pipeline - Sep 2022
-> 50% cost reduction
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
36. FM - Large Language Model - Adoption Journey
Corpus from the Internet -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2) -> Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify)
Major cons:
- Missing enterprise data context or live data
- Hallucination
- Not able to build an LLM application for end users
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
37. Challenges in Adapting LLMs in the Enterprise
1. Pretraining a model from scratch
2. Data privacy, security and ethics
3. Model interpretability and hallucination
4. Fine-tuning and customization
5. Continual model improvement and integration
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
38. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2) or Domain-Specific Models - pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
39. LLM - Pre-Training on SageMaker - Example (BloombergGPT)
- 64 p4d.24xlarge instances - 50K per day.
- Each p4d.24xlarge instance has 8 NVIDIA 40GB A100 GPUs with NVIDIA NVSwitch intra-node connections (600 GB/s).
- NVIDIA GPUDirect using AWS Elastic Fabric Adapter (EFA) inter-node connections (400 Gb/s).
- Amazon FSx for Lustre, which supports up to 1000 MB/s read and write throughput per TiB storage unit.
- SageMaker Model Parallelism (SMP) library from AWS, which enables automatic distribution of large models across multiple GPU devices and instances.
Dataset: 363 billion tokens from the Bloomberg dataset (FinPile): Web 298B, News 38B, Filings 14B, Press 9B, Bloomberg 9B; plus 345 billion tokens from general-purpose datasets: The Pile 184B, C4 138B, Wikipedia 24B.
Ref: https://arxiv.org/pdf/2303.17564.pdf
Major cons: compute cost, need for a quality dataset, time to train.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
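As a hedged sketch (not BloombergGPT's actual configuration), launching a multi-node SageMaker training job with the SMP model-parallel distribution might look like this; the entry point, role, versions, and SMP parameters are illustrative assumptions:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",             # your training script (hypothetical)
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_type="ml.p4d.24xlarge",    # 8x A100 40GB per instance
    instance_count=64,
    framework_version="1.13",
    py_version="py39",
    distribution={
        "smdistributed": {"modelparallel": {"enabled": True,
                                            "parameters": {"partitions": 8}}},
        "mpi": {"enabled": True, "processes_per_host": 8},
    },
)
estimator.fit({"train": "s3://my-bucket/tokenized-corpus/"})  # hypothetical path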
40. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
41. LLM - Full Fine-Tuning
Pretrained LLM (e.g. BERT, GPT, LLaMA-2) + your dataset -> fine-tuned LLM (e.g. legal, medicine)
Major cons:
- Still missing enterprise data
- Need a quality dataset
- Time to train, and cost
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
42. LLM - Fine-Tuning - PEFT
Pretrained LLM (e.g. BERT, GPT, LLaMA-2 - not LLaMA-1) + your dataset -> fine-tuned LLM using SageMaker
The original weights are frozen; fine-tuning adds new parameters, and only this small set of new parameters is trained.
- Low-Rank Adaptation of LLMs (LoRA)
- Quantized LLMs with Low-Rank Adapters (QLoRA)
Major cons:
- Still missing enterprise data
- Need a quality dataset
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
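A minimal sketch of PEFT LoRA with the Hugging Face peft library; the base model and target_modules are assumptions that differ per architecture:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                   # rank of the low-rank update matrices
    lora_alpha=32,         # scaling factor
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # BLOOM's attention projection
)

model = get_peft_model(base, config)      # original weights stay frozen
model.print_trainable_parameters()        # only the small adapter set trains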
43. FM - Large Language Model - Fine-Tune - LoRA on SageMaker
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
44. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
LangChain (indexing and chains) / LlamaIndex; VectorDB - AWS OpenSearch for embeddings
Applications: 1) Retrieval-Augmented Generation, 2) Chatbot
Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify, Lex)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
45. LLM - LangChain
LLMs and Prompts: prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
Chains: go beyond a single LLM call and involve sequences of calls (LLM + utility). LangChain provides a standard interface for chains, many integrations with other tools, and end-to-end chains for common applications.
Data Augmented Generation: specific types of chains that first interact with an external data source to fetch data for use in the generation step, e.g. summarization and QA.
Agents: an LLM makes decisions about which actions to take, takes that action, sees an observation, and repeats until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
Memory: persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
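A minimal sketch of the "LLMs and Prompts" and "Chains" pieces, using the LangChain API as of mid-2023; the Cohere API key is a placeholder and any supported LLM would work:

from langchain import PromptTemplate, LLMChain
from langchain.llms import Cohere

prompt = PromptTemplate(
    input_variables=["article"],
    template="Summarize the following article in 2 paragraphs:\n{article}",
)
llm = Cohere(cohere_api_key="YOUR_KEY")    # placeholder key
chain = LLMChain(llm=llm, prompt=prompt)   # chains can sequence several calls

print(chain.run(article="<paste article text>"))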
46. LLM - RAG Introduction
Foundation models are trained offline and lack adaptability to new data. Retrieval Augmented Generation (RAG) addresses this by retrieving external data from various sources and incorporating it into prompts.
RAG models convert documents and queries into numerical representations, compare query embeddings with a knowledge library, append relevant context to the prompt, and send it to the foundation model.
Knowledge libraries can be updated asynchronously, allowing for continuous improvement.
https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/
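A minimal sketch of that flow, assuming sentence-transformers for the embeddings; in the AWS setup the encoder and the generation step would be SageMaker endpoints:

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
documents = ["Our VPN portal is vpn.example.com.",
             "Employees get 24 days of leave per year."]
doc_emb = encoder.encode(documents, normalize_embeddings=True)  # knowledge library

def build_rag_prompt(question, top_k=1):
    q = encoder.encode([question], normalize_embeddings=True)[0]
    scores = doc_emb @ q                     # cosine similarity (vectors normalized)
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])
    # The augmented prompt is what gets sent to the foundation model.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_rag_prompt("How many leave days do I get?"))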
47. LLM in the Enterprise using AWS - use case 1
Providing enterprise Confluence documentation via a custom UI built with Streamlit
Ref: https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/
Major con: needs to be integrated with a chat tool.
48. LLM in the Enterprise using AWS - use case 2
Providing conversational AI using Lex and an LLM
https://aws.amazon.com/blogs/machine-learning/enhance-amazon-lex-with-conversational-faq-features-using-llms/
Major con: "I need to connect to databases."
49. FM - Large Language Model - Adoption Journey
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
LangChain (indexing and chains); VectorDB - AWS OpenSearch for embeddings; Databases - S3/Athena, Aurora/Redshift/Snowflake
Applications: Retrieval-Augmented Generation; chatbot; chat with enterprise structured databases
Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify, Lex)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
50. LLM in the Enterprise using AWS - use case 3
Connect to enterprise data
https://aws.amazon.com/blogs/machine-learning/reinventing-the-data-experience-use-generative-ai-and-modern-data-architecture-to-unlock-insights/
51. LLM in the Enterprise using AWS - use case 4
Connect to enterprise search data
https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/
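A hedged sketch of the Kendra + LangChain pattern from the linked blog, using the mid-2023 LangChain API; the index ID and API key are placeholders, and the blog itself wraps Falcon-40B-Instruct on a SageMaker endpoint rather than Cohere:

from langchain.retrievers import AmazonKendraRetriever
from langchain.chains import RetrievalQA
from langchain.llms import Cohere

retriever = AmazonKendraRetriever(index_id="kendra-index-id")   # hypothetical index
llm = Cohere(cohere_api_key="YOUR_KEY")   # stand-in for a SageMaker-hosted LLM

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",    # stuff the retrieved Kendra passages into the prompt
    retriever=retriever,
)
print(qa.run("What is our travel reimbursement policy?"))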
52. Advantages of using AWS
Flexibility: AWS provides the option of building your own DL container from scratch (PyTorch 2.0); you can also use AWS Deep Learning AMIs or Deep Learning Containers.
Secure: fine-tune the model keeping your data in S3, with KMS encryption and PrivateLink connections ensuring your data stays within customer infrastructure; Responsible AI; AWS guarantees that data fine-tuned by the customer will not be used by AWS, and the trained model is stored in the customer's S3.
Customization: for LLMs and diffusion models; a wide variety of foundation models; the Bedrock API with underlying FMs.
Cost Effective: AWS Trainium - cost-efficient, high-performance training; AWS Inferentia 1/2 - high performance at the lowest cost per inference.
Infrastructure: AWS provides a large variety of compute instances - Graviton, NVIDIA GPUs, Habana Gaudi from Intel.
Integration: AWS has over 200 fully featured services; SageMaker provides robust MLOps capabilities to productionize your GenAI model; integration with Marketplace, partner services, S3, Elasticsearch as a vector database, Kendra for RAG.
Gen AI Powered Solutions: Amazon-built foundation models - Amazon Titan (text and embeddings) as part of the Bedrock offerings.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
53. FM - Large Language Model - Reference Stack
Corpus from the Internet + Enterprise Data -> Foundation Model of choice (e.g. Stability AI, Anthropic, AI21, Cohere, LLaMA-2)
Options:
- Domain-Specific Models: pre-train a model from scratch, post-training apply RLHF (e.g. BloombergGPT)
- Fine-tune the FM (JSON format: question, context) - e.g. FLAN-T5-XXL with PEFT LoRA / QLoRA
Compute for training: Trainium, G5, Habana instances, ParallelCluster, distributed training, Neuron SDK
SageMaker Inference: AWS Inferentia, G5 NVIDIA - deployed within a VPC
LangChain (indexing and chains) / LlamaIndex; VectorDB - AWS OpenSearch for embeddings; Databases - S3/Athena, Aurora/Redshift/Snowflake
Data governance: validation, tracking, logging - LangSmith
Use cases: Retrieval-Augmented Generation; chatbot; chat with structured data; connect to PDFs via the LangChain library
LLMOps: SageMaker Pipelines
Consume the SageMaker endpoint as an API from the application front end (e.g. Amplify, Lex)
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
54. Where to from here?
- Models will keep getting bigger and more sophisticated.
- Many developers will use foundation models and prompt engineering.
- Open models and domain-specific models will get more traction.
- Sustainability will come into focus.
- Algorithm optimization - e.g. small models doing better, domain-specific models.
- Model pruning and other optimizations - e.g. student-teacher models, model distillation.
- Dedicated hardware designed for LLMs, such as AWS Trainium and AWS Inferentia, will reduce costs.
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
55. Sample Use Case - Fun Project
https://dev.to/jayyanar/triviagenai-m98
56. Enterprise use case 4
Retrieval-Augmented Generation (RAG) pattern, Amazon Kendra enterprise search service and the Falcon-40B-Instruct language model
Prepared by Ayyanar Jeyakrishnan - AWS Community Machine Learning Builder
57. AWS GenAI Labs Workshop - 1 Hr 30 Minutes
Name:            Time:
Lab 0 - Embedding: Word, Sentence
Lab 1 - LLM AWS FM - 10 Min
Lab 2 - ImageGen AWS FM - 10 Min
Lab 3 - Use Cohere API for LLM - 10 Min
Lab 4 - Use DreamStudio API for Image Generation - 10 Min
Lab 5 - CodeWhisperer ***
Lab 6 - LLM with RAG using Kendra - 45 Min
Workshop: bit.ly/genai101 | bit.ly/awsaj | Feedback: bit.ly/feedback-awscommunity
Thank you! Ayyanar Jeyakrishnan