Local Applications of Large Language
Models based on RAG (Retrieval
Augmented Generation)
——Local Documents Q&A
Luo Weizhi
1. Large language models
2. Key structures of the Transformer model
3. Advantages compared with RNN networks
4. The large language model Llama 2
5. The fine-tuning process on the Q&A dataset
6. LangChain and chaining concepts
7. RAG (Retrieval Augmented Generation)
8. Demonstration of the project
9. Conclusion
Large language models
01
LLM
A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks (e.g., GPT-4).
An LLM is said to be "large" because both its number of parameters and the text dataset used for training are very large. A model with 7B (7 billion) parameters is among the smallest LLMs, yet it is still trained on a very large dataset. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process.
LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.
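As a minimal illustration of this next-token loop, the sketch below (my own example, assuming the Hugging Face transformers library and using the small GPT-2 checkpoint as a stand-in for a full-scale LLM) generates text greedily, one token at a time:

# Minimal sketch of autoregressive generation: repeatedly predict the next token.
# Assumes the Hugging Face transformers library; "gpt2" is only a small stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Large language models are", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                                    # generate 20 new tokens
        logits = model(input_ids).logits                   # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1)          # greedy: pick the most likely next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))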
LLM
Fig. 1. Status of development of LLMs (with a size larger than 10B parameters)
Key structures of the Transformer model
02
Transformer
Current LLMs are built on the Transformer network architecture. It is an encoder-decoder structure: the encoder combines a multi-head attention mechanism with a feed-forward network, and the decoder adds an extra masked attention part.
Self-Attention: the core of the Transformer. It enables the model to take into account the interactions and dependencies between the elements of a sequence when processing sequence data. Self-attention allows the model to dynamically focus on different parts of the input sequence as it generates each output, which is critical to understanding the context and meaning of the text.
Positional Encoding: Since the Transformer is
entirely based on the attention mechanism and lacks
the ability to deal with sequence order, Positional
Encoding provides information about the position of
individual elements in a sequence by adding
additional information to the input elements.
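The slides do not show the exact encoding formula; as a sketch, here is the sinusoidal positional encoding from the original Transformer paper (one common choice; Llama 2 itself uses rotary position embeddings instead):

# Sinusoidal positional encoding from the original Transformer paper (illustrative sketch).
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1) position index
    i = np.arange(d_model)[None, :]                          # (1, d_model) dimension index
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                     # even dimensions use sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])                     # odd dimensions use cosine
    return pe                                                # added element-wise to the input embeddings

print(positional_encoding(seq_len=4, d_model=8).round(3))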
Transformer
Multi-Head Attention: The attention mechanism is
decomposed into multiple "heads", each of which
learns information from a different representation
subspace, which allows the model to capture data
features from multiple perspectives at the same time.
Feed-Forward Networks: In each Transformer
block, the output of the self-attention layer is passed
to a feed-forward network, which is the same for
each position, but is applied independently at
different positions.
Transformer
Self-Attention mechanism
The self-attention mechanism allows the model to capture contextual relationships within a sequence
by taking into account other elements in the sequence as each element of the sequence is processed.
The mathematical expression for self-attention is:
Attention(Q, K, V) = softmax(QKᵀ / √dk) V
 Q,K,V are the Query, Key, and Value matrices, respectively, which are obtained by multiplying the
embedding vectors of the input sequence with three different weight matrices.
 dk is the dimension of the key vector, which is used to scale the dot product to prevent the dot product from
being too large and causing the softmax function to be in the saturation region, thus affecting the
backpropagation of the gradient.
 QKᵀ denotes the dot product of the query and key, which is used to compute the similarity between the
positions in the input sequence.
 The softmax function is used to convert the similarity into weights.
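A minimal NumPy sketch of this scaled dot-product attention (the shapes and random weights are only illustrative):

# Scaled dot-product self-attention: softmax(Q Kᵀ / √dk) V
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # project inputs to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity between all pairs of positions
    weights = softmax(scores, axis=-1)                # convert similarities into attention weights
    return weights @ V                                # weighted sum of the value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                           # toy input: 3 tokens, embedding dim 4
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (3, 4)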
Transformer
Multi-Head Attention
The multi-head attention mechanism splits self-attention into multiple "heads". In layman's terms, it is better to have 8 different people look at the same problem than just 1. Each head captures information in a different representation subspace:
MultiHead(Q, K, V) = Concat(head1, ..., headh) W^O, where headi = Attention(Q Wi^Q, K Wi^K, V Wi^V)
 Wi^Q, Wi^K, Wi^V, and W^O are the trainable weight matrices.
 h is the number of heads.
 The information in the different representation subspaces is fused by concatenating the outputs of the different heads and multiplying them by the output weight matrix W^O.
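Continuing the sketch above, a self-contained NumPy version of multi-head attention with h heads (shapes again chosen only for illustration):

# Multi-head attention: run h scaled dot-product attentions in parallel,
# concatenate their outputs, and project them with W^O.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1) @ V

def multi_head_attention(X, heads, Wo):
    # heads is a list of (Wq, Wk, Wv) triples, one per head
    outputs = [attention(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outputs, axis=-1) @ Wo      # fuse the h subspaces with W^O

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))                                                   # 3 tokens, model dim 4
heads = [tuple(rng.normal(size=(4, 2)) for _ in range(3)) for _ in range(2)]  # h = 2, dk = dv = 2
Wo = rng.normal(size=(4, 4))                                                  # (h * dv, d_model)
print(multi_head_attention(X, heads, Wo).shape)                               # (3, 4)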
Transformer
Position-wise Feed-Forward Networks
A position-wise feed-forward network follows each attention layer and applies the same transformation to each position independently:
FFN(x) = max(0, xW1 + b1) W2 + b2
 This is a two-layer fully-connected feed-forward network where the max(0,x) represents the ReLU
activation function.
 W1,W2 and b1,b2 are the weights and biases of the network.
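A small NumPy sketch of this block, with an arbitrarily chosen inner dimension:

# Position-wise feed-forward network: FFN(x) = max(0, xW1 + b1) W2 + b2,
# applied to every position independently.
import numpy as np

def feed_forward(X, W1, b1, W2, b2):
    hidden = np.maximum(0, X @ W1 + b1)               # ReLU activation
    return hidden @ W2 + b2                           # project back to the model dimension

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 4))                           # 3 positions, model dim 4
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)       # expand to an inner dimension of 16
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)        # project back to dimension 4
print(feed_forward(X, W1, b1, W2, b2).shape)          # (3, 4)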
Transformer
Output Layer
Ultimately, the Transformer generates predictions for each element of the output sequence using a linear layer followed by a softmax layer:
Output = softmax(XW + b)
 Here X is the output of the last decoder layer.
 W and b are the weights and biases of the output layer.
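A matching sketch of the output layer: a linear projection to vocabulary logits followed by a softmax (the vocabulary size is a toy value):

# Output layer: project the decoder output X to vocabulary logits, then apply softmax.
import numpy as np

def output_probs(X, W, b):
    logits = X @ W + b                                         # (seq_len, vocab_size)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))    # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)                   # probability of each candidate token

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 4))                                    # decoder output: 3 positions, dim 4
W, b = rng.normal(size=(4, 10)), np.zeros(10)                  # toy vocabulary of 10 tokens
probs = output_probs(X, W, b)
print(probs.shape, probs.sum(axis=-1))                         # (3, 10), each row sums to 1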
Transformer in LLM
Web: https://bbycroft.net/llm (interactive visualization of a Transformer-based LLM)
Transformer in LLM
Advantages compared with RNN networks
03
Limitations of RNN models
RNNs process language sequentially, in a left-to-right or right-to-left manner. Reading one word at a time forces the RNN to perform many steps to relate words that are far apart. The more such steps are required, the harder it is for the recurrent network to learn how to make those decisions; that is, the effective depth of the computation grows with the number of words, which makes learning difficult.
• Vanishing and exploding gradient problems, etc.
It is practically impossible to compress an entire sentence into a single fixed-length vector.
• Difficult to express complex structures such as sequential information
Figure: RNN language model. A word embedding layer feeds the RNN hidden layer, which outputs the probability of each word; at each time step t, the vector of the preceding word in the sequence y1, ..., yt is fed into the model.
The large language model Llama 2
04
Llama 2
Llama 2 is the second generation of large language models introduced by Meta AI (Facebook's AI lab). It comes in four sizes: 7B, 13B, 34B, and 70B parameters. Given my hardware, we will download and fine-tune the original 7B model. Although 7B is the smallest version, it still has about 7 billion tunable weight and bias parameters. These parameters are learned from large amounts of textual data during training so that the model captures language complexity, contextual relationships, and subtle patterns in language use.
We can download many free base models at: https://huggingface.co/
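As a sketch of that download step (assuming the transformers and accelerate libraries, and that access to the gated meta-llama/Llama-2-7b-hf repository has been granted after accepting Meta's license):

# Sketch: loading the Llama 2 7B base model from the Hugging Face Hub.
# Assumes access to the gated "meta-llama/Llama-2-7b-hf" checkpoint (Meta license + HF token)
# and enough memory for the ~14 GB of FP16 weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Retrieval Augmented Generation is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))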
Llama 2
This pretraining process gives the base model a general-purpose next-token prediction capability.
The fine-tuning process on the Q&A dataset
05
Fine-tuning
LLMs are pretrained on an extensive corpus of text. In the case of Llama 2, we know very little about the
composition of the training set, besides its length of 2 trillion tokens. In comparison, BERT (2018) was “only”
trained on the BookCorpus (800M words) and English Wikipedia (2,500M words). From experience, this is a very
costly and long process with a lot of hardware issues.
When the pretraining is complete, auto-regressive models like Llama 2 can predict the next token in a sequence.
However, this does not make them particularly useful assistants, since they do not respond to instructions. This is why we employ instruction tuning to align their answers with what our project expects.
Fine-tuning
There are two mainstream fine-tuning techniques:
Supervised fine-tuning (SFT): trains the model on a dataset of instructions and answers. It adjusts the weights of the LLM to minimize the difference between the generated answers and the ground-truth answers used as labels.
Reinforcement Learning with Human Feedback (RLHF): The model learns by interacting with the
environment and receiving feedback. The model is trained to maximize the reward signal (using PPO),
which usually comes from human evaluation of the model output.
In general, RLHF has been shown to capture more complex and nuanced human preferences, but it is also more challenging to implement effectively; indeed, the process requires systematic, large-scale human feedback.
Thus, in my project we implement SFT, but this raises a question: why does fine-tuning work in the first place? As emphasized in the Orca [1] paper, my understanding is that fine-tuning leverages the knowledge learned during the pre-training process. In other words, if the model has never seen the type of data we are interested in, then fine-tuning will not help.
[1] Mukherjee, Subhabrata, et al., "Orca: Progressive Learning from Complex Explanation Traces of GPT-4", 2023.
Fine-tuning
For our hardware conditions: the machine has 16 GB of RAM, while the Llama 2-7B weights alone take 14 GB in FP16 (7B parameters × 2 bytes)!
First, we have to load our defined dataset. Here, our dataset has already been preprocessed, but typically we may need to reformat prompts, filter out erroneous text, merge multiple datasets, and so on.
Then, we configure 4-bit quantization with bitsandbytes.
Next, we load the Llama 2 model with 4-bit precision on the GPU, together with the corresponding tokenizer.
Finally, we load the QLoRA configuration and the general training parameters, and pass everything to the SFTTrainer to start training.
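A condensed sketch of this pipeline, assuming the transformers, datasets, peft, bitsandbytes and trl libraries; the dataset file, hyperparameters and output paths are placeholders rather than the project's exact values, and the SFTTrainer arguments follow one common trl version (the API differs slightly between releases):

# Sketch of 4-bit QLoRA supervised fine-tuning of Llama 2-7B with trl's SFTTrainer.
# Dataset path, hyperparameters and output directories are illustrative placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"
dataset = load_dataset("json", data_files="qa_dataset.json", split="train")  # preprocessed Q&A pairs

# 4-bit quantization via bitsandbytes so the 7B model fits in limited GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=bnb_config,
                                             device_map="auto")

# QLoRA: train small low-rank adapters instead of all 7 billion weights
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

training_args = TrainingArguments(output_dir="./llama2-qa-sft",
                                  num_train_epochs=1,
                                  per_device_train_batch_size=4,
                                  learning_rate=2e-4)

trainer = SFTTrainer(model=model,
                     train_dataset=dataset,
                     peft_config=peft_config,
                     tokenizer=tokenizer,
                     dataset_text_field="text",   # column holding the formatted prompt + answer
                     max_seq_length=512,
                     args=training_args)
trainer.train()
trainer.model.save_pretrained("./llama2-qa-sft-adapter")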
Fine-tuning
During this nearly four-hour fine-tuning run, we verified that the model's fine-tuning behavior was correct.
Still Working...