Version 1.0
LLM Fine Tuning with QLoRA -
Evaluation vs RAG
Comparing our fine-tuned Llama 2 model to Retrieval-Augmented
Generation alongside base Llama 2, evaluated using statistical
measures similar to those we used previously.
Obioma Anomnachi
Engineer @ Anant
RAG Overview
● What is Retrieval-Augmented Generation (RAG)?
○ Hybrid NLP Approach:
■ Combines information retrieval and text generation.
■ Creates more comprehensive and contextually accurate outputs.
○ Uses External Knowledge Sources:
■ Leverages large corpora or databases.
■ Augments generative capabilities of language models.
● How RAG Works:
○ Retrieval Stage:
■ Model retrieves relevant information from a pre-existing corpus or
knowledge base.
○ Generation Stage:
■ Uses retrieved information as input.
■ Generates a coherent and contextually appropriate response.
● Produces more informed and accurate results.
● Especially effective for complex tasks requiring in-depth knowledge.
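The two stages above can be sketched as a toy pipeline. The corpus, the word-overlap retriever, and the prompt-assembly step below are illustrative stand-ins, not the retriever used in the demo; a real system would use the embedding- or BM25-based retrievers described later:

```python
import re

# Toy corpus standing in for an external knowledge base (illustrative only).
CORPUS = [
    "Llama 2 is a family of open-weight language models released by Meta.",
    "QLoRA fine-tunes quantized language models using low-rank adapters.",
    "Cassandra is a distributed wide-column NoSQL database.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Retrieval stage: rank documents by word overlap with the query."""
    ranked = sorted(corpus, key=lambda d: len(tokens(query) & tokens(d)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Generation stage: the retrieved text is prepended to the prompt
    that would be sent to the LLM."""
    return f"Context: {' '.join(context)}\nQuestion: {query}\nAnswer:"

query = "How does QLoRA fine-tune quantized models?"
prompt = build_prompt(query, retrieve(query, CORPUS))
```

The LLM then answers from the supplied context rather than from its training data alone, which is what makes the output more informed and accurate.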
RAG vs Language Models
● Traditional Language Models:
○ Data Dependency:
■ Rely solely on the data they were trained on.
○ Text Generation:
■ Generate high-quality text based on learned patterns.
○ Limitations:
■ Struggle with tasks requiring up-to-date information.
■ May lack specific factual knowledge not present in training data.
● RAG Models:
○ Enhanced Generative Process:
■ Incorporate real-time information retrieval.
○ Dynamic Information Retrieval:
■ Fetch and utilize the most relevant information available at the time of generation.
○ Improved Performance:
■ Significantly better at tasks requiring recent, detailed, or domain-specific information.
RAG Components
Retrievers
● Knowledge Sources
○ External Corpora:
■ Large datasets, databases, and documents.
○ Domain-Specific Databases:
■ Specialized knowledge bases tailored to specific fields (e.g., medical, legal).
○ Real-Time Data:
■ Up-to-date information from live sources such as news feeds or databases.
● Search Mechanisms
○ Dense Vector Representations:
■ Utilize neural embeddings to find semantically similar documents.
○ Sparse Vector Representations:
■ Use traditional methods like TF-IDF or BM25 to retrieve relevant passages.
○ Hybrid Techniques:
■ Combine dense and sparse methods for more accurate retrieval.
○ Relevance Scoring:
■ Assign scores to documents based on relevance to the query.
○ Filtering and Ranking:
■ Select and rank the most pertinent information for generation.
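The sparse path above can be illustrated with a minimal BM25 scorer. This is a sketch, with the conventional k1 and b defaults; a production retriever would rely on a library implementation rather than this toy:

```python
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Relevance-score each document against the query with BM25,
    a sparse (term-frequency based) retrieval method."""
    doc_toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in doc_toks) / n  # average document length
    scores = []
    for toks in doc_toks:
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in doc_toks if term in t)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = toks.count(term)  # term frequency in this document
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

docs = [
    "BM25 ranks passages by term frequency and document length.",
    "Dense retrievers embed queries and passages into vectors.",
    "Cassandra stores rows across a distributed cluster.",
]
scores = bm25_scores("rank passages by term frequency", docs)
# Filtering and ranking: pick the highest-scoring passage.
best = docs[max(range(len(docs)), key=scores.__getitem__)]
```

A hybrid retriever would combine these sparse scores with the dense-embedding similarities covered next, typically via a weighted sum or reciprocal-rank fusion.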
Retrievers - Embeddings and Similarity Search
● What are Neural Embeddings?
○ Definition:
■ Neural embeddings are dense vector representations of words, phrases, sentences, or documents,
generated using neural network models.
■ They capture semantic meaning in a continuous vector space where similar items are placed closer
together.
○ Purpose:
■ Semantic Similarity:
● Encodes semantic information, making it easier to measure similarity between different
pieces of text.
● Allows models to understand and retrieve information based on meaning, not just exact word
matching.
○ Output:
■ Generates dense vectors (embeddings) with fixed dimensions, typically high-dimensional (e.g., 300,
768).
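Similarity search over those vectors reduces to comparing the angle between them. Below is a minimal cosine-similarity sketch using toy 4-dimensional vectors as stand-ins for the 300- to 768-dimensional embeddings a real model would produce; the sentences and vector values are invented for illustration:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors:
    1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for real sentence embeddings.
embeddings = {
    "a cat sat on the mat":  [0.9, 0.1, 0.0, 0.2],
    "a kitten lay on a rug": [0.8, 0.2, 0.1, 0.3],
    "stock prices fell":     [0.0, 0.9, 0.8, 0.1],
}
query_vec = [0.85, 0.15, 0.05, 0.25]  # would come from embedding the query
nearest = max(embeddings,
              key=lambda s: cosine_similarity(query_vec, embeddings[s]))
```

Because the comparison is geometric rather than lexical, the cat and kitten sentences score close to each other and to the query despite sharing few exact words, which is the "meaning, not just exact word matching" property described above.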
RAG Advantages
● Enhanced Accuracy:
○ Incorporation of External Knowledge:
■ Leverages up-to-date and domain-specific information.
● Improved Factuality:
○ Accesses and integrates verified data sources.
■ Reduces the risk of generating incorrect or outdated information.
● Increased Relevance:
○ Context-Aware Responses:
■ Dynamic retrieval of pertinent information based on the query.
■ Ensures responses are highly relevant to the user's needs.
○ Domain-Specific Expertise:
■ Customizable to access specialized knowledge bases (e.g., medical, legal).
○ Real-Time Information:
■ Capable of retrieving the latest data, adapting to changes and new developments.
■ Useful for applications requiring up-to-date information, like news or trend analysis.
● Versatile Applications:
○ Adapts to various tasks such as question answering, summarization, and conversational agents.
RAG vs Fine Tuning
RAG
● Enhanced Accuracy and Relevance:
○ Incorporates up-to-date, domain-specific information dynamically.
○ Provides contextually relevant responses leveraging real-time data retrieval.
● Scalability and Flexibility:
○ Adaptable to various tasks without the need for extensive retraining.
○ Easy to update the knowledge base for different domains or new information.
● Cost Efficiency:
○ Reduces the need for large-scale dataset creation and extensive retraining.
○ Utilizes existing knowledge sources, lowering computational and resource expenses.
Fine Tuning
● Customization and Specialization:
○ Tailors the model to specific tasks or domains.
○ Results in highly specialized models fine-tuned to particular use cases.
● Improved Performance for Specific Tasks:
○ Fine-tuning on curated datasets produces models optimized for particular applications.
○ Enhances performance in narrow domains with specialized requirements.
● Control Over Output:
○ Fine-grained adjustments to the model improve accuracy and reduce errors.
○ Allows for better control over generated content style.
Evaluation
● Because the answer is ultimately generated by an LLM, the performance of a RAG model is evaluated
the same way as for any LLM, fine-tuned or not.
● Domain-specific tests, benchmarks, statistical measures, and human and LLM evaluation all work the
same as in the previous presentation.
● Performance will depend on the sophistication of the retrieval mechanism, the capabilities of the
LLM used, and the quality of the data backing it.
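As one concrete example of such a statistical measure, token-overlap F1 (the score used by extractive QA benchmarks such as SQuAD) compares a generated answer against a reference answer identically, whether the answer came from the fine-tuned model or the RAG pipeline. The answer strings below are invented for illustration:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# The same metric applies unchanged to either system's output.
rag_answer = "QLoRA fine-tunes quantized models with low-rank adapters"
reference = "QLoRA fine-tunes quantized language models using low-rank adapters"
score = token_f1(rag_answer, reference)  # 0.8
```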
Demo
Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM,CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037

LLM Fine Tuning with QLoRA - Cassandra Lunch 4, presented by Anant
