Fine-tuning Llama 2: An overview
leewayhertz.com/fine-tuning-llama2/
In the dynamic and ever-evolving field of generative AI, a profound sense of competition has taken root,
fueled by a relentless quest for innovation and excellence. The introduction of GPT by OpenAI has
prompted various businesses to work on creating their own Large Language Models (LLMs). However,
creating such sophisticated models is like navigating a maze of complexities. It demands exhaustive
research, massive amounts of relevant data, and the resolution of numerous other challenges.
Furthermore, the substantial computational power required for these tasks remains a significant hurdle for
many.
Amidst this fiercely competitive landscape, where industry heavyweights like OpenAI and Google have
already etched their indelible marks, a new contender, Meta, entered the arena with their open-source
LLM, Llama, with a goal of democratizing AI. They subsequently upgraded it to Llama 2, which was
trained on 40% more data than its predecessor. While large language models exhibit remarkable general
capability, their ability to handle domain-specific inquiries, such as those related to a business’s
financial performance or inventory status, may be constrained. To equip these models with domain-
specific competence and elevate their precision, a refinement process called fine-tuning is applied.
In this article, we will talk about fine-tuning Llama 2, a model that has opened up new avenues for
innovation, research, and commercial applications. This process of fine-tuning may be considered
imperative as it can yield numerous benefits like cost savings, secure management of confidential data,
and the potential to surpass renowned models like GPT-4 in specialized tasks.
So, let’s dive deeper into the article and explore the transformative power of Llama 2 in redefining the
boundaries of artificial intelligence, creating endless possibilities for businesses.
What is Llama 2?
Why use Llama 2?
Why does Llama 2 matter in the AI landscape?
How does Llama 2 work?
A thorough analysis of Llama 2 in comparison to other leading LLMs
What does fine-tuning an LLM mean?
Techniques for LLM fine-tuning
How can we perform fine-tuning on Llama 2?
PEFT approaches – LoRA and QLoRA
Fine-tuning the Llama 2 model with QLoRA
Challenges in fine-tuning Llama 2
How does LeewayHertz help in building Llama 2 model-powered solutions?
What is Llama 2?
Meta’s recent unveiling of the Llama 2 suite signifies an important milestone in the evolution of LLMs.
Launched in mid-July 2023, Llama 2 emerges as a versatile series of both pre-trained and fine-tuned models,
characterized by its diverse parameter configurations of 7B, 13B, and 70B. This release included
comprehensive papers detailing the intricacies of its design, training, and implementation, offering
invaluable insights into the advancements made in the AI sector.
At the core of Llama 2’s development was an expansive training regimen built upon a staggering 2 trillion
tokens—marking a 40% increase over its predecessor. This rigorous training is complemented by
architectural refinements such as the grouped-query attention (GQA) mechanism. In the 70B model in
particular, GQA expedites inference, ensuring optimal performance without compromising speed.
Furthermore, the model boasts a default context window of 4096 tokens, a significant advancement from
previous iterations and a testament to its enhanced capability to handle complex contextual information.
Architecturally, Llama 2 distinguishes itself from its peers through several innovative attributes. It
leverages RMSNorm normalization, the SwiGLU activation function, and rotary positional embeddings
(RoPE) to further enhance its data processing prowess. The use of the AdamW optimizer with a cosine
learning rate schedule, a weight decay of 0.1, and gradient clipping underscores Meta’s commitment to
refining even the most nuanced aspects of model development.
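These architectural details are easy to verify programmatically. As a quick, hedged illustration (assuming the transformers library is installed and a public mirror of the 70B weights, such as the NousResearch repo used later in this article for the 7B variant, is accessible), loading the published configuration exposes the attributes described above:

from transformers import AutoConfig

# Illustrative sketch: inspect Llama 2 70B's published configuration
# (the repo id is an assumption; any Llama 2 checkpoint mirror works)
config = AutoConfig.from_pretrained("NousResearch/Llama-2-70b-hf")

print(config.max_position_embeddings)  # 4096 -- the default context window
print(config.hidden_act)               # "silu" -- the SwiGLU activation family
print(config.rms_norm_eps)             # epsilon used by the RMSNorm layers
print(config.num_attention_heads)      # 64 query heads
print(config.num_key_value_heads)      # 8 shared KV heads -- i.e., grouped-query attention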
Yet, the true innovation of Llama 2 lies not merely in its architectural and training advancements but in its
fine-tuning strategies. Meta has judiciously prioritized quality over quantity in its Supervised Fine-Tuning
(SFT) phase, a decision inspired by numerous studies indicating the superior model performance
achieved through high-quality data. Complementing this is the Reinforcement Learning from Human
Feedback (RLHF) stage, meticulously designed to calibrate the model in line with user preferences.
Using a comparative approach where annotators evaluate model outputs, the RLHF process refines
Llama 2 to accentuate helpfulness and safety in its responses.
Furthermore, Llama 2’s commercial adaptability is evident in its open-source and commercial character,
facilitating ease of use and expansion. It’s not merely a static tool; it’s a dynamic solution optimized for
dialogue use cases, as seen in the Llama-2-chat versions available on the Hugging Face platform. While
the models differ in parameter size, their consistent optimization for both speed and accuracy
underscores their adaptability to diverse operational demands.
Overall, Llama 2, as a member of the Llama family of LLMs, not only aligns with the technical prowess of
contemporaries like GPT-3 and PaLM 2 but also introduces several groundbreaking innovations. Its
optimized transformer architecture, rigorous training, fine-tuning procedures, and open-source
accessibility position it as a formidable asset in the AI landscape, promising a future of more accurate,
efficient, and user-aligned AI solutions.
Why use Llama 2?
In today’s AI-driven landscape, responsibility and accountability take center stage. Meta’s Llama 2 is
evidence of this heightened focus on creating AI solutions that are transparent, accountable, and open to
scrutiny. This section delves into why Llama 2’s approach is pivotal in reshaping our understanding and
expectations of AI models.
Open source: The bedrock of transparency
Most LLMs, such as OpenAI’s GPT-3 and GPT-4, Google’s PaLM and PaLM 2, and Anthropic’s Claude, have
predominantly been closed source. This limited accessibility restricts the broader research community
from fully understanding these models’ intricacies and decision-making processes. Llama 2 stands in
stark contrast. Being open source enables anyone with relevant technical expertise not just to access but
also to dissect, understand, and potentially modify the model. By enabling people to peruse the research
paper detailing Llama 2’s development and training and even download the model for personal or
business use, Meta is championing an era of transparency in AI.
Ensuring safety through red-teaming
Safety in AI is paramount, and Llama 2’s development process reflects this priority. Internal teams and
commissioned third parties generated adversarial prompts through intensive red-teaming exercises to
facilitate model fine-tuning. These rigorous processes are not a one-time effort; they signify Meta’s
ongoing commitment to refining model safety iteratively. The intention is clear: ensuring Llama 2 is robust
against unforeseen challenges.
Transparent reporting: An insight into model evaluation
The research paper reflects Meta’s commitment to transparency, detailing the challenges encountered
during the development of Llama 2. By highlighting known issues and outlining the steps taken to mitigate
them – and those planned for future iterations – Meta is providing an open playbook on the model’s
strengths and areas for improvement.
Empowering developers: “Responsible use guide” and “Acceptable use policy”
With great power comes great responsibility. Acknowledging LLMs’ vast potential and inherent risks, Meta
has devised a “Responsible Use Guide” to steer developers towards best practices in AI development
and safety evaluations. Complementing this is an “Acceptable Use Policy,” which defines boundaries for
ensuring the responsible use of the model.
Engaging the global community
Meta recognizes the collective intelligence of the global community. Through initiatives such as the
Open Innovation AI Research Community, it invites academic researchers to share insights and research
on the responsible development of LLMs. Furthermore, the Llama Impact Challenge is a call to action for
public, non-profit, and for-profit entities to harness Llama 2 in addressing critical global challenges like
environmental conservation and education.
Launch your project with LeewayHertz
We specialize in fine-tuning pre-trained LLMs to ensure they offer domain-specific responses tailored to
your unique business requirements. For the specifics you’re looking for, contact us today!
Learn More
Why does Llama 2 matter in the AI landscape?
The global AI community has long awaited a shift from commercial monopolization towards open-source
research and experimentation. Meta’s Llama 2 heralds this change. By offering an open-source AI, Meta
ensures a credible alternative to closed-source AI. It democratizes AI, allowing other companies to
develop AI-powered applications under their control, bypassing the commercial constraints of tech giants
like Apple, Google, and Amazon.
Llama 2 is not just a technological marvel; it’s a statement on the importance of responsibility,
transparency, and collaboration in AI. It embodies a future where AI development prioritizes societal
benefits, open dialogue, and ethical considerations.
How does Llama 2 work?
Llama 2, a state-of-the-art language model, has been built using sophisticated training techniques to
understand and generate human-like text. To comprehend its operations, one must delve into its data
sources, training methodologies, and potential applications.
Data sources and neural network training
Llama 2’s foundational strength is attributed to its extensive training on a staggering 2 trillion tokens.
These tokens were sourced from publicly accessible repositories, including:
Common Crawl: An expansive archive encompassing billions of web pages.
Wikipedia: The free encyclopedia offering a wealth of knowledge on myriad topics.
Project Gutenberg: A treasure trove of public domain books.
Each token, be it a word or a semantic fragment, empowers Llama 2 to discern the meaning behind
text. For instance, if the model consistently encounters “Apple” and “iPhone” together, it infers the
relationship between these terms and learns to distinguish the company “Apple” from the fruit
“apple.”
Ensuring quality and mitigating bias
Given the vastness and diversity of the internet, training a model solely on such data can inadvertently
introduce biases or produce inappropriate content. Acknowledging this, the developers of Llama 2
incorporated additional training mechanisms:
Reinforcement Learning from Human Feedback (RLHF): This technique involves human testers
who evaluate multiple AI-generated responses. Their feedback is instrumental in guiding the model
towards generating more relevant and appropriate content.
Adaptation for conversational context
Llama 2’s chat versions were meticulously fine-tuned using specific data sets to enhance conversational
prowess. This ensures that when engaged in a dialogue, Llama 2 responds naturally, simulating human
interaction.
Customization and fine-tuning
One of Llama 2’s defining features is its adaptability. Organizations can mold it to resonate with their
unique brand voice. For instance, if a firm wishes to produce summaries reflecting its distinct style, Llama
2 can be trained on numerous examples to achieve this. Similarly, the model can be fine-tuned for
customer support optimization using FAQs and chat logs, allowing it to respond precisely to user queries.
Llama 2’s robustness and adaptability are products of its comprehensive training and fine-tuning
methodologies. Its ability to assimilate vast data, combined with human feedback mechanisms and
customization options, positions it at the forefront of the language model domain.
A thorough analysis of Llama 2 in comparison to other leading LLMs
The advancement of AI, especially in the domain of large language models, has been nothing short of
extraordinary. This is prominently demonstrated by Llama 2, an LLM designed with adaptability in mind to
empower developers and researchers to explore new horizons and create innovative applications. Here,
we explore the outcomes of some experiments carried out to evaluate how Llama 2 compares to giants
like OpenAI’s GPT and Google’s PaLM.
Creative aptitude: Llama 2 was prompted to simulate a sarcasm-laden dialogue on space
exploration; the resultant discourse, although impressive, trailed slightly behind ChatGPT’s.
When compared with Google’s Bard, Llama 2 showcased superior flair. Thus, while ChatGPT
remains the frontrunner in creative engagements, Llama 2 holds a commendable position among
its peers.
Programming capabilities: Llama 2 was pitted against ChatGPT and Bard in a coding challenge.
The task? To develop functional applications ranging from a basic to-do list to a Tetris game.
Although ChatGPT mastered each challenge, Llama 2, akin to Bard, efficiently crafted the to-do list
and an authentication system, stumbling only on the Tetris game.
Mathematical proficiency: Llama 2’s prowess in solving algebraic and logical math problems was
noteworthy, particularly when compared to Bard. However, ChatGPT’s mathematical proficiency
remained unmatched. Remarkably, Llama 2 excelled in certain problems where its predecessors, in
their early stages, had faltered.
Reasoning and commonsense: A facet that remains a challenge for many AI models is
commonsense reasoning. ChatGPT unsurprisingly led the pack. The contest for the second spot
was neck and neck between Bard and Llama 2, with Bard slightly edging ahead.
Llama 2, though an impressive foundational model, still has room for growth compared to certain other
specialized, fine-tuned models on the market. Foundational models like Llama 2 are designed with
versatility and future adaptability at their core, unlike fine-tuned models optimized for domain-specific
expertise. Given its nascent stage and its ‘foundational’ nature, the potential avenues for Llama 2’s
evolution are promising.
What does fine-tuning an LLM mean?
When discussing the fine-tuning of LLMs, it’s crucial to recognize that such practices extend beyond
language models. Fine-tuning can be applied across various machine learning models based on different
use cases.
Machine learning models are trained to identify patterns within given datasets. For instance, a
Convolutional Neural Network (CNN) designed to detect cars in urban areas would be highly proficient in
that domain due to training on relevant images. Yet, when faced with detecting trucks on highways, its
efficacy might decrease due to unfamiliarity with that data distribution. Rather than starting from scratch
with a new training dataset, fine-tuning allows for adjustments to be made to the model to accommodate
new data types.
Several advanced LLMs are available, including GPT-3, Bloom, BERT, T5, and XLNet. GPT-3, for
instance, is a premium model recognized for its 175 billion parameters, making it adept
at various natural language processing tasks. BERT, conversely, is a more accessible open-source
model excelling in understanding contextual word relationships. The choice between models like GPT-3
and BERT largely depends on the specific task at hand, be it text generation or text classification.
Techniques for LLM fine-tuning
The process of fine-tuning LLMs is intricate, with varying techniques ideal for specific applications.
Sometimes, the goal is to train a model to suit a novel task.
Imagine having a pre-trained LLM skilled in text generation, but you want it to perform sentiment analysis.
This will entail remodeling the model with subtle architectural tweaks before diving into the fine-tuning
phase.
In such a context, you will primarily harness the numeric vectors called embeddings generated by the
LLM’s transformer component. These embeddings carry detailed features of the given input.
Certain LLMs directly produce these embeddings, whereas others, such as the GPT series, use these
embeddings for token or text generation. During adaptation, the LLM’s embedding layer gets linked to a
classification system, typically a set of fully connected layers translating embeddings into class
probabilities. The emphasis here lies in training the classification segment using model-driven
embeddings.
While the LLM’s attention layers generally remain unchanged—offering computational efficiency—the
classifier requires a supervised learning dataset with text instances and their respective classifications.
The amount of fine-tuning data you need depends on task complexity and classifier design. Yet, some
occasions demand deeper adjustment, requiring you to unfreeze the attention layers for a full fine-tuning run.
It’s worth noting that the cost of such full fine-tuning also scales with model size. Fortunately, there exist
strategies to streamline fine-tuning costs. Let’s delve deeper and explore some prominent fine-tuning
techniques.
Unsupervised versus supervised fine-tuning (SFT)
Sometimes, there’s a need to refresh the LLM’s knowledge base without necessarily changing its
behavior. If, for instance, you intend to adapt the model to medical terminology or a new language,
an expansive, unstructured dataset suffices. The choice is between unsupervised pretraining on
ample unstructured data and supervised fine-tuning on labeled datasets for a specific task. In the
unsupervised case, the goal is to immerse the model in a sea of tokens representative of the new
domain or anticipated input types; leveraging vast unstructured datasets scales well thanks to
unsupervised or self-supervised methodologies. However, there are cases where merely updating
the model’s information reservoir falls short: the LLM’s behavior itself needs an overhaul,
necessitating a supervised fine-tuning (SFT) dataset complete with prompts and expected
outcomes. This method is pivotal for models like ChatGPT, which are designed to be highly
responsive to user directives.
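For illustration, a single (hypothetical) SFT record might pair a prompt with its expected outcome, formatted with the instruction template that Llama 2’s chat variants expect:

# A hypothetical SFT training example: prompt plus expected outcome
sft_example = {
    "prompt": "Summarize our Q3 inventory report in two sentences.",
    "response": "Q3 closing inventory rose 8% quarter-over-quarter, driven by ...",
}

# Wrapped in Llama 2's chat instruction template for training
formatted = f"[INST] {sft_example['prompt']} [/INST] {sft_example['response']}"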
Reinforcement Learning from Human Feedback (RLHF)
To elevate SFT, some practitioners employ reinforcement learning from human feedback, a
complex procedure that, currently, only well-resourced organizations have the capacity to run.
While RLHF techniques vary, they all emphasize human-guided LLM training: human
reviewers assess the model’s outputs for certain prompts, steering the model toward desired
results. Take OpenAI’s ChatGPT as an RLHF benchmark. Human feedback is used to develop a
reward model that mirrors human preferences; the LLM then undergoes reinforcement
learning to optimize its outputs against this reward signal.
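At the heart of the reward-modeling step is a simple pairwise objective: the reward model should score the human-preferred response higher than the rejected one. The sketch below assumes a reward_model that maps a tokenized sequence to a scalar score; it illustrates the standard Bradley–Terry-style loss used in RLHF reward modeling, not any one organization’s exact implementation:

import torch.nn.functional as F

def reward_loss(reward_model, chosen_ids, rejected_ids):
    r_chosen = reward_model(chosen_ids)      # scalar score for the preferred response
    r_rejected = reward_model(rejected_ids)  # scalar score for the rejected response
    # Train the reward model so preferred responses score higher
    return -F.logsigmoid(r_chosen - r_rejected).mean()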
Parameter-efficient Fine-tuning (PEFT)
PEFT, an emerging field within LLM fine-tuning, aims to minimize the resources spent on updating
model parameters by limiting how many parameters are altered. One such method gaining traction
is Low-rank Adaptation (LoRA). The essence of LoRA is that only a small number of parameters
need adjusting for downstream tasks, so a compact matrix can capture task-specific nuances.
Implementing LoRA means training this compact matrix rather than the entire LLM’s parameters.
Once trained, the LoRA weights can either be merged into the primary LLM or applied during
inference. Adopting techniques like LoRA can reduce fine-tuning expenditures considerably while
enabling the storage of numerous fine-tuned adapters ready for integration during LLM operations.
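Using the peft library, the savings are easy to see: wrapping a base model with a LoRA adapter leaves only a tiny fraction of parameters trainable. The hyperparameters below are illustrative, and the printed figures are approximate:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")
lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(base, lora_config)

# Prints something like: trainable params: ~4M || all params: ~6.7B || trainable%: ~0.06
model.print_trainable_parameters()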
Reinforcement Learning from AI Feedback (RLAIF)
Fine-tuning a Large Language Model (LLM) using Reinforcement Learning from AI Feedback (RLAIF)
involves a structured process that ensures the model’s behavior aligns with a set of predefined principles
or guidelines, often encapsulated in a Constitution. Here’s an overview of the steps involved in fine-tuning
an LLM using RLAIF:
Define the Constitution
Constitution creation: Begin by defining the Constitution, a document or set of guidelines that
outlines the principles, ethics, and behavioral norms that the AI model should adhere to. This
Constitution will guide the AI Feedback Model in generating preferences.
Set up the AI feedback model
Model selection: Choose or develop an AI feedback model capable of understanding and applying
the principles outlined in the Constitution.
Model training (if necessary): If the AI feedback model isn’t pre-trained, you might need to train it
to interpret the Constitution and evaluate responses based on it. This could involve supervised
learning, using a dataset where responses are annotated based on their alignment with
constitutional principles.
Generate feedback data
Feedback generation: Use the AI feedback model to evaluate pairs of prompt/response instances.
For each pair, the model assigns a preference score, indicating which response aligns better with
the principles in the Constitution.
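A simplified sketch of this step (all names here — judge_pipeline, the constitution text, the verdict parsing — are illustrative assumptions, not a fixed API) could look like this:

CONSTITUTION = "Responses must be helpful, harmless, and honest."

def ai_preference(judge_pipeline, prompt, response_a, response_b):
    # Ask the AI feedback model which response better follows the Constitution
    judge_prompt = (
        f"Constitution: {CONSTITUTION}\n"
        f"Prompt: {prompt}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response better follows the Constitution? Answer with A or B."
    )
    output = judge_pipeline(judge_prompt, max_new_tokens=1)[0]["generated_text"]
    return "A" if output.strip().endswith("A") else "B"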
Train the Preference Model (PM)
Data preparation: Organize the AI-generated feedback into a dataset suitable for training the
Preference Model (PM).
Preference model training: Train the model on this dataset. It learns to predict the preferred
response to a given prompt based on the feedback scores provided by the AI feedback model.
Fine-tune the LLM
Integration with reinforcement learning: Integrate the trained preference model into a
reinforcement learning framework. In this setup, the preference model provides the reward signal
based on how well a response from the LLM aligns with the constitutional principles.
LLM fine-tuning: Fine-tune the LLM using this reinforcement learning setup. The LLM generates
responses to prompts, and the responses are evaluated by the PM. The LLM then adjusts its
parameters to maximize the reward signal, effectively learning to produce responses that better
align with the constitutional principles.
Evaluation and iteration
Model evaluation: After fine-tuning, evaluate the LLM’s performance to ensure it aligns with the
desired principles and effectively handles a variety of prompts.
Feedback loop: If the performance is not satisfactory or if there’s room for improvement, you might
need to iterate over the process. This could involve refining the Constitution, adjusting the AI
feedback model, retraining the preference model, or further fine-tuning the LLM.
Deployment and monitoring
Deployment: Once the fine-tuning process meets the performance and ethical standards, deploy
the model.
Continuous monitoring: Regularly monitor the model’s performance and behavior to ensure it
continues to align with the constitutional principles, adapting to new data and evolving
requirements.
Fine-tuning an LLM using RLAIF is a complex process that involves careful design, consistent evaluation,
and ongoing adjustment to ensure that the model’s behavior aligns with human values and ethical
standards. It’s a dynamic process that benefits from continuous monitoring and iterative improvement.
How can we perform fine-tuning on Llama 2?
PEFT approaches – LoRA and QLoRA
Parameter-efficient Fine-tuning (PEFT) presents an effective approach to fine-tuning LLMs. Distinct from
traditional methods that mandate extensive parameter updates, PEFT focuses on refining a select subset
of parameters, minimizing computational demands and expediting the training process. By gauging the
significance of individual parameters based on their influence on the overall model, PEFT prioritizes those
with maximal impact. Consequently, only these pivotal parameters undergo adjustments during the fine-
tuning phase, while others remain static. Such a strategy curtails computational and temporal overheads
and paves the way for swift model iteration and deployment. As PEFT emerges as a frontrunner in
optimization techniques, it’s vital to recognize that it remains a dynamic field, with continuous research
ushering in nuanced variations and enhancements. The choice of PEFT application will invariably depend
on specific research goals and practical contexts.
PEFT is an innovative approach that effectively reduces RAM and storage demands. It achieves this by
primarily refining a select set of parameters while maintaining the majority in their original state. PEFT’s
strength lies in its ability to foster robust generalization even when datasets are of limited volume.
Moreover, it augments the model’s reusability and transferability. Small model checkpoints, derived from
PEFT, seamlessly integrate with the foundational model, promoting versatile fine-tuning across diverse
scenarios by incorporating PEFT-specific parameters. A salient feature is the preservation of insights
from the pre-training phase, ensuring the model remains resilient to catastrophic forgetting.
Prominent PEFT strategies emphasize the integrity of the pre-trained base, introducing supplementary
layers or parameters termed “Adapters.” Through a process dubbed “adapter-tuning,” these layers are
integrated with the foundational model, with tuning efforts concentrated on the new layers alone. A
notable challenge with this approach is heightened latency during the inference stage, which can
hamper efficiency in various contexts.
Parameter-efficient fine-tuning has become a pivotal area of focus within AI, and there are myriad
techniques to achieve this. Among these, the Low-rank Adaptation (LoRA) and its enhanced counterpart,
QLoRA, are distinguished for their effectiveness.
Low-rank Adaptation (LoRA)
LoRA introduces an innovative paradigm in model fine-tuning, offering a modular method adept at
domain-specific tasks and transferring learning capabilities. The intrinsic beauty of LoRA lies in its ability
to be executed using minimal resources while being memory-conservative.
A closer examination of the LoRA technique reveals the following steps and intricacies:
Pre-trained parameter preservation: The original neural network’s foundational parameters (W)
remain unaltered during the adaptation process.
Inclusion of new parameters: Alongside this original setup, supplementary weight matrices
(denoted WA and WB) are introduced. These matrices are low-rank: their dimensions (d×r and r×d)
are purposefully small compared to the original network’s. Here, ‘d’ symbolizes the original vector’s
dimension, and ‘r’ denotes the low rank. Notably, a smaller ‘r’ accelerates training, although it may
require a fine balance to maintain optimal performance.
Matrix product calculation: The low-rank matrices are multiplied together (WA·WB) to form an
update of the original dimensions, which is combined with the frozen weights to inform the model’s results.
Loss function computation: The loss function is discerned by contrasting the derived results
against expected outputs. Traditional backpropagation methods are then harnessed to calibrate the
WA and WB weights.
LoRA’s essence is its economical memory footprint and infrastructure demands. For instance, given
a 512×512 parameter matrix in a typical feed-forward network (262,144 parameters), leveraging a
LoRA adapter with a rank of 2 means only 2,048 parameters (512×2 for WA plus 2×512 for WB) undergo
domain-specific training. This streamlined process significantly elevates computational efficiency.
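A minimal PyTorch sketch (assuming the dimensions from the example above) makes this concrete: the frozen weight matrix W is augmented by the trainable low-rank product, and only 2,048 parameters require gradients.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d=512, r=2, alpha=16):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)  # frozen pre-trained weights (262,144 params)
        self.W.weight.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, d) * 0.01)  # WA: r x d, trainable
        self.B = nn.Parameter(torch.zeros(d, r))         # WB: d x r, trainable (zero-init)
        self.scale = alpha / r

    def forward(self, x):
        # Original output plus the scaled low-rank update
        return self.W(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear()
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2048 = (2 x 512) + (512 x 2)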
An exceptional facet of LoRA is its modular design. The trained adapter can be retained as an
independent entity, serving as a modular component for a specific domain. Furthermore, LoRA adeptly
sidesteps catastrophic forgetting by abstaining from modifying the foundational weights.
Further developments: QLoRA
To further accentuate the effectiveness of LoRA, QLoRA has been introduced as an augmented
technique, promising enhanced optimization and performance. This advanced method builds upon the
foundational principles of LoRA, optimizing it for even more intricate tasks.
QLoRA builds upon LoRA to further optimize efficiency by converting the weight values of the original
network from high-precision formats, like Float32, to more compact 4-bit types such as NF4. This
conversion reduces memory usage and accelerates computation.
QLoRA introduces three primary enhancements over LoRA—4-bit NF4 quantization, double quantization, and paged memory management—establishing it as a leading method in PEFT.
1. 4-bit NF4 quantization
Using the 4-bit NormalFloat (NF4) data type is a strategic move to decrease storage requirements. This
process is divided into three phases:
Normalization & quantization: Here, weights are normalized to a zero mean and a consistent unit
variance. Given that a 4-bit data format can hold just 16 distinct values, each weight is mapped to the
closest of these 16 levels based on its relative position. For example, an FP32 weight of
0.2121 would be stored as its nearest 4-bit level, not the exact value.
Dequantization: This is the reverse process. During computation, the quantized weights are
dequantized back to the compute data type, restoring them to their near-original form.
Double quantization: This phase optimizes memory further. The per-block quantization
constants are themselves grouped and quantized to 8 bits, yielding a significant reduction in overhead. In
essence, for a model with 1 million parameters, the memory demand of the quantization constants
can be slashed to around 127,000 bits.
2. Unified memory paging
Together with the quantization methods, QLoRA leverages NVIDIA’s unified memory capabilities. This
feature facilitates smooth transfers of optimizer states between GPU and CPU memory, which is particularly
useful during memory-intensive operations or unexpected GPU memory spikes, preventing memory overflow.
While both LoRA and QLoRA are at the forefront of PEFT, QLoRA’s advanced techniques offer superior
efficiency and optimization.
Fine-tuning the Llama 2 model with QLoRA
Let’s delve into the process of fine-tuning the Llama 2 model, which features a massive 7 billion
parameters. We will harness the computational power of a T4 GPU, backed by high RAM, available on
Google Colab at a rate of 2.21 credits per hour. It’s worth noting that the T4 comes equipped with 16 GB
of VRAM. Now, when you consider the weight of Llama 2-7b (7 billion parameters equating to 14 GB in
FP16 format), the VRAM is stretched almost to its limit. This scenario doesn’t even factor in additional
overheads such as optimizer states, gradients, and forward activations. The implication is clear:
traditional fine-tuning won’t work here. We need to apply parameter-efficient fine-tuning techniques, such
as LoRA or QLoRA.
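A quick back-of-the-envelope calculation makes the constraint concrete (weights only, excluding optimizer states, gradients, and activations):

params = 7_000_000_000

print(params * 2 / 1e9)    # FP16: 2 bytes per parameter -> ~14 GB, nearly all of a T4's 16 GB
print(params * 0.5 / 1e9)  # 4-bit: 0.5 bytes per parameter -> ~3.5 GB, leaving headroom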
One way to significantly cut down on VRAM usage is by fine-tuning the model using 4-bit precision. This
makes QLoRA an apt choice. Fortunately, the Hugging Face ecosystem is equipped with libraries like
transformers, accelerate, peft, trl, and bitsandbytes to facilitate this. Our step-by-step code is inspired by
the contributions of Younes Belkada on GitHub. We initiate the process by installing and activating these
libraries.
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
Let’s delve into the adjustable parameters in this context. We will begin by loading the llama-2-7b-chat-hf
model, commonly referred to as the chat model. Our aim is to train this model using the dataset
mlabonne/guanaco-llama2-1k, which comprises 1,000 samples. Upon completion, the resulting fine-tuned
model will be termed llama-2-7b-miniguanaco. For those curious about the origin and creation of this
dataset, a detailed notebook is available for review. However, do note that customization is possible. The
Hugging Face Hub boasts a plethora of valuable datasets, including the notable databricks/databricks-
dolly-15k.
In employing QLoRA, we will set the rank at 64, coupled with a scaling parameter of 16. Our approach
involves loading the Llama 2 model directly in 4-bit precision, specifically employing the NF4 type, and
then training it over a single epoch. For insights into other associated parameters, you are encouraged to
explore the TrainingArguments, PeftModel, and SFTTrainer documentation.
# The model that you want to train from the Hugging Face hub
model_name = "NousResearch/Llama-2-7b-chat-hf"
# The instruction dataset to use
dataset_name = "mlabonne/guanaco-llama2-1k"
# Fine-tuned model name
new_model = "llama-2-7b-miniguanaco"
################################################################################
# QLoRA parameters
################################################################################
# LoRA attention dimension
lora_r = 64
# Alpha parameter for LoRA scaling
lora_alpha = 16
# Dropout probability for LoRA layers
lora_dropout = 0.1
################################################################################
# bitsandbytes parameters
################################################################################
# Activate 4-bit precision base model loading
use_4bit = True
# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"
# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"
# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False
################################################################################
# TrainingArguments parameters
################################################################################
# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"
# Number of training epochs
num_train_epochs = 1
# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False
# Batch size per GPU for training
per_device_train_batch_size = 4
# Batch size per GPU for evaluation
per_device_eval_batch_size = 4
# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1
# Enable gradient checkpointing
gradient_checkpointing = True
# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3
# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4
# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001
# Optimizer to use
optim = "paged_adamw_32bit"
# Learning rate schedule (constant a bit better than cosine)
lr_scheduler_type = "constant"
# Number of training steps (overrides num_train_epochs)
max_steps = -1
# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03
# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True
# Save checkpoint every X updates steps
save_steps = 25
# Log every X updates steps
logging_steps = 25
################################################################################
# SFT parameters
################################################################################
# Maximum sequence length to use
max_seq_length = None
# Pack multiple short examples in the same input sequence to increase efficiency
packing = False
# Load the entire model on GPU 0
device_map = {"": 0}
Let’s commence the fine-tuning process, integrating various components for this task.
Initially, we will source the previously defined dataset. It’s pertinent to note that our dataset is already
refined; however, under typical circumstances, this step would entail reshaping prompts, filtering out
inconsistent text, amalgamating multiple datasets, and so forth.
Subsequently, we will set up bitsandbytes to facilitate 4-bit quantization.
Following this, we will instantiate the Llama 2 model in 4-bit precision on a GPU, aligning it with the
appropriate tokenizer.
To conclude our preparations, we will initialize the configurations for QLoRA, outline the standard training
parameters, and forward all these settings to the SFTTrainer. With everything in place, the training
journey begins!
# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")
# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)
# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
)
model.config.use_cache = False
model.config.pretraining_tp = 1
# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
tokenizer.padding_side = "right"
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
# Train model
trainer.train()
# Save trained model
trainer.model.save_pretrained(new_model)
The duration of the training process can vary significantly based on your dataset’s size. In this instance, it
was completed in under an hour using a T4 GPU. To review the progress visually, refer to the
plots available in TensorBoard.
%load_ext tensorboard
%tensorboard --logdir results/runs
Let’s confirm that the model is operating as expected. While a comprehensive evaluation would be ideal,
we can utilize the text generation pipeline for preliminary assessments by posing questions such as,
“What is a large language model?” It’s crucial to adjust the input to align with Llama 2’s prompt structure.
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)
# Run text generation pipeline with our next model
prompt = "What is a large language model?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"[INST] {prompt} [/INST]")
print(result[0]['generated_text'])
The model provides the subsequent answer:
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1270: UserWarning: You have
modified the pretrained model configuration to control generation. This is a deprecated strategy to control
generation and will be removed soon, in a future version. Please use a generation configuration file (see
https://huggingface.co/docs/transformers/main_classes/text_generation )
warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs
have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[INST] What is a large language model? [/INST] A large language model is a type of artificial intelligence
(AI) model that is trained on a large dataset of text to generate human-like language outputs. It is
designed to be able to understand and generate text in a way that is similar to human language, and can
be used for a variety of applications such as chatbots, language translation, and text summarization.
Large language models are typically trained using deep learning techniques, such as recurrent neural
networks (RNNs) or transformer models, and are often based on pre-trained models such as BERT or
RoBERTa. These models are trained on large datasets of text, such as books, articles, or websites, and
are designed to learn the patterns and structures of language.
Some examples of large language models include:
* BERT (Bidirectional Encoder Representations from Transformers):
Drawing from our observations, the coherence demonstrated by a model encompassing merely 7 billion
parameters is quite impressive. Feel free to experiment further by posing more complex questions,
perhaps drawing from datasets like BigBench-Hard. Historically, the Guanaco dataset has been pivotal in
crafting top-tier models. To achieve this, consider training a Llama 2 model utilizing the
mlabonne/guanaco-llama2 dataset.
So, how do we save our refined llama-2-7b-miniguanaco model? The key lies in merging the LoRA
weights with the foundational model. Presently, there is no direct, seamless method to achieve this: the
procedure involves reloading the base model in FP16 precision and using the peft library for the
merge. Regrettably, this approach can run into VRAM issues, even after the GPU memory has been
cleared. It may help to restart the notebook, re-run the first three cells, and then proceed to the
subsequent one.
# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()
# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
Having successfully combined our weights and reinstated the tokenizer, we are positioned to upload the
entirety to the Hugging Face Hub, ensuring our model’s preservation.
!huggingface-cli login
model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)
This model is now ready for inference and can be accessed and loaded from the Hub just as you would
with any other Llama 2 model.
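For instance, here is a brief sketch of reloading it for inference; the repo id below is a placeholder for wherever you pushed the model:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "your-username/llama-2-7b-miniguanaco"  # placeholder: your Hub repo

model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=200)
print(pipe("[INST] What is a large language model? [/INST]")[0]["generated_text"])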
Challenges in fine-tuning Llama 2
Navigating the fine-tuning process
Fine-tuning LLMs like Llama 2 presents its unique set of complexities, differing from standard text-to-text
model adaptations. The process remains intricate for enterprise applications, even with supportive
libraries like Hugging Face’s transformers and trl. Key challenges include:
Absence of a standard interface to set prompt and task descriptors and to adjust datasets in
alignment with these parameters.
The multitude of training parameters that necessitate manual configuration tailored to specific
datasets.
The onus of establishing, managing, and scaling a robust infrastructure for distributed fine-tuning
falls on the practitioner. Achieving optimal performance with a model of around 7B parameters
becomes challenging, especially given GPU memory constraints, and deploying distributed
training effectively demands a deep-rooted understanding of the subject.
Securing computational assets
LLMs, by nature, are voracious consumers of computational resources. Their memory, power, and time
demands are lofty, constraining entities lacking these resources. This disparity can act as a barrier to
universalizing the fine-tuning process.
Streamlining distributed model training
The sheer size of LLMs like Llama 2 makes it impractical to house them on a singular GPU, barring a few
like the A100s. This necessitates a shift from standard parallel training to either model parallel or pipeline
parallel training, whereby model weights are distributed across multiple GPU instances. Open-source
tools such as DeepSpeed facilitate this, but mastering their vast array of configurable parameters can be
daunting. Incorrect configurations can lead to memory overflow on CPUs/GPUs or suboptimal GPU
usage due to unwarranted offloading, elevating training costs.
How does LeewayHertz help in building Llama 2 model-powered solutions?
LeewayHertz, a seasoned AI development company, offers expert solutions in fine-tuning the Llama 2
model to build custom solutions aligned with specific organizational needs and objectives. Here is how we
can help you:
Strategic consulting
Our consulting process begins by deeply understanding your organization’s goals, challenges, and
competitive landscape. We then recommend the most appropriate Llama 2 model-powered solution
tailored to your specific needs. Finally, we develop a comprehensive implementation strategy, ensuring
the solution aligns perfectly with your objectives and positions your organization for success in the rapidly
evolving tech landscape.
Data engineering for Llama 2
With precise data engineering, we transform your organization’s valuable data into a powerful asset for
the development of highly effective Llama 2 model-powered solutions. Our skilled developers carefully
prepare your proprietary data, making sure it meets the necessary standards for fine-tuning the Llama 2
model, thus optimizing its performance to the fullest potential.
Fine-tuning expertise in Llama 2
We fine-tune the Llama 2 model with your proprietary data for domain-specific performance and build a
customized solution around it. This approach ensures the solution delivers accurate and meaningful
responses within your unique context.
Custom Llama 2 solutions
We ensure innovation, efficiency, and a competitive edge with our expertly developed Llama 2 model-
powered solutions. Whether you need chatbots for personalized customer interactions, intelligent content
generators, or context-aware recommendation systems, our Llama 2 model-powered applications are
meticulously crafted to enhance your organization’s capabilities in the dynamic AI landscape.
Seamless integration of Llama 2
We ensure that the Llama 2 model-powered solutions we develop seamlessly align with your existing
processes. Our approach involves analyzing your workflows, identifying key integration points, and
developing a customized integration strategy. This minimizes disruptions while maximizing the benefits of
our solutions, facilitating a smooth transition for your organization into a more efficient, AI-enhanced
operational environment.
Continuous evolution: Upgrades and maintenance
We keep your Llama 2 model-powered application up-to-date and performance-optimized with
our comprehensive upgrade and maintenance services. We diligently monitor emerging trends, security
updates, and advancements in AI technology, ensuring your application stays competitive and secure in
the rapidly evolving tech landscape.
Endnote
This article discusses the intricacies of fine-tuning the Llama 2 7b model leveraging a Colab notebook.
We laid the foundational understanding of LLM training and the intricacies of fine-tuning, shedding light
on the significance of instruction datasets. We effectively adapted the Llama 2 model in our practical
section, ensuring compatibility with its intrinsic prompt templates and tailored parameters.
When incorporated into platforms like LangChain, these refined models emerge as potent alternatives to
offerings like the OpenAI API. It’s imperative to recognize that instruction datasets stand paramount in the
evolving landscape of language models. The efficacy of your model is intrinsically tied to the quality of its
training data. As you embark on this journey, prioritizing high-caliber datasets becomes crucial.
Navigating the complexities of models like Llama 2 may appear challenging, but the rewards are
substantial with diligent application and a clear roadmap. Harnessing the prowess of these advanced
LLMs for targeted tasks can enhance applications, ushering in a new era of linguistic computing.
Don’t let pre-trained models limit your vision. Our extensive development experience and LLM fine-tuning
expertise enable us to build robust custom LLMs tailored to businesses’ specific needs. Contact our AI
experts today and harness the limitless power of LLMs!
Northbay_December_2023_LLM_Reporting.pdfNorthbay_December_2023_LLM_Reporting.pdf
Northbay_December_2023_LLM_Reporting.pdf
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
Emma Inc - Case Study
Emma Inc - Case StudyEmma Inc - Case Study
Emma Inc - Case Study
 
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the EnterpriseMLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
 
Learning Management System Reporting & Analytics Launch
Learning Management System Reporting & Analytics LaunchLearning Management System Reporting & Analytics Launch
Learning Management System Reporting & Analytics Launch
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
 
G11.2013 Application Development Life Cycle Management
G11.2013   Application Development Life Cycle ManagementG11.2013   Application Development Life Cycle Management
G11.2013 Application Development Life Cycle Management
 
Title_ From Concept to Launch_ ML-driven Software Product Development by Our ...
Title_ From Concept to Launch_ ML-driven Software Product Development by Our ...Title_ From Concept to Launch_ ML-driven Software Product Development by Our ...
Title_ From Concept to Launch_ ML-driven Software Product Development by Our ...
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Language
 
CIO Applications Magazine Names Bardess One of the Top 25 ML Solution Providers
CIO Applications Magazine Names Bardess One of the Top 25 ML Solution ProvidersCIO Applications Magazine Names Bardess One of the Top 25 ML Solution Providers
CIO Applications Magazine Names Bardess One of the Top 25 ML Solution Providers
 
The Cloud Is The Corporation Abeyta
The Cloud Is The Corporation AbeytaThe Cloud Is The Corporation Abeyta
The Cloud Is The Corporation Abeyta
 
Decision Transformers Model.pdf
Decision Transformers Model.pdfDecision Transformers Model.pdf
Decision Transformers Model.pdf
 
Decision Transformers Model.pdf
Decision Transformers Model.pdfDecision Transformers Model.pdf
Decision Transformers Model.pdf
 
Emerging engineering issues for building large scale AI systems By Srinivas P...
Emerging engineering issues for building large scale AI systems By Srinivas P...Emerging engineering issues for building large scale AI systems By Srinivas P...
Emerging engineering issues for building large scale AI systems By Srinivas P...
 

More from ChristopherTHyatt

AI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATION
AI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATIONAI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATION
AI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATIONChristopherTHyatt
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Building Your Own AI Agent System: A Comprehensive Guide
Building Your Own AI Agent System: A Comprehensive GuideBuilding Your Own AI Agent System: A Comprehensive Guide
Building Your Own AI Agent System: A Comprehensive GuideChristopherTHyatt
 
How to build an AI-based anomaly detection system for fraud prevention.pdf
How to build an AI-based anomaly detection system for fraud prevention.pdfHow to build an AI-based anomaly detection system for fraud prevention.pdf
How to build an AI-based anomaly detection system for fraud prevention.pdfChristopherTHyatt
 
The role of AI in invoice processing.pdf
The role of AI in invoice processing.pdfThe role of AI in invoice processing.pdf
The role of AI in invoice processing.pdfChristopherTHyatt
 
How to implement AI in traditional investment.pdf
How to implement AI in traditional investment.pdfHow to implement AI in traditional investment.pdf
How to implement AI in traditional investment.pdfChristopherTHyatt
 
Top Blockchain Technology Companies 2024
Top Blockchain Technology Companies 2024Top Blockchain Technology Companies 2024
Top Blockchain Technology Companies 2024ChristopherTHyatt
 
Transforming data into innovative solutions.pdf
Transforming data into innovative solutions.pdfTransforming data into innovative solutions.pdf
Transforming data into innovative solutions.pdfChristopherTHyatt
 
AI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATION
AI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATIONAI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATION
AI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATIONChristopherTHyatt
 
Financial fraud detection using machine learning models.pdf
Financial fraud detection using machine learning models.pdfFinancial fraud detection using machine learning models.pdf
Financial fraud detection using machine learning models.pdfChristopherTHyatt
 
AI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHT
AI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHTAI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHT
AI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHTChristopherTHyatt
 
AI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONS
AI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONSAI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONS
AI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONSChristopherTHyatt
 
AI applications in financial compliance An overview.pdf
AI applications in financial compliance An overview.pdfAI applications in financial compliance An overview.pdf
AI applications in financial compliance An overview.pdfChristopherTHyatt
 
AI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGE
AI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGEAI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGE
AI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGEChristopherTHyatt
 
AI in medicine A comprehensive overview.pdf
AI in medicine A comprehensive overview.pdfAI in medicine A comprehensive overview.pdf
AI in medicine A comprehensive overview.pdfChristopherTHyatt
 
Building an AI App: A Comprehensive Guide for Beginners
Building an AI App: A Comprehensive Guide for BeginnersBuilding an AI App: A Comprehensive Guide for Beginners
Building an AI App: A Comprehensive Guide for BeginnersChristopherTHyatt
 
OPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AI
OPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AIOPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AI
OPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AIChristopherTHyatt
 
A guide to LTV prediction using machine learning
A guide to LTV prediction using machine learningA guide to LTV prediction using machine learning
A guide to LTV prediction using machine learningChristopherTHyatt
 
AI for cloud computing A strategic guide.pdf
AI for cloud computing A strategic guide.pdfAI for cloud computing A strategic guide.pdf
AI for cloud computing A strategic guide.pdfChristopherTHyatt
 
GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...
GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...
GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...ChristopherTHyatt
 

More from ChristopherTHyatt (20)

AI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATION
AI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATIONAI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATION
AI STRATEGY CONSULTING: STEERING BUSINESSES TOWARD AI-ENABLED TRANSFORMATION
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Building Your Own AI Agent System: A Comprehensive Guide
Building Your Own AI Agent System: A Comprehensive GuideBuilding Your Own AI Agent System: A Comprehensive Guide
Building Your Own AI Agent System: A Comprehensive Guide
 
How to build an AI-based anomaly detection system for fraud prevention.pdf
How to build an AI-based anomaly detection system for fraud prevention.pdfHow to build an AI-based anomaly detection system for fraud prevention.pdf
How to build an AI-based anomaly detection system for fraud prevention.pdf
 
The role of AI in invoice processing.pdf
The role of AI in invoice processing.pdfThe role of AI in invoice processing.pdf
The role of AI in invoice processing.pdf
 
How to implement AI in traditional investment.pdf
How to implement AI in traditional investment.pdfHow to implement AI in traditional investment.pdf
How to implement AI in traditional investment.pdf
 
Top Blockchain Technology Companies 2024
Top Blockchain Technology Companies 2024Top Blockchain Technology Companies 2024
Top Blockchain Technology Companies 2024
 
Transforming data into innovative solutions.pdf
Transforming data into innovative solutions.pdfTransforming data into innovative solutions.pdf
Transforming data into innovative solutions.pdf
 
AI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATION
AI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATIONAI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATION
AI IN PROCUREMENT: REDEFINING EFFICIENCY THROUGH AUTOMATION
 
Financial fraud detection using machine learning models.pdf
Financial fraud detection using machine learning models.pdfFinancial fraud detection using machine learning models.pdf
Financial fraud detection using machine learning models.pdf
 
AI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHT
AI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHTAI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHT
AI IN PREDICTIVE ANALYTICS: TRANSFORMING DATA INTO FORESIGHT
 
AI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONS
AI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONSAI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONS
AI IN DECISION MAKING: NAVIGATING THE NEW FRONTIER OF SMART BUSINESS DECISIONS
 
AI applications in financial compliance An overview.pdf
AI applications in financial compliance An overview.pdfAI applications in financial compliance An overview.pdf
AI applications in financial compliance An overview.pdf
 
AI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGE
AI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGEAI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGE
AI FOR LEGAL RESEARCH: STREAMLINING LEGAL PRACTICES FOR THE DIGITAL AGE
 
AI in medicine A comprehensive overview.pdf
AI in medicine A comprehensive overview.pdfAI in medicine A comprehensive overview.pdf
AI in medicine A comprehensive overview.pdf
 
Building an AI App: A Comprehensive Guide for Beginners
Building an AI App: A Comprehensive Guide for BeginnersBuilding an AI App: A Comprehensive Guide for Beginners
Building an AI App: A Comprehensive Guide for Beginners
 
OPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AI
OPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AIOPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AI
OPTIMIZE TO ACTUALIZE: THE IMPACT OF HYPERPARAMETER TUNING ON AI
 
A guide to LTV prediction using machine learning
A guide to LTV prediction using machine learningA guide to LTV prediction using machine learning
A guide to LTV prediction using machine learning
 
AI for cloud computing A strategic guide.pdf
AI for cloud computing A strategic guide.pdfAI for cloud computing A strategic guide.pdf
AI for cloud computing A strategic guide.pdf
 
GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...
GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...
GENERATIVE AI AUTOMATION: THE KEY TO PRODUCTIVITY, EFFICIENCY AND OPERATIONAL...
 

Recently uploaded

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

FINE-TUNING LLAMA 2: DOMAIN ADAPTATION OF A PRE-TRAINED MODEL

  • 1. 1/27 Fine-tuning Llama 2: An overview leewayhertz.com/fine-tuning-llama2/ In the dynamic and ever-evolving field of generative AI, a profound sense of competition has taken root, fueled by a relentless quest for innovation and excellence. The introduction of GPT by OpenAI has prompted various businesses to work on creating their own Large Language Models (LLMs). However, creating such sophisticated algorithms is like navigating through a maze of complexities. It demands exhaustive research, a massive amount of relevant data and overcoming numerous other challenges. Further, the substantial computational power required for these tasks remains a significant hurdle for many. Amidst this fiercely competitive landscape, where industry heavyweights like OpenAI and Google have already etched their indelible marks, a new contender, Meta, entered the arena with their open-source LLM, Llama, with a goal of democratizing AI. They subsequently upgraded it to Llama 2, which was trained on 40% more data than its predecessor. While all large language models exhibit remarkable efficiency, their adaptability to handle domain-specific inquiries, such as those related to a business’s financial performance or inventory status, may be constrained. To empower these models with domain- specific competence and elevate their precision, a refinement process called fine-tuning is implemented. In this article, we will talk about fine-tuning Llama 2, a model that has opened up new avenues for innovation, research, and commercial applications. This process of fine-tuning may be considered imperative as it can yield numerous benefits like cost savings, secure management of confidential data, and the potential to surpass renowned models like GPT-4 in specialized tasks.
Architecturally, Llama 2 distinguishes itself from its peers through several innovative attributes. It leverages RMSNorm normalization, the SwiGLU activation function, and rotary positional embeddings to further enhance its data processing prowess (see the RMSNorm sketch below). The use of the AdamW optimizer with a cosine learning rate schedule, a weight decay of 0.1, and gradient clipping underscores Meta's commitment to refining even the most nuanced aspects of model development.

Yet, the true innovation of Llama 2 lies not merely in its architectural and training advancements but in its fine-tuning strategies. Meta has judiciously prioritized quality over quantity in its Supervised Fine-Tuning (SFT) phase, a decision informed by numerous studies indicating that high-quality data yields superior model performance. Complementing this is the Reinforcement Learning from Human Feedback (RLHF) stage, designed to align the model with user preferences. Using a comparative approach in which annotators evaluate pairs of model outputs, the RLHF process refines Llama 2 to accentuate helpfulness and safety in its responses.
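To ground one of the components named above, here is a minimal RMSNorm sketch in PyTorch. It illustrates the general technique rather than Meta's exact implementation: activations are rescaled by their root mean square instead of being mean-centered as in LayerNorm.

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-dimension gain
        self.eps = eps

    def forward(self, x):
        # Scale by the root mean square of the activations (no mean subtraction)
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x / rms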
Furthermore, Llama 2's commercial adaptability is evident in its open-source and commercial character, facilitating ease of use and expansion. It is not merely a static tool; it is a dynamic solution optimized for dialogue use cases, as seen in the Llama-2-chat versions available on the Hugging Face platform. While the models differ in parameter size, their consistent optimization for both speed and accuracy underscores their adaptability to diverse operational demands.

Overall, Llama 2, as a member of the Llama family of LLMs, not only matches the technical prowess of contemporaries like GPT-3 and PaLM 2 but also introduces several groundbreaking innovations. Its optimized transformer architecture, rigorous training, fine-tuning procedures, and open-source accessibility position it as a formidable asset in the AI landscape, promising a future of more accurate, efficient, and user-aligned AI solutions.

Why use Llama 2?

In today's AI-driven landscape, responsibility and accountability take center stage. Meta's Llama 2 is evidence of this heightened focus on creating AI solutions that are transparent, accountable, and open to scrutiny. This section delves into why Llama 2's approach is pivotal in reshaping our understanding and expectations of AI models.

Open source: The bedrock of transparency

Most LLMs, such as OpenAI's GPT-3 and GPT-4, Google's PaLM and PaLM 2, and Anthropic's Claude, have predominantly been closed source. This limited accessibility restricts the broader research community from fully understanding these models' intricacies and decision-making processes. Llama 2 stands in stark contrast. Being open source enables anyone with relevant technical expertise not just to access but also to dissect, understand, and potentially modify the model. By enabling people to peruse the research paper detailing Llama 2's development and training and even download the model for personal or business use, Meta is championing an era of transparency in AI.

Ensuring safety through red-teaming

Safety in AI is paramount, and Llama 2's development process reflects this priority. Internal teams and third-party commissions generated adversarial prompts through intensive red-teaming exercises to inform model fine-tuning. These rigorous processes are not a one-time effort; they signify Meta's ongoing commitment to refining model safety iteratively. The intention is clear: ensuring Llama 2 is robust against unforeseen challenges.

Transparent reporting: An insight into model evaluation

The research paper reflects Meta's commitment to transparency, detailing the challenges encountered during the development of Llama 2. By highlighting known issues, outlining the steps taken to mitigate them, and noting those planned for future iterations, Meta provides an open playbook on the model's strengths and areas for improvement.

Empowering developers: "Responsible Use Guide" and "Acceptable Use Policy"
With great power comes great responsibility. Acknowledging LLMs' vast potential and inherent risks, Meta has devised a "Responsible Use Guide" to steer developers towards best practices in AI development and safety evaluations. Complementing this is an "Acceptable Use Policy," which defines boundaries for ensuring the responsible use of the model.

Engaging the global community

Meta recognizes the collective intelligence of the global community. Initiatives such as the Open Innovation AI Research Community invite academic researchers to share insights and research on the responsible development of LLMs. Furthermore, the Llama Impact Challenge is a call to action for public, non-profit, and for-profit entities to harness Llama 2 in addressing critical global challenges like environmental conservation and education.

Launch your project with LeewayHertz

We specialize in fine-tuning pre-trained LLMs to ensure they offer domain-specific responses tailored to your unique business requirements. For the specifics you're looking for, contact us today!

Why does Llama 2 matter in the AI landscape?

The global AI community has long awaited a shift from commercial monopolization towards open-source research and experimentation. Meta's Llama 2 heralds this change. By offering an open-source AI, Meta ensures a credible alternative to closed-source AI. It democratizes AI, allowing other companies to develop AI-powered applications under their control, bypassing the commercial constraints of tech giants like Apple, Google, and Amazon.

Llama 2 is not just a technological marvel; it is a statement on the importance of responsibility, transparency, and collaboration in AI. It embodies a future where AI development prioritizes societal benefits, open dialogue, and ethical considerations.

How does Llama 2 work?

Llama 2, a state-of-the-art language model, has been built using sophisticated training techniques to understand and generate human-like text. To comprehend its operations, one must delve into its data sources, training methodologies, and potential applications.

Data sources and neural network training

Llama 2's foundational strength is attributed to its extensive training on a staggering 2 trillion tokens. These tokens were sourced from publicly accessible repositories, including:

Common Crawl: An expansive archive encompassing billions of web pages.
Wikipedia: The free encyclopedia offering a wealth of knowledge on myriad topics.
Project Gutenberg: A treasure trove of public domain books.
Each token, be it a word or a semantic fragment, helps Llama 2 discern the meaning behind text. For instance, if the model consistently encounters "Apple" and "iPhone" together, it infers the inherent relationship between these terms, distinguishing it from other related pairings such as "apple" and "fruit."

Ensuring quality and mitigating bias

Given the vastness and diversity of the internet, training a model solely on such data can inadvertently introduce biases or produce inappropriate content. Acknowledging this, the developers of Llama 2 incorporated additional training mechanisms:

Reinforcement Learning from Human Feedback (RLHF): This technique involves human testers who evaluate multiple AI-generated responses. Their feedback is instrumental in guiding the model towards generating more relevant and appropriate content.

Adaptation for conversational context

Llama 2's chat versions were meticulously fine-tuned using specific datasets to enhance conversational prowess. This ensures that when engaged in a dialogue, Llama 2 responds naturally, simulating human interaction.

Customization and fine-tuning

One of Llama 2's defining features is its adaptability. Organizations can mold it to resonate with their unique brand voice. For instance, if a firm wishes to produce summaries reflecting its distinct style, Llama 2 can be trained on numerous examples to achieve this. Similarly, the model can be fine-tuned for customer support using FAQs and chat logs, allowing it to respond precisely to user queries.

Llama 2's robustness and adaptability are products of its comprehensive training and fine-tuning methodologies. Its ability to assimilate vast data, combined with human feedback mechanisms and customization options, positions it at the forefront of the language model domain.

A thorough analysis of Llama 2 in comparison to other leading LLMs

The advancement of AI, especially in the domain of large language models, has been nothing short of extraordinary. This is prominently demonstrated by Llama 2, an LLM designed with adaptability in mind to empower developers and researchers to explore new horizons and create innovative applications. Here, we explore the outcomes of some experiments carried out to evaluate how Llama 2 compares to giants like OpenAI's GPT and Google's PaLM.

Creative aptitude: Llama 2 was prompted to simulate a sarcasm-laden dialogue on space exploration; the resultant discourse, although impressive, trailed slightly behind ChatGPT's. When compared with Google's Bard, however, Llama 2 showcased superior flair. Thus, while ChatGPT remains the frontrunner in creative engagements, Llama 2 holds a commendable position amongst its peers.
Programming capabilities: Llama 2 was pitted against ChatGPT and Bard in a coding challenge: developing functional applications ranging from a basic to-do list to a Tetris game. Although ChatGPT mastered every challenge, Llama 2, like Bard, efficiently crafted the to-do list and an authentication system, stumbling only on the Tetris game.

Mathematical proficiency: Llama 2's prowess in solving algebraic and logical math problems was noteworthy, particularly when compared to Bard. However, ChatGPT's mathematical proficiency remained unmatched. Remarkably, Llama 2 excelled at certain problems on which its predecessors, in their early stages, had faltered.

Reasoning and commonsense: A facet that remains a challenge for many AI models is commonsense reasoning. ChatGPT unsurprisingly led the pack, and the contest for second place was neck and neck between Bard and Llama 2, with Bard slightly edging ahead.

Llama 2, though an impressive foundational model, still has room for growth compared to certain specialized, fine-tuned models on the market. Foundational models like Llama 2 are designed with versatility and future adaptability at their core, unlike fine-tuned models optimized for domain-specific expertise. Given its nascent stage and its 'foundational' nature, the potential avenues for Llama 2's evolution are promising.

What does fine-tuning an LLM mean?

When discussing the fine-tuning of LLMs, it is worth recognizing that the practice extends beyond language models: fine-tuning can be applied across many kinds of machine learning models, depending on the use case.
Machine learning models are trained to identify patterns within given datasets. For instance, a Convolutional Neural Network (CNN) designed to detect cars in urban areas would be highly proficient in that domain due to training on relevant images. Yet, when faced with detecting trucks on highways, its efficacy might decrease due to unfamiliarity with that data distribution. Rather than starting from scratch with a new training dataset, fine-tuning allows adjustments to be made to the existing model so it can accommodate the new data.

Several advanced LLMs are available, including GPT-3, BLOOM, BERT, T5, and XLNet. GPT-3, for instance, is a premium model recognized for its vast scale of 175 billion parameters, making it adept at a wide range of natural language processing tasks. BERT, conversely, is a more accessible open-source model that excels at understanding contextual word relationships. The choice between models like GPT-3 and BERT largely depends on the specific task at hand, be it text generation or text classification.

Techniques for LLM fine-tuning

The process of fine-tuning LLMs is intricate, with different techniques suited to different applications. Sometimes, the goal is to adapt a model to a novel task. Imagine having a pre-trained LLM skilled in text generation that you want to perform sentiment analysis. This entails remodeling the model with subtle architectural tweaks before diving into the fine-tuning phase.

In such a context, you will primarily harness the numeric vectors, called embeddings, generated by the LLM's transformer component. These embeddings carry detailed features of the given input. Certain LLMs directly produce these embeddings, whereas others, such as the GPT series, use them for token or text generation. During adaptation, the LLM's embedding layer gets linked to a classification system, typically a set of fully connected layers translating embeddings into class probabilities (illustrated in the sketch below). The emphasis lies in training the classification segment on the model-driven embeddings. While the LLM's attention layers generally remain unchanged, offering computational efficiency, the classifier requires a supervised learning dataset with text instances and their respective labels. How much fine-tuning data you need depends on task intricacy and classifier specifics.

Yet some occasions demand deeper adjustment, requiring the attention layers to be unlocked for a full-blown fine-tuning project. The cost of this intensive process also depends on the model size. Besides, there exist strategies to streamline the costs of fine-tuning. Let's delve deeper and explore some prominent fine-tuning techniques.
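As a concrete illustration of the embedding-plus-classifier adaptation described above, the following PyTorch sketch freezes a pre-trained encoder and trains only a small classification head on its embeddings. The model name and three-class setup are illustrative assumptions, not part of the original article.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Load any encoder-style pre-trained model; "bert-base-uncased" is illustrative
base = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
for p in base.parameters():
    p.requires_grad = False  # the attention layers stay frozen

# Fully connected layers mapping embeddings to class probabilities
classifier = nn.Sequential(
    nn.Linear(base.config.hidden_size, 256),
    nn.ReLU(),
    nn.Linear(256, 3),  # e.g., negative / neutral / positive sentiment
)

inputs = tokenizer("The product exceeded expectations", return_tensors="pt")
with torch.no_grad():
    embedding = base(**inputs).last_hidden_state[:, 0]  # [CLS] token embedding
logits = classifier(embedding)  # only the classifier receives gradient updates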
Unsupervised versus supervised fine-tuning (SFT)

Sometimes there is a need to refresh the LLM's knowledge base without necessarily changing its behavior. If, for instance, you intend to adapt the model to medical terminology or a new language, an expansive, unstructured dataset suffices. Here, the goal is to immerse the model in a sea of tokens representative of the new domain or anticipated input types, and leveraging vast unstructured datasets scales well thanks to unsupervised or self-supervised methodologies.

However, there are cases where merely updating the model's information reservoir falls short: the LLM's behavior itself needs an overhaul. That necessitates a supervised fine-tuning (SFT) dataset, complete with prompts and expected outcomes. This method is pivotal for models like ChatGPT, which are designed to be highly responsive to user directives.
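The following contrasts the two kinds of training data. It is illustrative only; the field names are a common convention rather than a fixed standard.

# Supervised fine-tuning: each record pairs a prompt with the desired output
sft_example = {
    "prompt": "Summarize the patient's discharge instructions in plain language.",
    "response": "Take the prescribed antibiotic twice a day for seven days and rest.",
}

# Unsupervised adaptation: raw domain text is enough to refresh the knowledge base
unsupervised_example = (
    "Metformin is commonly used as a first-line treatment for type 2 diabetes."
)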
Reinforcement Learning from Human Feedback (RLHF)

To elevate SFT, some practitioners employ reinforcement learning from human feedback, a complex procedure that, at present, only well-resourced organizations have the capacity to run. While RLHF techniques vary, they all emphasize human-guided LLM training: human reviewers assess the model's outputs for given prompts, steering the model toward desired results.

Take OpenAI's ChatGPT as an RLHF benchmark. Human feedback is used to develop a reward model mirroring human preferences, and the LLM then undergoes rigorous reinforcement learning to optimize its outputs against the signals this reward model provides.
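A minimal sketch of the pairwise loss commonly used to train such a reward model, assuming reward_model is any network mapping a prompt/response pair to a scalar score. This is the widely used Bradley-Terry-style formulation, not necessarily OpenAI's exact recipe.

import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    # Score both responses; the human-preferred one should rank higher
    r_chosen = reward_model(prompt, chosen)
    r_rejected = reward_model(prompt, rejected)
    # -log(sigmoid(r_chosen - r_rejected)) shrinks as the preferred margin grows
    return -F.logsigmoid(r_chosen - r_rejected).mean()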
Parameter-efficient Fine-tuning (PEFT)

PEFT, an emerging field within LLM fine-tuning, aims to minimize the resources spent on updating model parameters by limiting how many parameters are altered. One such method gaining traction is Low-rank Adaptation (LoRA). The essence of LoRA is that only a small number of parameters need adjusting for downstream tasks, so a compact matrix can capture the task-specific nuances. Implementing LoRA means training this compact matrix rather than the entirety of the LLM's parameters. Once trained, the LoRA weights can either be merged into the primary LLM or applied separately during inference. Adopting techniques like LoRA can reduce fine-tuning expenditure considerably while enabling the storage of numerous fine-tuned adapters ready for integration during LLM operations.

Reinforcement Learning from AI Feedback (RLAIF)

Fine-tuning an LLM using Reinforcement Learning from AI Feedback (RLAIF) involves a structured process that ensures the model's behavior aligns with a set of predefined principles or guidelines, often encapsulated in a Constitution. Here is an overview of the steps involved:

Define the Constitution

Constitution creation: Begin by defining the Constitution, a document or set of guidelines that outlines the principles, ethics, and behavioral norms the AI model should adhere to. This Constitution will guide the AI feedback model in generating preferences.
Set up the AI feedback model

Model selection: Choose or develop an AI feedback model capable of understanding and applying the principles outlined in the Constitution.

Model training (if necessary): If the AI feedback model isn't pre-trained, you might need to train it to interpret the Constitution and evaluate responses against it. This could involve supervised learning on a dataset where responses are annotated for their alignment with constitutional principles.

Generate feedback data

Feedback generation: Use the AI feedback model to evaluate pairs of prompt/response instances. For each pair, the model assigns a preference score indicating which response aligns better with the principles in the Constitution (see the sketch after these steps).

Train the preference model (PM)

Data preparation: Organize the AI-generated feedback into a dataset suitable for training the preference model (PM).

Preference model training: Train the model on this dataset. It learns to predict the preferred response to a given prompt based on the feedback scores provided by the AI feedback model.

Fine-tune the LLM

Integration with reinforcement learning: Integrate the trained preference model into a reinforcement learning framework. In this setup, the preference model provides the reward signal based on how well a response from the LLM aligns with the constitutional principles.

LLM fine-tuning: Fine-tune the LLM using this reinforcement learning setup. The LLM generates responses to prompts, the PM evaluates them, and the LLM adjusts its parameters to maximize the reward signal, effectively learning to produce responses that better align with the constitutional principles.

Evaluation and iteration

Model evaluation: After fine-tuning, evaluate the LLM's performance to ensure it aligns with the desired principles and handles a variety of prompts effectively.

Feedback loop: If the performance is not satisfactory, or if there is room for improvement, iterate over the process. This could involve refining the Constitution, adjusting the AI feedback model, retraining the preference model, or further fine-tuning the LLM.

Deployment and monitoring

Deployment: Once the fine-tuning process meets the performance and ethical standards, deploy the model.

Continuous monitoring: Regularly monitor the model's performance and behavior to ensure it continues to align with the constitutional principles, adapting to new data and evolving requirements.
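To make the feedback-generation step concrete, here is a hedged sketch of how an AI feedback model might produce one preference record. The feedback_model interface and the prompt wording are illustrative assumptions, not a prescribed API.

def generate_preference(feedback_model, constitution, prompt, response_a, response_b):
    # Ask the feedback model which response better follows the Constitution
    judgement_prompt = (
        f"Principles:\n{constitution}\n\n"
        f"Prompt: {prompt}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response better follows the principles? Answer A or B."
    )
    verdict = feedback_model(judgement_prompt)  # assumed to return "A" or "B"
    chosen, rejected = (
        (response_a, response_b) if verdict == "A" else (response_b, response_a)
    )
    # One training record for the preference model
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}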
Fine-tuning an LLM using RLAIF is a complex process that involves careful design, consistent evaluation, and ongoing adjustment to ensure that the model's behavior aligns with human values and ethical standards. It is a dynamic process that benefits from continuous monitoring and iterative improvement.

How can we perform fine-tuning on Llama 2?

PEFT approaches – LoRA and QLoRA

Parameter-efficient Fine-tuning (PEFT) presents an effective approach to fine-tuning LLMs. Distinct from traditional methods that mandate extensive parameter updates, PEFT focuses on refining a select subset of parameters, minimizing computational demands and expediting the training process. By gauging the significance of individual parameters based on their influence on the overall model, PEFT prioritizes those with maximal impact. Consequently, only these pivotal parameters are adjusted during the fine-tuning phase, while the others remain static. Such a strategy curtails computational and temporal overheads and paves the way for swift model iteration and deployment. As PEFT emerges as a frontrunner among optimization techniques, it is vital to recognize that it remains a dynamic field, with continuous research ushering in nuanced variations and enhancements. The choice of PEFT technique will invariably depend on specific research goals and practical contexts.

PEFT effectively reduces RAM and storage demands by refining only a select set of parameters while keeping the majority in their original state. Its strength lies in fostering robust generalization even when datasets are limited in volume, and it augments the model's reusability and transferability: small model checkpoints derived from PEFT integrate seamlessly with the foundational model, enabling versatile fine-tuning across diverse scenarios simply by swapping in the PEFT-specific parameters. A salient feature is the preservation of insights from the pre-training phase, keeping the model resilient to catastrophic forgetting.

Prominent PEFT strategies preserve the integrity of the pre-trained base, introducing supplementary layers or parameters termed "adapters." Through a process dubbed "adapter-tuning," these layers are integrated with the foundational model, with tuning efforts concentrated on the new layers alone. A notable challenge with this approach is heightened latency during inference, which can hamper efficiency in various contexts.

Parameter-efficient fine-tuning has become a pivotal area of focus within AI, and there are myriad techniques to achieve it. Among these, Low-rank Adaptation (LoRA) and its enhanced counterpart, QLoRA, are distinguished for their effectiveness.

Low-rank Adaptation (LoRA)
LoRA introduces an innovative paradigm in model fine-tuning, offering a modular method adept at domain-specific tasks and at transferring learned capabilities. The intrinsic appeal of LoRA lies in its ability to be executed with minimal resources while being memory-conservative. A closer examination of the technique reveals the following steps:

Pre-trained parameter preservation: The original neural network's foundational parameters (W) remain unaltered during the adaptation process.

Inclusion of new parameters: Alongside this original setup, supplementary networks (denoted WA and WB) are embedded. These networks use low-rank vectors: their dimensionalities (d×r and r×d) are purposefully small compared to the original network's, where 'd' is the original vector's dimension and 'r' the low rank. Notably, a smaller 'r' accelerates training, although it may require a careful balance to maintain performance.

Dot product calculation: The original and low-rank networks are combined through a dot product, generating an n-dimensional weight matrix that informs the model's results.

Loss function computation: The loss is computed by contrasting the derived results against expected outputs, and traditional backpropagation is then used to update the WA and WB weights.

LoRA's essence is its economical memory footprint and modest infrastructure demands. For instance, given a 512×512 parameter matrix in a typical feed-forward network (262,144 parameters), a LoRA adapter of rank 2 trains only 2,048 parameters (512×2 for each of WA and WB) on domain-specific data. This streamlined process significantly elevates computational efficiency; the sketch below makes the arithmetic concrete.
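A minimal PyTorch sketch of this scheme, assuming a square 512×512 layer and rank 2 as in the example above; it illustrates the technique rather than reproducing any particular library's implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d: int, r: int, alpha: float = 16.0):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)             # pre-trained weights
        self.W.weight.requires_grad = False              # kept frozen
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)  # WA: d x r
        self.B = nn.Parameter(torch.zeros(r, d))         # WB: r x d, zero-initialized
        self.scale = alpha / r                           # LoRA scaling factor

    def forward(self, x):
        # Frozen path plus the low-rank update (x @ A @ B)
        return self.W(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(d=512, r=2)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2,048 trainable parameters versus 262,144 frozen ones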
An exceptional facet of LoRA is its modular design. The trained adapter can be retained as an independent entity, serving as a modular component for a specific domain. Furthermore, LoRA adeptly sidesteps catastrophic forgetting by abstaining from modifying the foundational weights.

Further developments: QLoRA

To further accentuate the effectiveness of LoRA, QLoRA has been introduced as an augmented technique promising enhanced optimization and performance. QLoRA builds upon LoRA to further improve efficiency by converting the weight values of the original network from high-precision formats, like Float32, to more compact types, such as int4. This conversion reduces memory usage and accelerates computation. QLoRA introduces three primary enhancements over LoRA, establishing it as a leading method in PEFT.

1. 4-bit NF4 quantization

Using 4-bit NormalFloat (NF4) is a strategic move to decrease storage requirements. The process divides into three phases:

Normalization and quantization: Weights are shifted to a zero mean and consistent unit variance. Given that a 4-bit format can hold just 16 distinct values, each weight is mapped to the closest of these 16 levels based on its relative position. For example, an FP32 weight of 0.2121 is stored as its nearest 4-bit equivalent, not the exact value.

Dequantization: The reverse process. After training, the quantized weights are restored to their near-original form.

Double quantization: This phase pushes memory optimization further: the quantization constants themselves are grouped and quantized to 8 bits, yielding a further significant reduction in memory usage. In essence, for a model with 1 million parameters, the memory occupied by the quantization constants can be slashed to around 125,000 bits.
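A toy NumPy illustration of the quantize/dequantize round trip just described. It uses a uniform 16-level grid for simplicity; real NF4 spaces its 16 levels according to a normal distribution, so this is a sketch of the idea rather than the actual format.

import numpy as np

weights = np.array([0.2121, -0.73, 0.05, 0.9], dtype=np.float32)
absmax = np.abs(weights).max()        # per-block scaling constant
grid = np.linspace(-1.0, 1.0, 16)     # the 16 representable 4-bit values

# Quantize: map each normalized weight to the index of its nearest grid level
indices = np.abs(weights[:, None] / absmax - grid[None, :]).argmin(axis=1)

# Dequantize: restore approximate weights from the stored 4-bit indices
restored = grid[indices] * absmax
print(weights)   # original values
print(restored)  # close to, but not exactly, the originals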
2. Unified memory paging

Together with its quantization methods, QLoRA leverages NVIDIA's unified memory capability, which facilitates smooth page transfers between GPU and CPU memory. This is particularly useful during memory-intensive operations or unexpected GPU demand spikes, ensuring no memory overflow.

While both LoRA and QLoRA are at the forefront of PEFT, QLoRA's additional techniques offer superior memory efficiency and optimization.

Fine-tuning the Llama 2 model with QLoRA

Let's delve into the process of fine-tuning the Llama 2 model with 7 billion parameters. We will harness the computational power of a T4 GPU with high RAM, available on Google Colab at a rate of 2.21 credits per hour. The T4 comes equipped with 16 GB of VRAM. The weights of Llama 2-7B alone (7 billion parameters at 2 bytes each in FP16) occupy 14 GB, stretching the VRAM almost to its limit before even accounting for overheads such as optimizer states, gradients, and forward activations. The implication is clear: traditional full fine-tuning won't work here; we need parameter-efficient fine-tuning techniques such as LoRA or QLoRA.

One way to significantly cut down on VRAM usage is to fine-tune the model in 4-bit precision, which makes QLoRA an apt choice. Fortunately, the Hugging Face ecosystem provides the transformers, accelerate, peft, trl, and bitsandbytes libraries to facilitate this. Our step-by-step code is inspired by the contributions of Younes Belkada on GitHub. We initiate the process by installing and importing these libraries.

!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
Fortunately, the Hugging Face ecosystem is equipped with libraries such as transformers, accelerate, peft, trl, and bitsandbytes to facilitate this. Our step-by-step code is inspired by the contributions of Younes Belkada on GitHub. We initiate the process by installing and importing these libraries.

!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

Let's walk through the adjustable parameters. We will load the llama-2-7b-chat-hf model, commonly referred to as the chat model, and train it on the mlabonne/guanaco-llama2-1k dataset, which comprises 1,000 samples. The resulting fine-tuned model will be named llama-2-7b-miniguanaco. For those curious about the origin and creation of this dataset, a detailed notebook is available for review. Customization is, of course, possible: the Hugging Face Hub hosts a plethora of valuable datasets, including the notable databricks/databricks-dolly-15k. For QLoRA, we will set the rank to 64 with a scaling parameter of 16, load the Llama 2 model directly in 4-bit precision using the NF4 type, and train it over a single epoch. For the other associated parameters, you are encouraged to explore the TrainingArguments, PeftModel, and SFTTrainer documentation.
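Before defining all of those parameters, it can help to quickly inspect the dataset we just named. This is an optional check; the comments below assume, as the trainer configuration later confirms via dataset_text_field="text", that each sample is stored in a single "text" column already formatted with Llama 2's [INST] template.

# Optional: peek at the instruction dataset before training.
from datasets import load_dataset

ds = load_dataset("mlabonne/guanaco-llama2-1k", split="train")
print(ds)                   # expected: 1,000 rows with a single "text" column
print(ds[0]["text"][:250])  # the prompt template is baked into each sample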
# The model that you want to train from the Hugging Face hub
model_name = "NousResearch/Llama-2-7b-chat-hf"

# The instruction dataset to use
dataset_name = "mlabonne/guanaco-llama2-1k"

# Fine-tuned model name
new_model = "llama-2-7b-miniguanaco"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4
# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule (constant is a bit better than cosine here)
lr_scheduler_type = "constant"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches of the same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 25

# Log every X update steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################
# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}

Let's commence the fine-tuning process, integrating the various components for this task. Initially, we will load the previously defined dataset. Our dataset is already refined; under typical circumstances, this step would entail reshaping prompts, filtering out inconsistent text, amalgamating multiple datasets, and so forth. Subsequently, we will set up bitsandbytes for 4-bit quantization, instantiate the Llama 2 model in 4-bit precision on the GPU along with the matching tokenizer, and finally initialize the QLoRA configuration and the standard training parameters, forwarding all of these settings to the SFTTrainer. With everything in place, the training journey begins!

# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token   # reuse EOS as the padding token
tokenizer.padding_side = "right"            # avoids an fp16 overflow issue during training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

The duration of training varies significantly with the dataset's size; in this instance, it completed in under an hour on a T4 GPU. To review progress visually, refer to the plots on TensorBoard:
%load_ext tensorboard
%tensorboard --logdir results/runs

Let's confirm that the model is operating as expected. While a comprehensive evaluation would be ideal, we can use the text generation pipeline for a preliminary check by posing a question such as, "What is a large language model?" Note that the input must be formatted to match Llama 2's prompt structure.

# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

# Run text generation pipeline with our new model
prompt = "What is a large language model?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

The model provides the following answer, preceded by two benign UserWarnings from transformers and torch:

/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None

[INST] What is a large language model? [/INST] A large language model is a type of artificial intelligence (AI) model that is trained on a large dataset of text to generate human-like language outputs. It is designed to be able to understand and generate text in a way that is similar to human language, and can be used for a variety of applications such as chatbots, language translation, and text summarization. Large language models are typically trained using deep learning techniques, such as recurrent neural networks (RNNs) or transformer models, and are often based on pre-trained models such as BERT or RoBERTa. These models are trained on large datasets of text, such as books, articles, or websites, and are designed to learn the patterns and structures of language. Some examples of large language models include: * BERT (Bidirectional Encoder Representations from Transformers):
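As an aside on prompt structure: the [INST] ... [/INST] wrapper used above is the minimal single-turn form of Llama 2's chat template. When a system prompt is needed, the template adds <<SYS>> tags. The helper below is our own illustrative sketch, not part of the original notebook:

def build_llama2_prompt(user_msg: str, system_msg: str = "") -> str:
    """Wrap a user message in Llama 2's single-turn chat template."""
    if system_msg:
        return f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"
    return f"[INST] {user_msg} [/INST]"

# Example usage with the pipeline defined above:
# result = pipe(build_llama2_prompt("What is a large language model?",
#                                   "Answer in two sentences."))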
Drawing from these observations, the coherence demonstrated by a model of merely 7 billion parameters is quite impressive. Feel free to experiment further with more complex questions, perhaps drawing from datasets like BigBench-Hard. Historically, the Guanaco dataset has been pivotal in crafting top-tier models; to build a stronger version of our model, consider training Llama 2 on the complete mlabonne/guanaco-llama2 dataset.

So, how do we save our refined llama-2-7b-miniguanaco model? The key lies in merging the LoRA weights into the base model. There is currently no seamless, one-step method for this: we must reload the base model in FP16 precision and use the peft library to merge the adapter. Unfortunately, this approach can run into VRAM issues even after clearing memory, so it may be necessary to restart the notebook, re-run the first three cells, and then continue with the following one.

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Having merged the weights and reloaded the tokenizer, we can push everything to the Hugging Face Hub to preserve the model:

!huggingface-cli login

model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)

This model is now ready for inference and can be loaded from the Hub just as you would load any other Llama 2 model.
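For instance, loading the merged model back for inference could look like the sketch below; "your-username" is a placeholder for the Hub account the model was pushed to.

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# "your-username" is a placeholder; substitute your own Hub namespace.
repo_id = "your-username/llama-2-7b-miniguanaco"

model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=200)
print(pipe("[INST] What is a large language model? [/INST]")[0]["generated_text"])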
Challenges in fine-tuning Llama 2

Navigating the fine-tuning process

Fine-tuning LLMs like Llama 2 presents its own set of complexities, differing from standard text-to-text model adaptations. Even with supportive libraries like Hugging Face's transformers and trl, the process remains intricate for enterprise applications. Key challenges include:

The absence of a standard interface for setting prompt and task descriptors and for adjusting datasets in alignment with those parameters.
The multitude of training parameters that require manual configuration tailored to specific datasets.
The onus of establishing, managing, and scaling a robust infrastructure for distributed fine-tuning.
The difficulty of achieving optimal performance with a model of around 7B parameters under GPU memory constraints.
The deep-rooted understanding of the subject that effective distributed training demands.

Securing computational assets

LLMs are, by nature, voracious consumers of computational resources. Their memory, power, and time demands are steep, constraining entities that lack these resources. This disparity can act as a barrier to democratizing the fine-tuning process.

Streamlining distributed model training

The sheer size of LLMs like Llama 2 makes it impractical to fit them on a single GPU, barring a few such as the A100. This necessitates a shift from standard data-parallel training to model-parallel or pipeline-parallel training, whereby model weights are distributed across multiple GPU instances. Open-source tools such as DeepSpeed facilitate this, but mastering its vast array of configurable parameters can be daunting: incorrect configurations can cause memory overflow on CPUs/GPUs or suboptimal GPU utilization due to unwarranted offloading, raising training costs. A minimal configuration sketch follows below.
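As an illustration of that configuration surface, here is a minimal sketch of a DeepSpeed ZeRO-3 setup with CPU offloading, passed to transformers via the TrainingArguments deepspeed parameter. Every value is a starting point to tune per cluster, not a recommendation, and running it requires the deepspeed package plus a multi-process launcher such as deepspeed or accelerate launch.

from transformers import TrainingArguments

# Hypothetical ZeRO-3 configuration with CPU offloading; tune per cluster.
ds_config = {
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, optimizer states
        "offload_optimizer": {"device": "cpu"},  # trades GPU memory for PCIe traffic
        "offload_param": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",       # "auto" lets transformers fill these in
    "train_micro_batch_size_per_gpu": "auto",
}

training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    deepspeed=ds_config,   # transformers forwards this dict to DeepSpeed at init
)

Misjudging the offload settings here is exactly the failure mode described above: offloading too much starves the GPUs, while offloading too little overflows memory.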
How does LeewayHertz help in building Llama 2 model-powered solutions?

LeewayHertz, a seasoned AI development company, offers expert solutions in fine-tuning the Llama 2 model to build custom solutions aligned with specific organizational needs and objectives. Here is how we can help you:

Strategic consulting

Our consulting process begins with a deep understanding of your organization's goals, challenges, and competitive landscape. We then recommend the most appropriate Llama 2 model-powered solution tailored to your specific needs. Finally, we develop a comprehensive implementation strategy, ensuring the solution aligns with your objectives and positions your organization for success in the rapidly evolving tech landscape.

Data engineering for Llama 2

With precise data engineering, we transform your organization's valuable data into a powerful asset for developing highly effective Llama 2 model-powered solutions. Our skilled developers carefully prepare your proprietary data, ensuring it meets the standards required for fine-tuning the Llama 2 model and optimizing its performance to the fullest.

Fine-tuning expertise in Llama 2

We fine-tune the Llama 2 model with your proprietary data for domain-specific performance and build a customized solution around it. This approach ensures the solution delivers accurate and meaningful responses within your unique context.

Custom Llama 2 solutions

We ensure innovation, efficiency, and a competitive edge with our expertly developed Llama 2 model-powered solutions. Whether you need chatbots for personalized customer interactions, intelligent content generators, or context-aware recommendation systems, our Llama 2 model-powered applications are meticulously crafted to enhance your organization's capabilities in the dynamic AI landscape.

Seamless integration of Llama 2

We ensure that the Llama 2 model-powered solutions we develop align seamlessly with your existing processes. Our approach involves analyzing your workflows, identifying key integration points, and developing a customized integration strategy. This minimizes disruption while maximizing the benefits of our solutions, facilitating a smooth transition into a more efficient, AI-enhanced operational environment.

Continuous evolution: Upgrades and maintenance

We keep your Llama 2 model-powered application up to date and performance-optimized through comprehensive upgrade and maintenance services. We diligently monitor emerging trends, security updates, and advancements in AI technology, ensuring your application stays competitive and secure in the rapidly evolving tech landscape.

Endnote

This article discussed the intricacies of fine-tuning the Llama 2 7B model using a Colab notebook. We laid a foundational understanding of LLM training and fine-tuning, shedding light on the significance of instruction datasets, and in the practical section, we adapted the Llama 2 model to work with its intrinsic prompt templates and tailored parameters.
When incorporated into platforms like LangChain, these refined models emerge as potent alternatives to offerings like the OpenAI API. It is imperative to recognize that instruction datasets stand paramount in the evolving landscape of language models: the efficacy of your model is intrinsically tied to the quality of its training data, so prioritizing high-caliber datasets is crucial as you embark on this journey. Navigating the complexities of models like Llama 2 may appear challenging, but with diligent application and a clear roadmap, the rewards are substantial. Harnessing the prowess of these advanced LLMs for targeted tasks can enhance applications, ushering in a new era of linguistic computing.

Don't let pre-trained models limit your vision. Our extensive development experience and LLM fine-tuning expertise enable us to build robust custom LLMs tailored to your business's specific needs. Contact our AI experts today and harness the limitless power of LLMs!