HOW TO TRAIN AN OPEN-SOURCE FOUNDATION MODEL INTO A DOMAIN-SPECIFIC LLM?
In the fast-paced world of corporate environments, technological innovations
have always been a catalyst, driving business growth with remarkable
velocity. One such technological advancement that has lately become a
talking point and brought about a paradigm shift is generative AI. Today,
every industry is being profoundly influenced by generative AI, which has
heralded the dawn of a new era defined by the widespread use of Large
Language Models (LLMs). Corporate entities across diverse sectors are
increasingly recognizing the immense potential of customized large language
models to enhance their business operations in unprecedented ways.
Businesses are progressively pivoting away from generic, universal models
due to growing needs for enhanced control, greater data privacy, and cost-
efficient methodologies. As organizations increasingly leverage the power of
customization, they open the doors to a world where intelligent systems
stand poised to revolutionize everything from innovation and efficiency to
the very core of their operational processes.
While general-purpose LLMs are trained on large and diverse datasets,
domain-specific LLMs are specifically designed for a particular domain and
trained on exclusive datasets that cater to an organization's specific
requirements. General-purpose LLMs can be useful for many businesses, but
domain-specific LLMs are more suitable for situations that require accurate
and contextually appropriate outputs within a specific domain. These highly
targeted models are potentially more efficient and often outperform much
larger general-purpose models, such as GPT-3.5, which serves as the basis
for ChatGPT. This significant change is further propelled by the availability
of numerous open-source foundation models, making it feasible for businesses
to craft their personalized LLMs to meet their unique demands and achieve
optimal results.
Domain-specific LLMs are witnessing widespread adoption as they can adapt
to specific requirements, utilize available resources more efficiently, and
deliver powerful language processing capabilities that address specific
challenges and tasks. They are poised to bring about a new era of advanced
language processing solutions with enhanced adaptability, resourcefulness,
and potency.
This article provides detailed insights into domain-specific LLMs, covering
aspects like their benefits, the challenges involved in their development, and
how to create an industry-specific LLM leveraging the potential of an
open-source foundation model. All this will help you understand the
intricacies of this fascinating progression in AI.
What are foundation models?
What are LLMs?
What are domain-specific LLMs?
What are the benefits of using domain-specific LLMs?
Challenges of building domain-specific custom LLMs
A detailed case study of BloombergGPT, a finance-specific LLM
What tasks can BloombergGPT perform?
The architecture of a domain-specific LLM like BloombergGPT
How was BloombergGPT trained?
How are ethical and privacy issues handled in BloombergGPT?
How to train an open-source foundation model into a domain-specific LLM?
What are foundation models?
[Figure: Foundation Models (FMs) encompass Large Language Models (LLMs) such
as ChatGPT, Chinchilla, and GPT-3, and are characterized as pretrained,
generalized, adaptable, large, and self-supervised.]
The term "Foundation Models" was put forward by Stanford researchers to
describe a new breed of machine learning models. Rather than being
designed for a specific task such as image recognition, these models are
trained on extensive, diverse datasets using self-supervised learning at scale,
allowing them to be fine-tuned for various downstream tasks.
Contrary to what the name might suggest, Foundation Models (FMs) are not
the bedrock of AI, nor are they suggestive of AGI (Artificial General
Intelligence). These models are unique and remarkable in their own right,
characterized by five main traits:
1. Pre-training: FMs are pre-trained with vast data and substantial computing
power, making them ready for use without further training.
2. Generalization: Unlike traditional AI models, which are task-specific, FMs
are versatile and designed to tackle numerous tasks.
3. Adaptability: FMs are adjustable through prompting, meaning they
respond to user-defined inputs like text.
4. Large-scale: FMs are large in both model size and data size. For instance,
GPT-3 boasts 175 billion parameters and was trained on approximately
500 billion words.
5. Self-supervision: FMs learn from data patterns without explicit labels.
Prominent examples of FMs include GPT-3 and DALL-E 2. These models
enable everyday users and non-developers to carry out remarkable tasks by
providing simple "prompts."
Once trained, FMs can manage a plethora of data types and tasks, making
them highly adaptable. They can automate complex workflows when paired
with an appropriate operational chain. Beyond generating content such as
text, images, audio, and video, FMs can also be employed for prediction and
classification tasks, a concept known as discriminative modeling.
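To make the discriminative use concrete, here is a minimal sketch of
prompting a pre-trained model for a classification task. It assumes the
Hugging Face transformers library and its publicly available zero-shot
classification pipeline; the model name is an illustrative choice, not one
named in this article:

from transformers import pipeline

# a pre-trained NLI model repurposed as a zero-shot classifier
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The company's quarterly revenue grew 20% year over year.",
    candidate_labels=["finance", "sports", "cooking"])
print(result["labels"][0])  # expected: "finance"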
FMs have initiated a transformative era in knowledge work, fostering
improvements in language-related tasks, reasoning, search, and computer
vision, especially when combined with other machine learning approaches
like Reinforcement Learning from Human Feedback (RLHF). The development
of multi-modal FMs, such as CLIP and ViLBERT, is underway, and these models
could significantly enhance robotics. By using approaches such as few-shot
learning and domain specialization, FMs can address a multitude of
challenges.
Despite their remarkable potential, FMs do have their limitations. These
include fabricating answers (known as hallucination), issues with temporal
shifting or continual adaptation, the requirement of large datasets and
considerable computing resources for training, and the need for human
evaluation for fine-tuning. Other hurdles include the tension between
specialization and diversity in FM training data, uncertainty about optimal
adaptation methods, and limited understanding of the model's inner
workings: what a model can do, why it produces certain behaviors, and how
it accomplishes this.
What are LLMs?
Large Language Models (LLMs) represent a subclass of Foundation Models
designed to comprehend and generate text. By ingesting immense amounts
of textual data and boasting billions of parameters, these models can
perform a variety of tasks based on a user-provided prompt. Functions we
encounter daily, such as autocomplete in search engines or Smart Compose
in email platforms, are real-world applications of these language models.
They excel at predicting the most likely subsequent word or phrase based on
the provided text.
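Under the hood, this next-word prediction can be demonstrated in a few
lines. The following is a minimal sketch, assuming the Hugging Face
transformers library and the small, publicly available GPT-2 checkpoint
(not one of the larger models discussed in this article):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, seq_len, vocab_size)
next_token_id = logits[0, -1].argmax()   # most likely next token
print(tokenizer.decode(next_token_id))   # e.g., " Paris"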
To understand LLMs, we should recognize that they are built upon advanced
neural networks known as transformers. These transformers extract
patterns from vast amounts of text data, ranging from strings to numbers to
code. It is worth noting that the creation and utilization of LLMs involve
substantial hurdles: massive datasets, considerable computational power,
and specialized skills are prerequisites for training, fine-tuning, and
extracting insights from these models. This explains why only a few models,
like BERT and GPT-3, are readily accessible and why many models remain
closed-source.
LLMs should not be conflated with Artificial General Intelligence (AGI) despite
their potential. Current LLMs don't inherently comprehend concepts and
abstractions. They have limitations but are still the most effective tools in
Natural Language Processing (NLP) for a broad range of tasks. Their potential
applications span from coding to assisting with task completion to
composing coherent and engaging narratives.
The output of an LLM typically undergoes rigorous filtering before a response
is generated. This process is integral to ensuring the generated content is
factual and safe, which is crucial considering the risks associated with these
models. As businesses aim to leverage LLMs for specific products or systems,
a high degree of oversight and governance becomes essential.
The historical journey of LLMs can be traced back to the early 2010s. Initially,
transfer learning with annotated or labeled data was a prevalent practice.
However, this approach was costly, challenging to implement consistently,
and posed privacy concerns. The advent of transformer-based models
around 2017 marked a significant stride, with models like GPT-3 and ChatGPT
making their debut and gaining swift popularity.
In the post-ChatGPT world, LLMs continually show promise for various
applications. They can explain complex subjects in simple terms, translate
languages, generate creative content, brainstorm, assist with online tasks,
provide customer service, and even perform technical functions. However,
they are not without flaws. The reliability of answers and the risk of
fabricated responses, often referred to as "model hallucination," highlight the
need for further refinement of these models.
Thus, while LLMs have transformative potential and are reshaping the
complex AI landscape, they require careful management and continual
refinement to ensure they show their full potential without compromising on
safety and accuracy.
[Figure: LLM stack, simple view. The layers of a large language model:
training and compute; raw model output; filters (fact checking, safety,
policy, etc.); fine-tuning with labeled data plus reinforcement learning
with human feedback and other approaches; and inference based on prompts.]
What are domain-specific LLMs?
A domain-specific language model constitutes a specialized subset of large
language models dedicated to producing highly accurate results within a
particular domain. Unlike a generalized LLM, which aims to process a diverse
range of topics with satisfactory proficiency, a domain-specific language
model focuses on optimizing its performance within a predefined sector,
which could range from technical domains such as law or medicine to more
casual areas like culinary arts.
The methodology of training a domain-specific language model hinges on
acquiring a substantial volume of domain-specific data. This data serves as
the training set, ensuring the model is immersed in the contextual intricacies
and specific knowledge pertinent to the chosen domain.
There are two primary strategies to undertake domain-specific pre-training.
The first strategy involves initializing a pre-trained LLM and fine-tuning it with
the domain-specific data to adapt its capabilities to the new context. This
method typically allows faster convergence and often superior performance.
The alternative strategy is to construct a new LLM from scratch using the
domain-specific data. The optimal strategy may vary depending on the
application and the domain in question.
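As a rough illustration of the first strategy, the sketch below continues
pre-training an existing causal language model on domain text. It assumes
the Hugging Face transformers and datasets libraries; the GPT-2 checkpoint,
the domain_corpus.txt path, and all hyperparameters are illustrative
placeholders, not a prescribed recipe:

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# a plain-text file of domain-specific documents (hypothetical path)
data = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-lm",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    # mlm=False gives the standard causal (next-token) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()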
However, this specialization entails a fundamental trade-off. A domain-
specific language model, such as a model trained exclusively on culinary
recipes, can comprehend the subtleties of its specialized domain more
proficiently than a generalized model. Nevertheless, its performance
deteriorates significantly when faced with tasks outside its trained domain.
On the other hand, a generic LLM possesses broader knowledge, albeit
without the fine-grained expertise of a domain-specific language model.
This difference is similar to what we see in real-life expertise. It is exceedingly
rare to encounter a professional chef who can also provide proficient stock
market analysis or a mathematician who doubles as an accomplished artist.
Correspondingly, developing an LLM that can seamlessly transition between
domains while maintaining a high degree of proficiency is a challenge. The
trade-off between breadth and depth of knowledge is intrinsic to the design
of language models, necessitating judicious consideration in the
development process.
Here is a table comparing general LLMs and domain-specific LLMs:

| Criteria | General LLMs (e.g., GPT-3) | Domain-specific LLMs (e.g., FoodUDT-1B) |
|---|---|---|
| Purpose | Developed to understand and generate text across a broad range of topics and contexts. | Specifically designed to understand and generate text in a particular niche or domain. |
| Training data | Trained on diverse internet text, encompassing a wide range of topics. | Trained on domain-specific data (in this case, 500K recipes). |
| Knowledge depth | Has a general understanding of a wide range of topics, including domain-specific topics at a high level. | Has a deep understanding of a specific domain. |
| Representation of context | Represents text and context based on a broad understanding of language and world knowledge. For example, "apple" is closer to "iPhone" than to "apple pie." | Represents text and context based on deep, specialized knowledge. For example, "apple" is closer to "apple pie" than to "iPhone." |
| Word associations | Forms associations based on generic context. For example, "meat," "sugar," and "dessert" are equally related. | Forms associations based on domain-specific context. For example, "dessert" and "sugar" are more related than "meat" and "sugar." |
| Advantages | Versatile and can handle a wide range of tasks and queries. | Highly accurate and efficient in its specific domain due to targeted training. |
| Limitations | May not possess in-depth understanding or generate accurate outputs in specialized domains. | Limited to its specific domain and may not generate accurate outputs outside that domain. |
| Use cases | General conversational AI, summarization, translation, and other generic tasks. | Highly specialized tasks within the domain (in this case, recipe search and suggestions). |
It is important to remember that while domain-specific LLMs can provide
better performance within their domain, general LLMs are more versatile
and can handle a wider variety of tasks. The choice between the two depends
on the specific needs and objectives of the application.
What are the benefits of using domain-specific LLMs?
In the rapidly evolving field of artificial intelligence, domain-specific language
models have emerged as a critical tool for enhancing task proficiency and
model efficiency. Tailored to excel in specific fields, these models leverage
the extensive knowledge base of large language models and fine-tune it to
provide superior generalization capabilities and augmented interpretability
in specialized domains.
Here are some benefits:
Enhanced task proficiency
Large language models, ingrained with extensive textual data, demonstrate a
comprehensive understanding of language and an abundance of general
knowledge, serving as an excellent springboard for domain-specific fine-
tuning. While an LLM is suitable for generalized linguistic tasks, it may not
perform optimally in specialized domains such as healthcare. Leveraging an
LLM as a foundational model and fine-tuning it for specific domains can
considerably improve task-related performance.
Amplified efficiency
Domain-specific models are highly tailored to execute a particular set of
tasks and concentrate solely on the information relevant to those tasks. This
specificity reduces computational overhead and processing requirements,
thereby enhancing model efficiency and expediting task completion.
Superior generalization capabilities
Training models to comprehend the particular language and domain-specific
knowledge allows them to generalize more proficiently to novel instances
within the domain. This becomes a crucial asset when dealing with limited
datasets or when the domain incorporates its own unique terminology and
concepts. This ability is especially advantageous in rapidly evolving fields
such as precision medicine in healthcare.
Augmented interpretability
The interpretability of domain-specific models is notably enhanced due to
their task-specific focus and domain-specific training data. These models
yield more relevant outcomes, regardless of the task, be it chat, text search,
prediction, or information extraction, thus making them more
comprehensible and meaningful.
Challenges of building domain-specific custom LLMs
Creating custom large language models introduces various challenges for
organizations, broadly segmented into data-, technical-, ethical-, and
resource-related issues.
Data challenges
Organizations striving to establish custom LLMs grapple with data-associated
challenges, including data acquisition, quality, and privacy. The procurement
of substantial, niche-specific data can be demanding, especially when
dealing with specialized or confidential data. Ensuring data integrity during
collection is paramount, and so is addressing data privacy and protection,
which requires deploying effective measures to anonymize data and
safeguard it throughout training and deployment phases.
Technical challenges
Technical hurdles in the development of custom LLMs revolve around
aspects such as model architecture, training, assessment, and validation.
Selection of suitable architecture and parameters necessitates specialized
knowledge, while training custom LLMs requires advanced competencies in
machine learning. Evaluation becomes complex owing to the lack of
established benchmarks for niche-specific tasks, and accuracy, safety, and
compliance validation of model responses present additional challenges.
Ethical challenges
It is essential to confront ethical issues related to bias, fairness, content
moderation, and safety while creating custom LLMs. LLMs may inadvertently
incorporate and perpetuate biases from training data, making meticulous
auditing and mitigation strategies a necessity. Robust content moderation
mechanisms must be in place to prevent custom LLMs from generating
potentially inappropriate or harmful content.
Resource challenges
The development of custom LLMs poses resource-related issues,
predominantly in terms of computational resources and specialized skills.
Training LLMs necessitates substantial computational resources, which can
be expensive and inaccessible for some organizations. It also requires a team
proficient in machine learning, natural language processing, and software
engineering, which can be challenging to recruit and retain, thereby
increasing the complexity and cost of the project.
Despite the inherent challenges, they are not unconquerable. Organizations
can successfully develop and deploy custom LLMs with strategic planning,
apt resources, and expert guidance to meet their unique requirements. As
the market begins to witness the advent of commercially viable open-source
foundation models, the trend of building domain-specific LLMs using these
models is set to intensify.
In the following sections, we describe the well-known BloombergGPT model.
By providing an overview of BloombergGPT, we help you understand the
process, techniques, and strategies to be employed while developing your
own domain-specific LLM using a foundation model.
A detailed case study of BloombergGPT, a finance-specific LLM
Bloomberg has unveiled BloombergGPT, a state-of-the-art large language
model, primed with extensive financial data for superior performance in
financial sector-specific NLP tasks. This powerful AI tool expedites the
evaluation of financial data, assists in risk assessments, measures financial
sentiment, and could potentially automate accounting and auditing
functions.
BloombergGPT is a highly specialized language model developed for the
financial domain. It's a sophisticated construct, equipped with 50 billion
parameters. The model's training utilized a considerable volume of data:
363 billion tokens sourced from Bloomberg's unique databases and an
additional 345 billion tokens derived from generalized datasets. This
combination of financial and general-purpose data has resulted in a model
that performs exceptionally well in its specialized domain. BloombergGPT
has demonstrated substantial improvements over existing models when
tackling financial tasks. Notably, despite its specialization, the model sustains
commendable performance levels on standard large language model
benchmarks, asserting its versatile applicability.
According to Bloomberg, the financial industry's intricate nature necessitates
an AI specifically trained on financial data. In this context, BloombergGPT has
been designed to interact with the Bloomberg Terminal, a software platform
that provides real-time market data, breaking news, in-depth financial
research, and sophisticated analytics to financial professionals.
BloombergGPT is a pioneering stride towards leveraging this novel
technology for the financial sector. It is projected to enhance Bloomberg's
existing NLP functionalities, including sentiment analysis, named entity
recognition, news classification, and question-answering. Moreover, the
model's capability to structure the vast data accessible on the Bloomberg
Terminal can significantly improve client services and unlock AI's immense
potential within the financial industry.
Bloomberg's research emphasizes that while general models can cover a wide
range of tasks and reduce the need for specialized training, results from
existing domain-specific models show that general models cannot fully
replace them. Despite the financial domain being Bloomberg's primary focus,
the company supports diverse tasks that benefit from a general model.
Hence, Bloomberg aimed to construct a model like BloombergGPT, which
performs commendably on general-purpose LLM benchmarks while also
excelling in financial tasks.
BloombergGPT stands as a potent Large Language Model (LLM), specifically
optimized for financial Natural Language Processing (NLP) tasks. This
optimization is achieved by integrating both domain-specific and general-
purpose data during its training phase.
Bloomberg Query Language (BQL) functions as a sophisticated query
language, enabling access and analysis of financial data within Bloomberg's
platform. BQL, although a powerful tool, presents complexities and can be
applied to diverse tasks including data search, data analysis, report
generation, and insight derivation. BloombergGPT streamlines this process
by converting natural language queries into valid BQL, thereby facilitating
more intuitive interactions with financial data.
Additionally, BloombergGPT holds potential applications in news settings,
serving as a tool for suggesting news headlines. This functionality proves
valuable for news applications and assists journalists in curating newsletters.
The model accepts paragraphs as inputs and proposes an appropriate title in
return. This capability enhances the ease and speed of news article creation,
contributing further to BloombergGPT's utility in the areas of finance and
news.
BloombergGPT has demonstrated competitive performance compared to
GPT-3 and other LLMs on general tasks, outpacing them in several finance-
specific tasks. The introduction of BloombergGPT represents a significant
breakthrough in AI-driven financial analysis, amplifying the application scope
of NLP within the financial technology landscape.
What tasks can BloombergGPT perform?
Financial tasks in natural language processing
When BloombergGPT approaches financial tasks within the NLP discipline, it
handles them similarly to general NLP tasks. However, these financial tasks
carry unique features and challenges that differentiate them from their
general counterparts. A prime example of this dynamic can be seen in
sentiment analysis.
Sentiment analysis is a common NLP task which aims to identify the tone or
emotional context of a given text. Take, for instance, a headline such as
"COMPANY to cut 10,000 jobs". In general, this would typically suggest a
negative sentiment, as it implies job loss and potential hardship for many
individuals. However, when seen through the lens of financial context, the
sentiment can be perceived differently.
In the world of finance, job cuts often indicate a move towards operational
efficiency by a company. Such a move can increase the company's stock price
or bolster investor confidence in the company. This clearly demonstrates the
multifaceted nature of sentiment analysis when applied within the financial
domain, as understood and implemented by BloombergGPT.
External financial benchmarks
When it comes to external financial benchmarks, BloombergGPT utilizes a
variety of resources. These include four tasks drawn from the FLUE
benchmark as well as the ConvFinQA dataset. The performance of large
language models on these financial tasks is relatively unexplored territory,
making the assessment complex.
BloombergGPT is evaluated on these tasks using a few-shot learning
approach. This technique involves presenting the model with a small number
of worked examples, or 'shots', in the prompt, with the number of shots
chosen to maximize the average performance across all compared models.
This approach demonstrates how BloombergGPT addresses the challenge of
evaluating performance on tasks where there isn't a broad consensus or
standard testing framework.
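To illustrate, a few-shot prompt for financial sentiment might look like the
sketch below. The format is a hypothetical illustration, not BloombergGPT's
actual evaluation template (the model itself is not publicly available):

prompt = """Decide whether each headline is Positive, Negative, or Neutral
for the company's stock.

Headline: COMPANY beats quarterly earnings estimates
Sentiment: Positive

Headline: COMPANY recalls flagship product over safety concerns
Sentiment: Negative

Headline: COMPANY to cut 10,000 jobs
Sentiment:"""

# the text the model generates after the final "Sentiment:" is taken
# as its predicted label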
Internal task: Sentiment analysis
When dealing with internal tasks, BloombergGPT focuses heavily on aspect-
specific sentiment analysis, a practice common in financial literature. The
process involves a methodical approach, structured around a distinct
discovery phase.
The discovery phase is instrumental in setting up the groundwork for
sentiment analysis. It establishes the annotation and sampling procedures,
which are essential components of the process. Additionally, this phase helps
to gauge the required number of annotators for each example.
The training requirements for the annotators are also determined during this
discovery phase. BloombergGPT relies on this structured procedure to
maintain a consistent and effective approach to sentiment analysis, ensuring
accurate and relevant results. This method underscores BloombergGPT's
commitment to performing complex tasks with diligence and precision.
Exploratory task: Named Entity Recognition (NER)
Named Entity Recognition (NER) is a well-established task in the broader
scope of natural language processing. However, this area has not been
deeply investigated when it comes to generative language models.
Recognizing the relevance of NER in the financial sector, BloombergGPT
embarked on an exploratory journey into this task, aiming to uncover new
insights and approaches.
The team behind BloombergGPT reported preliminary results from their
exploration into NER. As part of this endeavor, they utilized seven distinct
NER datasets originating from various domains within Bloomberg's internal
resources.
By venturing into this relatively uncharted territory within generative LLMs,
BloombergGPT is enhancing its capabilities and contributing valuable
research and findings to the broader field of natural language processing.
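For a sense of what NER looks like when framed as a generative task,
consider the following hypothetical prompt. The format is an assumption for
illustration, not Bloomberg's internal setup:

prompt = """Extract the named entities (PER, ORG, LOC) from the sentence.

Sentence: Jane Smith joined Acme Corp in New York.
Entities: Jane Smith (PER); Acme Corp (ORG); New York (LOC)

Sentence: Bloomberg hired analysts in London last quarter.
Entities:"""

# a generative model completes the line with the entity list, e.g.,
# "Bloomberg (ORG); London (LOC)"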
Evaluation on standard general-purpose NLP tasks
The performance of BloombergGPT was critically examined on a variety of
standard general-purpose NLP tasks. This evaluation process utilized BIG-
bench Hard, representing a subset of the most demanding tasks within the
broader BIG-bench benchmark.
Although the main focus of BloombergGPT is distinctly finance-oriented
tasks, the inclusion of general-purpose training data in its learning process
was considered essential. This approach is based on the possibility that such
comprehensive data might augment BloombergGPT's efficacy. This could not
only enhance its capabilities in dealing with financial tasks but could
potentially improve its performance on conventional NLP tasks as well.
Therefore, even as BloombergGPT is fundamentally finance-focused, it
appreciates the value of well-rounded NLP competence, actively seeking to
excel in not just specialized but also universal NLP tasks.
Knowledge assessments
BloombergGPT's knowledge capacity, specifically its capability to recall
information learned during its training phase, was put under the microscope
by the research team. This evaluation was conducted through various
scenarios where the model was required to provide answers to posed
questions without the aid of additional context or resources.
These scenarios comprised multiple-choice questions, along with
benchmarks based on reading comprehension. Thus, the assessment was
designed to evaluate not just the factual knowledge retained by
BloombergGPT during its training, but also its ability to interpret, understand,
and utilize that knowledge effectively. It provided valuable insight into
BloombergGPT's understanding and recall capabilities, which are critical
aspects for any language model, especially in the financial domain, where
accuracy and reliability of information are paramount.
Linguistic tasks
In the evaluation of BloombergGPT, certain tasks were not directly linked to
end-user applications. These were essentially linguistic tasks, put in place to
measure the model's proficiency in understanding language at its core.
Such tasks scrutinized BloombergGPT's capability in tackling disambiguation,
parsing grammar, and understanding entailment. These areas are
fundamental to language understanding and were chosen to directly assess
BloombergGPT's linguistic competence. By examining these abilities, the
team aimed to ensure that the model could adeptly navigate language
complexities, a skill that proves invaluable even in its primary focus
area of financial language processing.
The architecture of a domain-specific LLM like BloombergGPT
The creation of BloombergGPT was a joint endeavor between the Machine
Learning (ML) Product and Research group and the AI Engineering team at
Bloomberg. Their objective was to compile one of the largest domain-specific
datasets to date. This involved leveraging Bloomberg's vast data creation,
collection, and curation resources. The team utilized Bloomberg's expansive
archive of financial data, assembling a comprehensive dataset of 363 billion
tokens composed of English financial documents. This corpus was further
augmented with a public dataset of 345 billion tokens, resulting in a
combined training corpus surpassing 700 billion tokens.
To operationalize this vast dataset, the team elected to train a 50-billion
parameter decoder-only causal language model. This type of model is adept
at generating text in a sequential manner, making it ideal for processing
natural language data. The resulting model was then tested on several
validation sets: finance-specific natural language processing benchmarks,
Bloomberg's internal benchmarks, and a variety of generic NLP tasks drawn
from widely recognized benchmarks.
BloombergGPT demonstrated exceptional performance, surpassing existing
open-source models of comparable size on finance-specific tasks by
substantial margins. Importantly, this specialization did not impede its
ability to tackle broader tasks: the model also matched or exceeded the
performance of comparable models on general NLP benchmarks. This
underscores the potential of BloombergGPT to enhance financial NLP tasks
without sacrificing its utility for more general applications.
BloombergGPT’s architecture is based on a language model called BLOOM,
which operates as a causal or “decoder-based” model. “Causal” essentially
means that the model generates text sequentially, predicting the next word
in a sequence based on the words that came before it.
The core of the BLOOM model consists of 70 layers of transformer decoder
blocks. Here's a simplified explanation of what this means:
Transformer decoder blocks: These are a type of model structure used in
natural language processing. They are designed to handle sequential data,
like sentences, where the order of the elements is important. The
transformer decoder block uses the following components:
SA (Self-attention): This mechanism helps the model pay varying
amounts of 'attention' to different words in the input when generating
the output. It allows the model to focus on the important words and
understand the context better.
LN (Layer normalization): This is a technique to stabilize and speed up
the learning process of the model. It ensures that the calculations the
model performs stay within a manageable range, preventing numerical
issues.
FFN (Feed-forward network): This is a type of artificial neural network
where information moves in one direction, from input to output. It
does not have any cycles or loops, and each layer of neurons is fully
connected to the next layer.
The model also ties the input token embeddings (the numerical
representation of words) to the linear mapping that occurs before the final
output is decided. This step helps reduce the model's complexity and can
improve the efficiency of the model's learning process.
Finally, there is an extra layer normalization step applied right after the
token embeddings. Together with the layer normalization at the start of the
first transformer block, this yields two consecutive normalization steps,
which improves the model's stability and assists in efficient learning.
In summary, the architecture of the BloombergGPT model is designed to
efficiently learn from and generate text data, with several features intended
to enhance the model's stability and learning speed.
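To make the block structure concrete, here is a minimal PyTorch sketch of
one pre-norm transformer decoder block combining the components above
(self-attention, layer normalization, and a feed-forward network with a
single hidden layer and GELU). The small default dimensions are illustrative
and this is a teaching sketch, not Bloomberg's code; BloombergGPT stacks 70
such blocks with a hidden size of 7,680 and 40 attention heads:

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(          # single hidden layer with GELU
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x, causal_mask):
        # the causal mask stops each position attending to future tokens
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask)
        x = x + attn_out                   # residual connection
        x = x + self.ffn(self.ln2(x))      # residual connection
        return x

# usage on a dummy batch of 16 tokens
x = torch.randn(1, 16, 768)                # (batch, seq_len, d_model)
mask = torch.triu(torch.ones(16, 16, dtype=torch.bool), diagonal=1)
out = DecoderBlock()(x, mask)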
How was BloombergGPT trained?
Step 1: Constructing the FinPile dataset
The research team responsible for the creation of BloombergGPT made use
of an extensive and diverse dataset titled "FinPile" for its training process.
FinPile comprises various English financial documents from multiple sources,
namely news articles, filings, press releases, and social media content, all
from the extensive Bloomberg archives.
This wealth of financial data includes elements like company filings, financial
news, and other information relevant to market trends. Some of this data is
publicly accessible, while other parts can only be accessed through purchase
or exclusive access via the Bloomberg Terminal. This data is meticulously
cleaned to remove unnecessary elements such as markup, special
formatting, and templates. Moreover, each document is time-stamped,
spanning from March 1, 2007, to July 31, 2022, with data quality and volume
gradually improving over that period.
While the team can't release the FinPile dataset, they aim to share their
insights and experiences in building this highly domain-specific model. They
also intend to leverage date information in future research work.
FinPile's financial data is paired with public data commonly used for LLM
training to create a balanced training corpus with equal representation
of domain-specific and general-purpose text. The team undertook a de-
duplication process for each dataset in order to maintain high data quality,
which could result in differing statistical reports compared to other studies.
Step 2: Tokenization
The research team behind BloombergGPT decided to employ the Unigram
tokenizer in preference to other greedy merge-based sub-word tokenizers
like BPE or WordPiece. This decision was based on the Unigram tokenizer's
promising performance in prior research studies. They adopted a byte-based
approach to data handling instead of the more common practice of treating
data as a sequence of Unicode characters.
Additionally, the team employed a pre-tokenization process similar to that
used in GPT-2, which breaks the input into distinct chunks. This method
facilitates the learning of multi-word tokens, thereby increasing information
density and reducing context lengths.
The team used the Pile dataset to train their tokenizer. Given its diverse
content, The Pile was considered a suitable choice for their purposes.
Handling such a large dataset required a parallel tokenizer training method,
and the team adopted a split-and-merge strategy for this. They trained a
Unigram tokenizer on each chunk of data and then merged the tokenizers
hierarchically to create the final tokenizer.
The resulting tokenizer consisted of 7 million tokens, which was further
condensed to 131,072 tokens. The choice of vocabulary size was influenced
by several factors, including the need to fit more information into the context
window and considerations related to overhead. The team arrived at a
vocabulary size of 131,072 tokens after experimenting with different
vocabulary sizes and selecting the one that yielded the smallest encoded
representation of the C4 dataset. Consequently, the tokenizer used for
BloombergGPT is relatively large compared to the standard vocabulary size
of approximately 50,000 tokens.
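As a rough sketch of this kind of tokenizer training, the following uses the
Hugging Face tokenizers library to train a Unigram model with byte-level
pre-tokenization. The corpus path and special token are illustrative
placeholders, and this single-process version omits the parallel
split-and-merge strategy described above:

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.Unigram())
# byte-level pre-tokenization, mirroring the byte-based handling above
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()

trainer = trainers.UnigramTrainer(
    vocab_size=131072,                  # the article's final vocabulary size
    special_tokens=["<|endoftext|>"])   # hypothetical special token
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("unigram-tokenizer.json")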
Step 3: Building the model
The Bloomberg team builds the BloombergGPT model upon the structural
foundation provided by the BLOOM model. This involves utilizing a decoder-
only causal language model comprising 70 layers of transformer decoder
blocks. These blocks consist of multi-head self-attention, layer normalization,
and a feed-forward network with a single hidden layer. A GELU function is
employed as the non-linear element within the network.
Additionally, ALiBi positional encoding, or Attention with Linear Biases, is
implemented, and the input token embeddings are connected to the linear
mapping preceding the final softmax application. An additional layer
normalization is executed following the token embeddings.
Leveraging the scaling laws from Chinchilla, the Bloomberg team determines
the size of the BloombergGPT model. They set their total compute budget at
1.3 million GPU hours on 40GB A100 GPUs. The costs associated with
activation checkpointing are accounted for, and the Chinchilla equations are
used to ascertain the optimum balance between the number of parameters
and tokens.
Despite possessing a significant dataset of approximately 700 billion tokens,
the Bloomberg team found it too limited for the Chinchilla-optimal
configuration given their compute budget. To work within this constraint,
they opt for the largest feasible model, at 50 billion parameters, while
preserving around 30% of the total compute budget as a safeguard for
potential challenges that may emerge.
To set the structure of the BloombergGPT model, which has 50 billion
parameters, the Bloomberg team used recommendations from Levine et al.
(2020). According to these recommendations, the ideal size (D) for the hidden
layers depends on the model’s number of self-attention layers (L). The team
chose 70 self-attention layers (L = 70) and a target size of 7510 for the hidden
layers (D = 7510).
To enhance the performance of the model’s Tensor Core operations, the
team used a design with 40 heads, each having a dimension of 192. This
resulted in a total hidden size (D) of 7680, bringing the overall parameter
count to 50.6 billion.
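A quick back-of-envelope check shows how these numbers add up, using the
standard approximation of roughly 12·L·D² parameters for the transformer
blocks plus the embedding matrix. This is an illustrative estimate, not
Bloomberg's exact accounting:

L = 70            # transformer decoder layers
D = 7680          # hidden size (40 heads x 192 dimensions each)
V = 131_072       # tokenizer vocabulary size

block_params = 12 * L * D**2   # attention + feed-forward weights
embed_params = V * D           # tied input/output embedding matrix
total = block_params + embed_params
print(f"{total / 1e9:.1f}B")   # ~50.6B parameters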
Step 4: Training the model
The BloombergGPT model, built on PyTorch, is trained following a standard
causal language modeling objective, with a directionality from left to right.
The team has strategically set the length of the training sequences at 2,048
tokens to make optimal use of GPU resources.
The AdamW optimizer is utilized in the optimization process, and the
learning rate aligns with a cosine decay schedule that includes a linear
warmup. Additionally, a batch size warmup is applied for enhanced
performance.
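A minimal sketch of this optimizer-and-schedule setup in PyTorch, assuming
the transformers scheduler helper, follows. The learning rate, warmup, and
step counts are illustrative placeholders, and model and dataloader stand in
for a standard causal-LM training loop:

import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_800,        # linear warmup
    num_training_steps=139_000)    # cosine decay over the remaining steps

for batch in dataloader:           # inside the training loop
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()               # advance the learning-rate schedule
    optimizer.zero_grad()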
The training and evaluation stages of the model are conducted on Amazon
SageMaker, leveraging 64 p4d.24xlarge instances, each equipped with 8
NVIDIA 40GB A100 GPUs. To ensure efficient data access and high-speed
throughput, the team employs Amazon FSx for Lustre, which supports up to
1,000 MB/s for both read and write operations per TiB of storage.
Step 5: Large-scale optimization
To train the substantial BloombergGPT model within the constraints of GPU
memory, the Bloomberg team implements a series of optimization
techniques:
ZeRO optimization (stage 3): This approach involves dividing the training
state across a group of GPUs. In the case of BloombergGPT, the team
deploys 128 GPUs for the purpose of sharding and maintains four replicas
of the model throughout the training process.
MiCS: This system is designed to lower the communication overhead and
memory requirements for training clusters in the cloud. It utilizes
hierarchical communication, a 2-hop gradient update mechanism, and a
scale-aware model partitioning scheme.
Activation checkpointing: This technique discards activations after the
forward pass and recalculates the intermediate tensors during backward
passes as needed, minimizing the memory consumed during training.
Mixed precision training: This strategy helps save memory by executing
forward and backward passes in BFloat16 precision, while storing and
updating the parameters in full precision (FP32).
Fused kernels: This method amalgamates multiple operations into a
singular GPU operation. The result is a reduction in peak memory usage
due to the avoidance of storage of intermediate results and an
enhancement in speed.
By applying these optimizations, the Bloomberg team successfully manages
to yield an average computational performance of 102 TFLOPs, with each
training step taking around 32.5 seconds.
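Two of these techniques, activation checkpointing and BFloat16 mixed
precision, can be sketched in plain PyTorch as shown below. ZeRO-style
sharding and MiCS require a distributed training framework and are omitted;
model.blocks, inputs, targets, and loss_fn are assumed stand-ins:

import torch
from torch.utils.checkpoint import checkpoint

def forward_with_checkpointing(blocks, x):
    for block in blocks:
        # activations inside each block are discarded and recomputed
        # during the backward pass, trading compute for memory
        x = checkpoint(block, x, use_reentrant=False)
    return x

# forward and backward run in BFloat16 while parameters stay in FP32
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = forward_with_checkpointing(model.blocks, inputs)
    loss = loss_fn(out, targets)
loss.backward()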
How are ethical and privacy issues handled in BloombergGPT?
BloombergGPT, designed for financial applications, is developed with a
thorough focus on ethical and privacy considerations. These aspects are
meticulously addressed in the development process through a series of well-
structured measures.
Risk assessment and compliance procedures
Recognizing the sensitivity of the financial sector, Bloomberg has established
a robust risk assessment and testing process. Accuracy and factual
information are deemed crucial for the firm's reputation and the satisfaction
of its clients, who are increasingly seeking to incorporate advanced
technology into their operations.
This process involves stringent annotation guidelines, multi-level pre-launch
reviews involving central risk and compliance units, and continuous post-
launch monitoring. All research, development, and deployment procedures
for BloombergGPT align with relevant regulations, maintaining high
standards for compliance.
Toxicity and bias control
Bloomberg exerts extraordinary effort to control any potential toxicity and
bias in its content, produced either by humans or AI models. The team
behind BloombergGPT maintains an ongoing interest in evaluating and
quantifying the potential generation of harmful language in various
application contexts. Particular attention is given to studying the influence of
the FinPile corpus, which is characterized by a lower incidence of biased or
toxic language, on the model's tendencies. As BloombergGPT is integrated
into new products, these rigorous testing procedures and compliance
controls will be applied to ensure safe usage.
Model release and openness
The question of how to release LLMs is a subject of ongoing debate in the AI
community. Given the potential for misuse of LLMs, especially ones like
BloombergGPT that are trained on a wealth of sensitive data, Bloomberg
employs a strategic approach to this issue. Various strategies were
considered, ranging from releasing trained models under specific usage
licenses to granting API access without revealing underlying model
parameters or detailed training data.
In light of the significant risk of data leakage attacks, Bloomberg has opted
for a conservative approach, withholding the release of BloombergGPT
model parameters. This decision is made in alignment with the company's
mission to provide access to data collected over decades, while also ensuring
its protection from potential misuse.
Contribution to the field
Despite not releasing the model, BloombergGPT's development journey
provides valuable insights to the broader NLP community, especially those
creating domain-specific models. The experience gained in training and
evaluating BloombergGPT can help better understand these models,
contributing to the shared knowledge in the field of NLP and LLMs.
Bloomberg acknowledges the influence of various other models and projects
in shaping the development of BloombergGPT, and continues its efforts in
contributing to this collective intelligence.
How to train an open-source foundation model into a domain-specific LLM?
The objective here is to develop a classification model leveraging the robust
capabilities of BERT, a pre-trained large language model. To customize this
model to fit a specific domain, it will be fine-tuned using domain-specific
data, specifically the Toxic Comment Classification Challenge data, sourced
from the realm of social media.
The dataset needed for this project is available on Kaggle. Only the
'train.csv' file among the available datasets will be utilized for this
particular case. The complete code is available on GitHub.
Let’s code!
Step 1: Loading the data set
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import transformers
from transformers import AutoModel, BertTokenizerFast

# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# --------------------------------------------------------------------
# Load dataset
df = pd.read_csv("train.csv")

# keep only the comment text and the binary toxicity label, sampling
# 6,400 rows, and rename the columns to "text" and "label"
df = df.sample(6400)[['comment_text', 'toxic']]
df.columns = ['text', 'label']

# print data and take a look at it
df.head()

# check the class distribution
print(df['label'].value_counts(normalize=True))

# Split data into train and test
train_text, temp_text, train_labels, temp_labels = train_test_split(
    df['text'],
    df['label'],
    random_state=1234,
    test_size=0.3,
    stratify=df['label'])

# Split temp_text and temp_labels from the previous step into
# validation and test sets
val_text, test_text, val_labels, test_labels = train_test_split(
    temp_text,
    temp_labels,
    random_state=1234,
    test_size=0.5,
    stratify=temp_labels)

# This yields 70% training data, 15% validation, and 15% test
Step 2: Tokenization
# Load the pre-trained BERT model and its tokenizer
bert = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Get the length of all the sentences in the training data
seq_len = [len(i.split()) for i in train_text]
pd.Series(seq_len).hist(bins=100)

# Select the max_seq_len to keep based on the length distribution above
max_seq_len = 100

# Tokenize and encode the sequences, padded and truncated at the max
# length determined above
tokens_train = tokenizer.batch_encode_plus(train_text.tolist(),
                                           max_length=max_seq_len,
                                           padding='max_length',
                                           truncation=True,
                                           return_token_type_ids=False)
tokens_val = tokenizer.batch_encode_plus(val_text.tolist(),
                                         max_length=max_seq_len,
                                         padding='max_length',
                                         truncation=True,
                                         return_token_type_ids=False)
tokens_test = tokenizer.batch_encode_plus(test_text.tolist(),
                                          max_length=max_seq_len,
                                          padding='max_length',
                                          truncation=True,
                                          return_token_type_ids=False)

# Convert tokens to tensors for the train, validation, and test sets
train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())
Step 3: Creating a data loader to prepare the data for training
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

# define a batch size (you may want to tweak it based on your setup)
batch_size = 32

# wrap tensors, create a random sampler, and prep the dataloader for
# the training data
train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data,
                              sampler=train_sampler,
                              batch_size=batch_size)

# wrap tensors, create a sequential sampler, and prep the dataloader
# for the validation data
val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data,
                            sampler=val_sampler,
                            batch_size=batch_size)

# freeze all BERT parameters by setting requires_grad = False, so only
# the new classification head is trained
for param in bert.parameters():
    param.requires_grad = False
Step 4: Defining the model architecture: adding 3 hidden layers and 1 binary
output layer to the network
class BERT_Arch(nn.Module):
    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert
        # dropout layer
        self.dropout = nn.Dropout(0.1)
        # relu activation function
        self.relu = nn.ReLU()
        # dense layer 1 (768 is fixed by BERT's hidden size; the output
        # dimension of 512 is a design choice)
        self.fc1 = nn.Linear(768, 512)
        # dense layer 2
        self.fc2 = nn.Linear(512, 128)
        # dense layer 3
        self.fc3 = nn.Linear(128, 32)
        # output layer (2 classes)
        self.fc4 = nn.Linear(32, 2)
        # log-softmax activation (pairs with NLLLoss during training)
        self.softmax = nn.LogSoftmax(dim=1)

    # define the forward pass
    def forward(self, sent_id, mask):
        # pass the inputs to the frozen BERT encoder; cls_hs is the
        # pooled representation of the [CLS] token
        _, cls_hs = self.bert(sent_id, attention_mask=mask, return_dict=False)
        # execute 1st layer
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        # execute 2nd layer
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        # execute 3rd layer
        x = self.fc3(x)
        x = self.relu(x)
        x = self.dropout(x)
        # output layer
        x = self.fc4(x)
        # apply log-softmax activation
        x = self.softmax(x)
        return x
Step 5: Passing the pre-trained BERT to the model and preparing the loss function
model = BERT_Arch(bert)
# push the model to the GPU
model = model.to(device)

# AdamW optimizer (the torch.optim version; the one formerly exported
# by transformers is deprecated)
from torch.optim import AdamW
# define the optimizer; a relatively high lr is fine here because only
# the new classification head is being trained
optimizer = AdamW(model.parameters(), lr=1e-3)

# compute balanced class weights and push them to the GPU
from sklearn.utils.class_weight import compute_class_weight
class_weights = compute_class_weight(class_weight="balanced",
                                     classes=np.unique(train_labels),
                                     y=train_labels)
print(class_weights)

# convert class weights to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)
weights = weights.to(device)

# NLLLoss pairs with the model's LogSoftmax output
cross_entropy = nn.NLLLoss(weight=weights)

# number of training epochs
epochs = 100
Step 6: Train the model
def train():
    # put the model in training mode
    model.train()
    # initialize the loss and accuracy to zero (updated every batch)
    total_loss, total_accuracy = 0, 0
    # empty list to save model predictions
    total_preds = []
    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # push the batch to the GPU
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        model.zero_grad()
        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip the gradients to 1.0 to help prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)
    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    # predictions are of shape (no. of batches, batch size, no. of classes);
    # reshape them to (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    # return the loss and predictions
    return avg_loss, total_preds
Step 7: Evaluate the model
def evaluate():
    print("\nEvaluating...")
    # put the model in evaluation mode (deactivates dropout layers)
    model.eval()
    # again, start the loss and accuracy at zero
    total_loss, total_accuracy = 0, 0
    # empty list to save the model predictions
    total_preds = []
    # iterate over batches
    for step, batch in enumerate(val_dataloader):
        # push the batch to the GPU
        batch = [t.to(device) for t in batch]
        sent_id, mask, labels = batch
        # deactivate autograd
        with torch.no_grad():
            # model predictions
            preds = model(sent_id, mask)
            # compute the validation loss between actual and predicted values
            loss = cross_entropy(preds, labels)
            total_loss = total_loss + loss.item()
            preds = preds.detach().cpu().numpy()
            total_preds.append(preds)
    # compute the validation loss of the epoch
    avg_loss = total_loss / len(val_dataloader)
    # reshape the predictions to (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    return avg_loss, total_preds
Step 8: Run the pipeline to iteratively train and evaluate the model, printing
the losses for every epoch
# set the initial best loss to infinity
best_valid_loss = float('inf')

# empty lists to store the training and validation loss of each epoch
train_losses = []
valid_losses = []

# for each epoch
for epoch in range(epochs):
    # train the model
    train_loss, _ = train()
    # evaluate the model
    valid_loss, _ = evaluate()
    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')
    # append the training and validation loss
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)
    print(f'Epoch {epoch + 1}/{epochs}\t'
          f'Training Loss: {train_loss:.3f}\t'
          f'Validation Loss: {valid_loss:.3f}')

# load the weights of the best model
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

# get predictions for the test data
with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

# report the model's performance on the held-out test set
preds = np.argmax(preds, axis=1)
print(classification_report(test_y, preds))
pd.crosstab(test_y, preds)
Endnote
As we conclude, it's clear that domain-specific large language models are not
just beneficial but indeed necessary for modern organizations. The shift
from generic models to more tailored language processing models addresses
unique industry needs, enhances performance, enables personalized
interactions, and streamlines operations. Moreover, these custom models
provide improved data privacy control and financial efficiency.
In the rapidly evolving world of AI and natural language processing, adopting
domain-specific LLMs represents a crucial step towards staying competitive
and innovative. These models, fine-tuned to understand specific industry
language and context, transform how organizations interact with data and
digital interfaces. The need for domain-specific LLMs is undeniable, and it is
now a question of how quickly organizations can adapt and integrate this
powerful tool into their processes to unlock its full potential.
Leverage the power of customized LLMs! Contact LeewayHertz to create your own
domain-specific LLM for highly accurate results and unparalleled performance
within your domain.
Author's Bio
Akash Takyar
CEO, LeewayHertz
Akash Takyar is the founder and CEO of LeewayHertz. The experience of
building over 100 platforms for startups and enterprises allows Akash to
rapidly architect and design solutions that are scalable and beautiful.
Akash's ability to build enterprise-grade technology solutions has attracted
over 30 Fortune 500 companies, including Siemens, 3M, P&G and Hershey's.
Akash is an early adopter of new technology, a passionate technology
enthusiast, and an investor in AI and IoT startups.
Train foundation model for domain-specific language model

  • 1. HOW TO TRAIN AN OPEN­ SOURCE FOUNDATION MODEL INTO A DOMAIN­SPECIFIC LLM? Talk to our Consultant   Listen to the article In the fast-paced world of corporate environments, technological innovations have always been a catalyst, driving business growth with remarkable velocity. One such technological advancement that has lately become a talking point and brought about a paradigm shift is generative AI. Today,  
  • 2. every industry is being profoundly in몭uenced by generative AI, which has heralded the dawn of a new era de몭ned by the widespread use of Large Language Models (LLMs). Corporate entities across diverse sectors are increasingly recognizing the immense potential of customized large language models to enhance their business operations in unprecedented ways. Businesses are progressively pivoting away from generic, universal models due to growing needs for enhanced control, greater data privacy, and cost- e몭cient methodologies. As organizations increasingly leverage the power of customization, they open the doors to a world where intelligent systems stand poised to revolutionize everything from innovation and e몭ciency to the very core of their operational processes. While general-purpose LLMs are trained on large and diverse datasets, domain-speci몭c LLMs are speci몭cally designed for a particular domain and trained on exclusive datasets that cater to an organization’s speci몭c requirements. General-purpose LLMs can be useful for many businesses but domain-speci몭c LLMs are more suitable for situations that require accurate and contextually appropriate outputs within a speci몭c domain. These models, which often outperform extensive models like GPT-3.5 that serve as the basis for ChatGPT, are highly targeted and potentially more e몭cient. This signi몭cant change is further propelled by the availability of numerous open-source foundation models, making it feasible for businesses to craft their personalized LLMs to meet their unique demands and achieve optimal results. Domain-speci몭c LLMs are witnessing widespread adoption as they can adapt to speci몭c requirements, utilize available resources more e몭ciently, and deliver powerful language processing capabilities that address speci몭c challenges and tasks. They are poised to bring about a new era of advanced language processing solutions with enhanced adaptability, resourcefulness, and potency. This article provides detailed insights into domain-speci몭c LLMs, covering
  • 3. aspects like their bene몭ts, challenges involved in their development and how to create an industry-speci몭c LLM, leveraging the potential of an open-source foundation model. All this will help you understand the intricacies of this fascinating progression in AI. What are foundation models? What are LLMs? What are domain-speci몭c LLMs? What are the bene몭ts of training on the domain-speci몭c LLMs? Challenges of building domain-speci몭c custom LLMs A detailed case study of BloombergGPT, a 몭nance-speci몭c LLM What tasks BloombergGPT can perform? The architecture of a domain-speci몭c LLM like BloombergGPT How was BloombergGPT trained? How ethical and privacy issues are handled in BloombergGPT? How to train an open-source foundation model into a domain-speci몭c LLM? What are foundation models? Foundation Models (FMs) Large Language models (LLMs) ex: ChatGPT, Chinchilla, GPT­3 Pretrained Generalized Adaptable Large Self­Supervised LeewayHertz The term “Foundation Models” was put forward by Stanford researchers to describe a new breed of machine learning models. Rather than being
  • 4. designed for a speci몭c task such as image recognition, these models are trained on extensive, diverse datasets using self-supervised learning at scale, allowing them to be 몭ne-tuned for various downstream tasks. Contrary to what the name might suggest, Foundation Models (FMs) are not the bedrock of AI, nor are they suggestive of AGI (Arti몭cial General Intelligence). These models are unique and remarkable in their own right, characterized by 몭ve main traits: 1. Pre-training: FMs are pre-trained with vast data and substantial computing power, making them ready for use without further training. 2. Generalization: Unlike traditional AI models, which are task-speci몭c, FMs are versatile, and designed to tackle numerous tasks. 3. Adaptability: FMs are adjustable through prompting, meaning they respond to user-de몭ned inputs like text. 4. Large-scale: FMs are large in both model size and data size. For instance, GPT-3 boasts 175 billion parameters and was trained on approximately 500,000 million words. 5. Self-supervision: FMs learn from data patterns without explicit labels. Prominent examples of FMs include GPT-3 and DALL-E-2. These models enable everyday users and non-developers to carry out remarkable tasks by providing simple “prompts.” Once trained, FMs can manage a plethora of data types and tasks, making them highly adaptable. They can automate complex work몭ows when paired with an appropriate operational chain. Beyond generating content such as text, images, audio, and video, FMs can also be employed for prediction and classi몭cation tasks, a concept known as discriminative modeling. FMs have initiated a transformative era in knowledge work, fostering improvements in language-related tasks, reasoning, search, and computer vision, especially when combined with other machine learning approaches like Reinforced Learning with Human Feedback. The development of multi-
  • 5. modal FMs, such as CLIP and ViLBERT, is underway, and these models could signi몭cantly enhance robotics. By using approaches such as few-shot learning and domain specialization, FMs can address a multitude of challenges. Despite their remarkable potential, FMs do have their limitations. These include fabricating answers (known as hallucination), issues with temporal shifting or continual adaptation, the requirement of large data sets and considerable computing resources for training, and the need for human evaluation for 몭ne-tuning. Other hurdles include the tension between specialization and diversity in FM training data, uncertainty about optimal adaptation methods, and limited understanding of the model’s inner workings: what a model can do, why it produces certain behaviors, and how it accomplishes this. Leverage LeewayHertz’s expertise in domain- speci몭c LLMs! We train your preferred foundation model on your proprietary data to create a domain-speci몭c LLM that aligns perfectly with your business needs. Learn More What are LLMs? Large Language Models (LLMs) represent a subclass of Foundation Models designed to comprehend and generate text. By ingesting immense amounts of textual data and boasting billions of parameters, these models can perform a variety of tasks based on a user-provided prompt. Functions we encounter daily, such as autocomplete in search engines or Smart Compose in email platforms, are real-world applications of these language models. They excel at predicting the most likely subsequent word or phrase based on
  • 6. the provided text. To understand LLMs, we should recognize that they are built upon advanced neural networks known as transformers. These transformers extract patterns from vast amounts of text data, ranging from strings to numbers to code. It is worth noting that the creation and utilization of LLMs involve substantial hurdles: massive datasets, considerable computational power, and specialized skills are prerequisites for training, 몭ne-tuning, and extracting insights from these models. This explains why only a few models, like BERT and GPT-3, are readily accessible and why many models remain closed-source. LLMs should not be con몭ated with Arti몭cial General Intelligence (AGI) despite their potential. Current LLMs don’t inherently comprehend concepts and abstractions. They have limitations but are still the most e몭ective tools in Natural Language Processing (NLP) for a broad range of tasks. Their potential applications span from coding to assisting with task completion to composing coherent and engaging narratives. The output of an LLM typically undergoes rigorous 몭ltering before a response is generated. This process is integral to ensuring the generated content is factual and safe, which is crucial considering the risks associated with these models. As businesses aim to leverage LLMs for speci몭c products or systems, a high degree of oversight and governance becomes essential. The historical journey of LLMs can be traced back to the early 2010s. Initially, transfer learning with annotated or labeled data was a prevalent practice. However, this approach was costly, challenging to implement consistently, and posed privacy concerns. The advent of transformer-based models around 2017 marked a signi몭cant stride, with models like GPT-3 and ChatGPT making their debut and gaining swift popularity. In the post-ChatGPT world, LLMs continually show promise for various applications. They can explain complex subjects in simple terms, translate languages, generate creative content, brainstorm, assist with online tasks,
  • 7. languages, generate creative content, brainstorm, assist with online tasks, provide customer service, and even perform technical functions. However, they are not without 몭aws. The reliability of answers and the risk of fabricated responses, often referred to as “model hallucination,” highlight the need for further re몭nement of these models. Thus, while LLMs have transformative potential and are reshaping the complex AI landscape, they require careful management and continual re몭nement to ensure they show their full potential without compromising on safety and accuracy. LLM Stack — Simple View Layers of Large Language Model Inference Based on Prompts Fine ­ Tuning (With Labeled Data) Filters (Fact Checking, Safety, Policy etc.) Raw Model Output Training and Compute Human Feedback with Reinforcement Learning + Other Approches LeewayHertz What are domain­specific LLMs? A domain-speci몭c language model constitutes a specialized subset of large language models dedicated to producing highly accurate results within a particular domain. Unlike a generalized LLM, which aims to process a diverse
  • 8. range of topics with satisfactory pro몭ciency, a domain-speci몭c language model focuses on optimizing its performance within a prede몭ned sector, which could range from technical domains such as law or medicine to more casual areas like culinary arts. The methodology of training a domain-speci몭c language model hinges on acquiring a substantial volume of domain-speci몭c data. This data serves as the training set, ensuring the model is immersed in the contextual intricacies and speci몭c knowledge pertinent to the chosen domain. There are two primary strategies to undertake domain-speci몭c pre-training. The 몭rst strategy involves initializing a pre-trained LLM and 몭ne-tuning it with the domain-speci몭c data to adapt its capabilities to the new context. This method typically allows faster convergence and often superior performance. The alternative strategy is to construct a new LLM from scratch using the domain-speci몭c data. The optimal strategy may vary depending on the application and the domain in question. However, this specialization entails a fundamental trade-o몭. A domain- speci몭c language model, such as a model trained exclusively on culinary recipes, can comprehend the subtleties of its specialized domain more pro몭ciently than a generalized model. Nevertheless, its performance deteriorates signi몭cantly when faced with tasks outside its trained domain. On the other hand, a generic LLM possesses broader knowledge, albeit without the 몭ne-grained expertise of a domain-speci몭c language model. This di몭erence is similar to what we see in real-life expertise. It is exceedingly rare to encounter a professional chef who can also provide pro몭cient stock market analysis or a mathematician who doubles as an accomplished artist. Correspondingly, developing an LLM that can seamlessly transition between domains while maintaining a high degree of pro몭ciency is a challenge. The trade-o몭 between breadth and depth of knowledge is intrinsic to the design of language models, necessitating judicious consideration in the development process.
  • 9. Here is a table comparing general LLMs and domain-speci몭c LLMs: Criteria General LLMs (e.g., GPT-3) Domain-speci몭c LLMs (e.g., FoodUDT-1B) Purpose Developed to understand and generate text across a broad range of topics and contexts. Speci몭cally designed to understand and generate text in a particular niche or domain. Training Data Trained on diverse internet text, encompassing a wide range of topics Trained on domain- speci몭c data (in this case, 500K recipes). Knowledge Depth Has a general understanding of a wide range of topics, including domain- speci몭c topics at a high level. Has a deep understanding of a speci몭c domain. Representation of Context Represents text and context based on a broad understanding of language and world knowledge. For example, “apple” is closer to “iPhone” than to “apple pie.” Represents text and context based on deep, specialized knowledge. For example, “apple” is closer to “apple pie” than to “iPhone.”
  • 10. Criteria General LLMs (e.g., GPT-3) Domain-speci몭c LLMs (e.g., FoodUDT-1B) Word Associations Forms associations based on generic context. For example, “meat,” “sugar,” and “dessert” are equally related. Forms associations based on domain- speci몭c context. For example, “dessert” and “sugar” are more related than “meat” and “sugar.” Advantages Versatile and can handle a wide range of tasks and queries. Highly accurate and e몭cient in its speci몭c domain due to targeted training. Limitations May not possess in- depth understanding or generate accurate outputs in specialized domains. Limited to its speci몭c domain and may not generate accurate outputs outside that domain. Use Cases General conversational AI, summarization, translation, and other generic tasks. Highly specialized tasks within the domain (in this case, recipe search and suggestions). Leverage LeewayHertz’s expertise in domain- speci몭c LLMs!
  • 11. We train your preferred foundation model on your proprietary data to create a domain-speci몭c LLM that aligns perfectly with your business needs. Learn More It is important to remember that while domain-speci몭c LLMs can provide better performance within their domain, general LLMs are more versatile and can handle a wider variety of tasks. The choice between the two depends on the speci몭c needs and objectives of the application. What are the benefits using domain­ specific LLMs? In the rapidly evolving 몭eld of arti몭cial intelligence, domain-speci몭c language models have emerged as a critical tool for enhancing task pro몭ciency and model e몭ciency. Tailored to excel in speci몭c 몭elds, these models leverage the extensive knowledge base of large language models and 몭ne-tune it to provide superior generalization capabilities and augmented interpretability in specialized domains. Here are some bene몭ts: Enhanced task pro몭ciency Large language models, ingrained with extensive textual data, demonstrate a comprehensive understanding of language and an abundance of general knowledge, serving as an excellent springboard for domain-speci몭c 몭ne- tuning. While an LLM is suitable for generalized linguistic tasks, it may not perform optimally in specialized domains such as healthcare. Leveraging an LLM as a foundational model and 몭ne-tuning it for speci몭c domains can considerably improve task-related performance. Ampli몭ed e몭ciency
  • 12. Domain-speci몭c models are highly tailored to execute a particular set of tasks and concentrate solely on the information relevant to those tasks. This speci몭city reduces computational overhead and processing requirements, thereby enhancing model e몭ciency and expediting task completion. Superior generalization capabilities Training models to comprehend the particular language and domain-speci몭c knowledge allows them to generalize more pro몭ciently to novel instances within the domain. This becomes a crucial asset when dealing with limited data sets or when the domain incorporates its own unique terminology and concepts. This ability is especially advantageous in rapidly evolving 몭elds such as precision medicine in healthcare. Augmented interpretability The interpretability of domain-speci몭c models is notably enhanced due to their task-speci몭c focus and domain-speci몭c training data. These models yield more relevant outcomes, regardless of the task, be it chat, text search, prediction, or information extraction, thus making them more comprehensible and meaningful. Challenges of building domain­specific custom LLMs Creating custom large language models introduces various challenges for organizations, broadly segmented into data-, technical-, ethical-, and resource-related issues. Data challenges Organizations striving to establish custom LLMs grapple with data-associated challenges, including data acquisition, quality, and privacy. The procurement of substantial, niche-speci몭c data could be demanding, especially when dealing with specialized or con몭dential data. Ensuring data integrity during collection is paramount, and so is addressing data privacy and protection, which requires deploying e몭ective measures to anonymize data and
  • 13. safeguard it throughout training and deployment phases. Technical challenges Technical hurdles in the development of custom LLMs revolve around aspects such as model architecture, training, assessment, and validation. Selection of suitable architecture and parameters necessitates specialized knowledge, while training custom LLMs requires advanced competencies in machine learning. Evaluation becomes complex owing to the lack of established benchmarks for niche-speci몭c tasks, and accuracy, safety, and compliance validation of model responses present additional challenges. Ethical challenges It is essential to confront ethical issues related to bias, fairness, content moderation, and safety while creating custom LLMs. LLMs may inadvertently incorporate and perpetuate biases from training data, making meticulous auditing and mitigation strategies a necessity. Robust content moderation mechanisms must be in place to prevent potentially inappropriate or harmful content generated by custom LLMs. Resource challenges The development of custom LLMs poses resource-related issues, predominantly in terms of computational resources and specialized skills. Training LLMs necessitate substantial computational resources, which could be expensive and inaccessible for some organizations. It also requires a team pro몭cient in machine learning, natural language processing, and software engineering, which could be challenging to recruit and retain, thereby increasing the complexity and cost of the project. Despite the inherent challenges, they are not unconquerable. Organizations can successfully develop and deploy custom LLMs with strategic planning, apt resources, and expert guidance to meet their unique requirements. As the market begins to witness the advent of commercially viable open-source foundation models, the trend of building domain-speci몭c LLMs using these models is set to intensify.
  • 14. models is set to intensify. In the following sections we will describe about the famous BloombergGPT model. By providing an overview of the BloombergGPT model, we help you understand the process, techniques, and strategies to be employed while developing your own domain-speci몭c LLM using a foundation model. Leverage LeewayHertz’s expertise in domain- speci몭c LLMs! We train your preferred foundation model on your proprietary data to create a domain-speci몭c LLM that aligns perfectly with your business needs. Learn More A detailed case study of BloombergGPT, a finance­specific LLM Bloomberg has unveiled BloombergGPT, a state-of-the-art large language model, primed with extensive 몭nancial data for superior performance in 몭nancial sector-speci몭c NLP tasks. This powerful AI tool expedites the evaluation of 몭nancial data, assists in risk assessments, measures 몭nancial sentiment, and could potentially automate accounting and auditing functions. BloombergGPT is a highly specialized language model developed for the 몭nancial domain. It’s a sophisticated construct, equipped with 50 billion parameters. The model’s training utilized a considerable volume of data – 363 billion tokens sourced from Bloomberg’s unique databases and an additional 345 billion tokens derived from generalized datasets. This combination of 몭nancial and general-purpose data has resulted in a model that performs exceptionally well in its specialized domain. BloombergGPT has demonstrated substantial improvements over existing models when
  • 15. tackling 몭nancial tasks. Notably, despite its specialization, the model sustains commendable performance levels on standard large language model benchmarks, asserting its versatile applicability. According to Bloomberg, the 몭nancial industry’s intricate nature necessitates an AI speci몭cally trained on 몭nancial data. In this context, BloombergGPT has been designed to interact with the Bloomberg Terminal, a software platform that provides real-time market data, breaking news, in-depth 몭nancial research, and sophisticated analytics to 몭nancial professionals. BloombergGPT is a pioneering stride towards leveraging this novel technology for the 몭nancial sector. It is projected to enhance Bloomberg’s existing NLP functionalities, including sentiment analysis, named entity recognition, news classi몭cation, and question-answering. Moreover, the model’s capability to structure the vast data accessible on the Bloomberg Terminal can signi몭cantly improve client services and unlock AI’s immense potential within the 몭nancial industry. Bloomberg’s research emphasizes that while general models eliminate the need for specialized training due to their wide-ranging capabilities, existing domain-speci몭c models’ results suggest that they are irreplaceable. Despite the 몭nancial domain being Bloomberg’s primary focus, the company supports diverse tasks that bene몭t from a general model. Hence, Bloomberg aimed to construct a model like BloombergGPT, which performs commendably on general-purpose LLM benchmarks, while also excelling in 몭nancial tasks. BloombergGPT stands as a potent Large Language Model (LLM), speci몭cally optimized for 몭nancial Natural Language Processing (NLP) tasks. This optimization is achieved by integrating both domain-speci몭c and general- purpose data during its training phase. Bloomberg Query Language (BQL) functions as a sophisticated query language, enabling access and analysis of 몭nancial data within Bloomberg’s
  • 16. platform. BQL, although a powerful tool, presents complexities and can be applied to diverse tasks including data search, data analysis, report generation, and insight derivation. BloombergGPT streamlines this process by converting natural language queries into valid BQL, thereby facilitating more intuitive interactions with 몭nancial data. Additionally, BloombergGPT holds potential applications for news settings, serving as a tool for suggesting news headlines. This functionality proves valuable for news applications and assists journalists in curating newsletters. The model accepts paragraphs as inputs and proposes an appropriate title in return. This capability enhances the ease and speed of news article creation, contributing further to BloombergGPT’s utility in the areas of 몭nance and news. BloombergGPT has demonstrated a competitive performance compared to GPT-3 and other LLMs on general tasks, outpacing them in several 몭nance- speci몭c tasks. This introduction of BloombergGPT represents a signi몭cant breakthrough in AI-driven 몭nancial analysis, amplifying the application scope of NLP within the 몭nancial technology landscape. What tasks BloombergGPT can perform? Financial tasks in natural language processing When BloombergGPT approaches 몭nancial tasks within the NLP discipline, it handles them similarly to general NLP tasks. However, these 몭nancial tasks carry unique features and challenges that di몭erentiate them from their general counterparts. A prime example of this dynamic can be seen in sentiment analysis. Sentiment analysis is a common NLP task, which aims to identify the tone or emotional context of a given text. Take, for instance, a headline such as “COMPANY to cut 10,000 jobs”. In general, this would typically suggest a negative sentiment as it implies job loss and potential hardship for many individuals. However, when seen through the lens of 몭nancial context, the
  • 17. sentiment can be perceived di몭erently. In the world of 몭nance, job cuts often indicate a move towards operational e몭ciency by a company. Such a move can increase the company’s stock price or bolster investor con몭dence in the company. This clearly demonstrates the multifaceted nature of sentiment analysis when applied within the 몭nancial domain, as understood and implemented by BloombergGPT. External 몭nancial benchmarks When it comes to external 몭nancial benchmarks, BloombergGPT utilizes a variety of resources. This includes four tasks drawn from the FLUE benchmark as well as the ConvFinQA dataset. The performance of large language models on these 몭nancial tasks is a relatively unexplored territory, making the assessment complex. BloombergGPT operates in a unique fashion, utilizing a few-shot learning approach when dealing with these tasks. This technique involves training the model on a small number of examples or ‘shots’. The purpose behind this method is to optimize the average performance across all models. This approach demonstrates how BloombergGPT addresses the challenges of evaluating performance on tasks where there isn’t a broad consensus or standard testing framework. Despite the intricate nature of these tasks, BloombergGPT remains committed to performing e몭ciently, abiding by its main principle of choosing a number of shots that would maximize average performance. Internal task: Sentiment analysis When dealing with internal tasks, BloombergGPT focuses heavily on aspect- speci몭c sentiment analysis, a practice common in 몭nancial literature. The process involves a methodical approach, structured around a distinct discovery phase.
  • 18. The discovery phase is instrumental in setting up the groundwork for sentiment analysis. It establishes the annotation and sampling procedures, which are essential components of the process. Additionally, this phase helps to gauge the required number of annotators for each example. The training requirements for the annotators are also determined during this discovery phase. BloombergGPT relies on this structured procedure to maintain a consistent and e몭ective approach to sentiment analysis, ensuring accurate and relevant results. This method underscores BloombergGPT’s commitment to perform complex tasks with diligence and precision. Exploratory task: Named Entity Recognition (NER) Named Entity Recognition (NER) is a well-established task in the broader scope of natural language processing. However, this area has not been deeply investigated when it comes to generative language learning models. Recognizing the relevance of NER in the 몭nancial sector, BloombergGPT embarked on an exploratory journey into this task, aiming to uncover new insights and approaches. The team behind BloombergGPT reported preliminary results from their exploration into NER. As part of this endeavor, they utilized seven distinct NER datasets originating from various domains within Bloomberg’s internal resources. By venturing into this relatively uncharted territory within generative LLMs, BloombergGPT is enhancing its capabilities and contributing valuable research and 몭ndings to the broader 몭eld of natural language processing. Evaluation on standard general-purpose NLP tasks The performance of BloombergGPT was critically examined on a variety of standard general-purpose NLP tasks. This evaluation process utilized BIG- bench Hard, representing a subset of the most demanding tasks within the broader BIG-bench benchmark.
  • 19. Although the main focus of BloombergGPT is distinctly 몭nance-oriented tasks, the inclusion of general-purpose training data in its learning process was considered essential. This approach is based on the possibility that such comprehensive data might augment BloombergGPT’s e몭cacy. This could not only enhance its capabilities in dealing with 몭nancial tasks, but could potentially improve its performance on conventional NLP tasks as well. Therefore, even as BloombergGPT is fundamentally 몭nance-focused, it appreciates the value of a well-rounded NLP competence, actively seeking to excel in not just specialized, but also universal NLP tasks. Knowledge assessments BloombergGPT’s knowledge capacity, speci몭cally its capability to recall information learned during its training phase, was put under the microscope by the research team. This evaluation was conducted through various scenarios where the model was required to provide answers to posed questions, without the aid of additional context or resources. These scenarios comprised multiple-choice questions, along with benchmarks based on reading comprehension. Thus, the assessment was designed to evaluate not just the factual knowledge retained by BloombergGPT during its training, but also its ability to interpret, understand and utilize that knowledge e몭ectively. It provided a valuable insight into BloombergGPT’s understanding and recall capabilities, which are critical aspects for any language model, especially in the 몭nancial domain where accuracy and reliability of information is paramount. Linguistic tasks In the evaluation of BloombergGPT, certain tasks were not directly linked to end-user applications. These were essentially linguistic tasks, put in place to measure the model’s pro몭ciency in understanding language at its core.
  • 20. Such tasks scrutinized BloombergGPT’s capability in tackling disambiguation, parsing grammar, and understanding entailment. These areas are fundamental to language understanding and were chosen to directly assess BloombergGPT’s linguistic competence. By examining these abilities, the team aimed to ensure that the model could adeptly navigate language complexities, a skill that would prove invaluable even in its primary focus area of 몭nancial language processing. The architecture of a domain-speci몭c LLM like BloombergGPT The creation of BloombergGPT was a joint endeavor between the Machine Learning (ML) Product and Research group and the AI Engineering team at Bloomberg. Their objective was to compile one of the largest domain-speci몭c datasets to date. This involved leveraging Bloomberg’s vast data creation, collection, and curation resources. The team utilized Bloomberg’s expansive archive of 몭nancial data, assembling a comprehensive dataset of 363 billion tokens composed of English 몭nancial documents. This corpus was further augmented with a public dataset of 345 billion tokens, resulting in a combined training corpus surpassing 700 billion tokens. To operationalize this vast dataset, the team elected to train a 50-billion parameter decoder-only causal language model. This type of model is adept at generating text in a sequential manner, making it ideal for processing natural language data. The resulting model was then tested on several validation sets: 몭nance-speci몭c natural language processing benchmarks, Bloomberg’s internal benchmarks, and a variety of generic NLP tasks drawn from widely recognized benchmarks. BloombergGPT demonstrated exceptional performance, surpassing existing open source models of comparable size on 몭nance-speci몭c tasks by substantial margins. Importantly, while its specialization didn’t impede its ability to tackle broader tasks, the model also matched or exceeded performance on the general NLP benchmarks. This underscores the potential
of BloombergGPT to enhance financial NLP tasks without sacrificing its utility for more general applications.

BloombergGPT's architecture is based on a language model called BLOOM, which operates as a causal or "decoder-based" model. "Causal" essentially means that the model generates text sequentially, predicting the next word in a sequence based on the words that came before it. The core of the BLOOM model consists of 70 layers of transformer decoder blocks. Here's a simplified explanation of what this means:

Transformer decoder blocks: These are a type of model structure used in natural language processing, designed to handle sequential data, like sentences, where the order of the elements is important. A transformer decoder block uses the following components:

SA (Self-attention): This mechanism helps the model pay varying amounts of 'attention' to different words in the input when generating the output. It allows the model to focus on the important words and understand the context better.

LN (Layer normalization): This technique stabilizes and speeds up the model's learning process by keeping its internal calculations within a manageable range, preventing numerical issues.

FFN (Feed-forward network): This is a type of artificial neural network in which information moves in one direction, from input to output. It has no cycles or loops, and each layer of neurons is fully connected to the next.

The model also ties the input token embeddings (the numerical representation of words) to the linear mapping that occurs before the final output is decided. This step helps reduce the model's complexity and can improve the efficiency of its learning. Finally, an extra layer normalization step is added after the token embeddings, placing two consecutive layer normalization steps at the start of the network; this ensures the model's stability and assists in efficient learning.
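To make the description above concrete, here is a minimal PyTorch sketch of a single pre-LN decoder block combining self-attention, layer normalization and a feed-forward network. This is an illustrative reconstruction, not Bloomberg's code: the class name, the exact pre-LN ordering and the demo dimensions are assumptions, with the 40-head, 192-dimension configuration taken from the sizing figures quoted later in this article.

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    # One illustrative transformer decoder block: layer norm, causal
    # self-attention with a residual connection, then layer norm and a
    # single-hidden-layer feed-forward network with another residual.
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, mask):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                      # residual around self-attention
        x = x + self.ffn(self.ln2(x))  # residual around the feed-forward net
        return x

def causal_mask(seq_len):
    # True marks future positions that a token is not allowed to attend to
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Tiny demo dimensions; BloombergGPT reportedly stacks 70 such blocks
# with a hidden size of 7,680 split across 40 heads of dimension 192.
block = DecoderBlock(d_model=512, n_heads=8)
x = torch.randn(2, 16, 512)            # (batch, sequence, hidden)
y = block(x, causal_mask(16))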
In summary, the architecture of the BloombergGPT model is designed to efficiently learn from and generate text data, with several features intended to enhance the model's stability and learning speed.

Leverage LeewayHertz's expertise in domain-specific LLMs!

We train your preferred foundation model on your proprietary data to create a domain-specific LLM that aligns perfectly with your business needs.

Learn More

How was BloombergGPT trained?

Step 1: Constructing the FinPile dataset

The research team behind BloombergGPT used an extensive and diverse dataset titled "FinPile" for training. FinPile comprises English financial documents from multiple sources, namely news articles, filings, press releases and social media content, all from the extensive Bloomberg archives. This wealth of financial data includes elements like company filings, financial news and other information relevant to market trends. Some of this data is publicly accessible, while other parts can only be accessed through purchase or exclusively via the Bloomberg Terminal. The data is meticulously cleaned to remove unnecessary elements such as markup, special formatting and templates. Moreover, each document is time-stamped, spanning from March 1, 2007, to July 31, 2022, with data quality and volume improving gradually over that period. While the team cannot release the FinPile dataset, they aim to share their insights and experiences in building this highly domain-specific model, and they intend to leverage the date information in future research.

FinPile's financial data is paired with public data commonly used for LLM training to create a balanced training corpus with equal representation of domain-specific and general-purpose text. To maintain high data quality, the team de-duplicated each dataset, which may make their statistical reports differ from those of other studies.
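The article does not say how this de-duplication was implemented, so as a hedged illustration, the sketch below shows the simplest common baseline: exact-match de-duplication via content hashing after whitespace and case normalization. The function name and the normalization choices are invented for the example.

import hashlib

def deduplicate(documents):
    # Keep the first occurrence of each document, where two documents
    # count as duplicates if they match after collapsing whitespace
    # and lowercasing. Near-duplicate detection would need more machinery.
    seen, unique_docs = set(), []
    for doc in documents:
        normalized = " ".join(doc.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_docs.append(doc)
    return unique_docs

print(len(deduplicate(["A filing.", "a  filing.", "A press release."])))  # 2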
Step 2: Tokenization

The research team behind BloombergGPT employed a Unigram tokenizer in preference to greedy merge-based sub-word tokenizers like BPE or WordPiece, a decision based on the Unigram tokenizer's promising performance in prior research. They adopted a byte-based approach to data handling instead of the more common practice of treating data as a sequence of Unicode characters. Additionally, the team employed a pre-tokenization process similar to that used in GPT-2, which breaks the input into distinct chunks. This method facilitates the learning of multi-word tokens, thereby increasing information density and reducing context lengths.

The team used the Pile dataset to train their tokenizer; given its diverse content, The Pile was considered a suitable choice. Handling such a large dataset required a parallel tokenizer training method, for which the team adopted a split-and-merge strategy: they trained a Unigram tokenizer on each chunk of data and then merged the results hierarchically to create the final tokenizer. The resulting tokenizer initially comprised 7 million tokens, which was then condensed to 131,072 tokens. The choice of vocabulary size was influenced by several factors, including the need to fit more information into the context window and considerations related to overhead. The team arrived at 131,072 tokens after experimenting with different vocabulary sizes and measuring the smallest encoded representation of the C4 dataset. Consequently, the tokenizer used for BloombergGPT is considerably larger than the standard vocabulary size of approximately 50,000 tokens.
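For readers who want to experiment with the same family of tokenizer, the sentencepiece library provides an off-the-shelf Unigram trainer. The snippet below is only a small-scale stand-in: Bloomberg's pipeline used a custom parallel split-and-merge scheme and a far larger vocabulary, and the corpus file and vocabulary size here are placeholders.

import sentencepiece as spm

# Train an illustrative Unigram tokenizer on a raw-text corpus file.
spm.SentencePieceTrainer.train(
    input="corpus.txt",          # placeholder corpus, one document per line
    model_prefix="unigram_tok",
    model_type="unigram",        # Unigram LM rather than greedy BPE/WordPiece merges
    vocab_size=32000,            # BloombergGPT's real vocabulary is 131,072
    byte_fallback=True,          # fall back to bytes for unseen characters
)

sp = spm.SentencePieceProcessor(model_file="unigram_tok.model")
print(sp.encode("Bloomberg reported quarterly earnings.", out_type=str))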
Step 3: Building the model

The Bloomberg team built the BloombergGPT model on the structural foundation provided by the BLOOM model: a decoder-only causal language model comprising 70 layers of transformer decoder blocks. These blocks consist of multi-head self-attention, layer normalization, and a feed-forward network with a single hidden layer, with a GELU function as the non-linear element. Additionally, ALiBi (Attention with Linear Biases) positional encoding is implemented, the input token embeddings are tied to the linear mapping preceding the final softmax, and an additional layer normalization is applied after the token embeddings.

Leveraging the scaling laws from Chinchilla, the Bloomberg team determined the size of the BloombergGPT model. They set their total compute budget at 1.3 million GPU hours on 40GB A100 GPUs. After accounting for the cost of activation checkpointing, they used the Chinchilla equations to ascertain the optimal balance between the number of parameters and the number of training tokens. Despite possessing a significant dataset of approximately 700 billion tokens, the team found it too limited for the compute-optimal configuration given their budget. To work around this constraint, they opted for the largest model the data could support, at 50 billion parameters, while preserving around 30% of the total compute budget as a safeguard against potential challenges.
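To see why the dataset, not the budget, was the binding constraint, here is a back-of-envelope version of that sizing exercise. It assumes the widely used approximation that training cost is about 6 × parameters × tokens FLOPs, the Chinchilla rule of thumb of roughly 20 training tokens per parameter, and that the 102 TFLOPs throughput reported later in this article is a per-GPU average, so the numbers are purely illustrative.

# Rough Chinchilla-style sizing under the assumptions stated above
gpu_hours = 1.3e6        # total budget on 40GB A100 GPUs (from the text)
flops_per_gpu = 102e12   # assumed realized throughput per GPU, per second
usable = 0.70            # ~30% of the budget held back as a safeguard

C = usable * gpu_hours * 3600 * flops_per_gpu  # total training FLOPs
N = (C / (6 * 20)) ** 0.5                      # parameters, since tokens ~ 20 * N
D = 20 * N                                     # training tokens

print(f"compute-optimal: ~{N / 1e9:.0f}B parameters, ~{D / 1e9:.0f}B tokens")
# Prints roughly 53B parameters and about 1.1T tokens -- more tokens than
# the ~700B available, which is why the model was capped at 50B parameters.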
To set the structure of the 50-billion parameter model, the Bloomberg team used recommendations from Levine et al. (2020), according to which the ideal hidden size (D) depends on the model's number of self-attention layers (L). The team chose 70 self-attention layers (L = 70), implying a target hidden size of 7,510 (D = 7,510). To make the model's Tensor Core operations more efficient, they instead used a design with 40 attention heads of dimension 192 each, giving a total hidden size of D = 7,680 and an overall parameter count of 50.6 billion.

Step 4: Training the model

The BloombergGPT model, built on PyTorch, is trained with a standard left-to-right causal language modeling objective. The team set the training sequence length at 2,048 tokens to make optimal use of GPU resources. The AdamW optimizer is used, with a learning rate that follows a cosine decay schedule with linear warmup; a batch size warmup is also applied for better performance. Training and evaluation are conducted on Amazon SageMaker, using 64 p4d.24xlarge instances, each equipped with 8 NVIDIA 40GB A100 GPUs. To ensure efficient data access and high throughput, the team employs Amazon FSx for Lustre, which supports up to 1,000 MB/s of read and write throughput per TiB of storage.
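The optimizer-and-schedule combination described here maps directly onto standard PyTorch and Hugging Face utilities. The sketch below shows the shape of that setup; the tiny stand-in model, learning rate and step counts are placeholders rather than Bloomberg's actual values.

import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(10, 10)  # stand-in for the 50B-parameter model
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.1)

# Linear warmup followed by cosine decay over the rest of training
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1000,       # placeholder warmup length
    num_training_steps=100000,   # placeholder total step count
)

for step in range(10):           # stands in for the real training loop
    loss = model(torch.randn(4, 10)).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()             # advance the learning-rate schedule
    optimizer.zero_grad()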
Step 5: Large-scale optimization

To train the substantial BloombergGPT model within the constraints of GPU memory, the Bloomberg team implemented a series of optimization techniques:

ZeRO optimization (stage 3): This approach shards the training state across a group of GPUs. For BloombergGPT, the team shards across 128 GPUs and maintains four replicas of the model throughout training.

MiCS: This system lowers the communication overhead and memory requirements of cloud training clusters. It utilizes hierarchical communication, a 2-hop gradient update mechanism, and a scale-aware model partitioning scheme.

Activation checkpointing: This technique discards activations after the forward pass and recomputes the intermediate tensors during the backward pass as needed, minimizing the memory consumed during training.

Mixed precision training: This strategy saves memory by executing forward and backward passes in BFloat16 precision while storing and updating the parameters in full precision (FP32).

Fused kernels: This method combines multiple operations into a single GPU kernel, reducing peak memory usage by avoiding the storage of intermediate results, and improving speed.

By applying these optimizations, the Bloomberg team achieved an average computational performance of 102 TFLOPs, with each training step taking around 32.5 seconds.
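Two of these techniques, mixed precision and activation checkpointing, are straightforward to reproduce in plain PyTorch, and a minimal sketch of how they compose in a single training step follows. The names model.blocks, batch and loss_fn are placeholders, and this illustrates the general pattern rather than Bloomberg's actual training code, which also layers in ZeRO sharding and MiCS.

import torch
from torch.utils.checkpoint import checkpoint

def forward_with_checkpointing(blocks, x):
    # Recompute each block's activations during the backward pass
    # instead of storing them, trading compute for memory.
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x

def training_step(model, batch, labels, loss_fn, optimizer):
    # Forward and backward math runs in BFloat16 under autocast, while
    # the parameters themselves stay in (and are updated in) FP32.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = forward_with_checkpointing(model.blocks, batch)
        loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()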
How ethical and privacy issues are handled in BloombergGPT?

BloombergGPT, designed for financial applications, was developed with a thorough focus on ethical and privacy considerations, addressed through a series of well-structured measures.

Risk assessment and compliance procedures

Recognizing the sensitivity of the financial sector, Bloomberg has established a robust risk assessment and testing process. Accuracy and factual information are deemed crucial for the firm's reputation and the satisfaction of its clients, who increasingly seek to incorporate advanced technology into their operations. The process involves stringent annotation guidelines, multi-level pre-launch reviews involving central risk and compliance units, and continuous post-launch monitoring. All research, development and deployment procedures for BloombergGPT align with relevant regulations, maintaining high standards of compliance.

Toxicity and bias control

Bloomberg makes extraordinary efforts to control potential toxicity and bias in its content, whether produced by humans or AI models. The team behind BloombergGPT maintains an ongoing interest in evaluating and quantifying the potential generation of harmful language in various application contexts. Particular attention is given to studying how the FinPile corpus, which contains a lower incidence of biased or toxic language, influences the model's tendencies. As BloombergGPT is integrated into new products, these rigorous testing procedures and compliance controls will be applied to ensure safe usage.

Model release and openness

How to release LLMs is a subject of ongoing debate in the AI community. Given the potential for misuse of LLMs, especially ones like BloombergGPT that are trained on a wealth of sensitive data, Bloomberg takes a strategic approach to this question. Various strategies were considered, from releasing trained models under specific usage licenses to granting API access without revealing the underlying model parameters or detailed training data. In light of the significant risk of data leakage attacks, Bloomberg opted for a conservative approach, withholding the release of BloombergGPT's model parameters.
This decision aligns with the company's mission to provide access to data collected over decades while protecting it from potential misuse.

Contribution to the field

Despite not releasing the model, BloombergGPT's development journey provides valuable insights to the broader NLP community, especially teams creating domain-specific models. The experience gained in training and evaluating BloombergGPT helps the community better understand these models, contributing to the shared knowledge in the field of NLP and LLMs. Bloomberg acknowledges the influence of various other models and projects in shaping the development of BloombergGPT, and continues to contribute to this collective intelligence.

How to train an open-source foundation model into a domain-specific LLM?

The objective here is to develop a classification model leveraging the robust capabilities of BERT, a pre-trained large language model. To customize this model to fit a specific domain, it will be fine-tuned using domain-specific data, specifically the Toxic Comment Classification Challenge data, sourced from the realm of social media. The dataset for this project is available on Kaggle; only the 'train.csv' file among the available datasets will be used for this case. The complete code is available on GitHub.

Let's code!

Step 1: Loading the data set

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import transformers
from transformers import AutoModel, BertTokenizerFast

# specify GPU if you have a GPU device
device = torch.device("cuda")

# --------------------------------------------------------------------
# Load dataset; keep only the "comment_text" and "toxic" (binary) columns
df = pd.read_csv("train.csv")
df = df.sample(6400)[['comment_text', 'toxic']]
df.columns = ['text', 'label']

# print data and take a look at it
df.head()

# check the class distribution
print(df['label'].value_counts(normalize=True))

# Split data into train and temp sets
train_text, temp_text, train_labels, temp_labels = train_test_split(
    df['text'], df['label'],
    random_state=1234,
    test_size=0.3,
    stratify=df['label'])

# Using temp_text and temp_labels created in the previous step,
# generate the validation and test sets
val_text, test_text, val_labels, test_labels = train_test_split(
    temp_text, temp_labels,
    random_state=1234,
    test_size=0.5,
    stratify=temp_labels)

# This yields 70% training data, 15% validation, and 15% test

Step 2: Tokenization

# Import the BERT model and tokenizer from the pretrained checkpoint
bert = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Get the length of all the sentences in the training data
seq_len = [len(i.split()) for i in train_text]
pd.Series(seq_len).hist(bins=100)

# Select the max_seq_len to keep, based on the length distribution above
max_seq_len = 100

# Tokenize and encode the sequences, trimmed at the max length determined above
tokens_train = tokenizer.batch_encode_plus(train_text.tolist(),
                                           max_length=max_seq_len,
                                           pad_to_max_length=True,
                                           truncation=True,
                                           return_token_type_ids=False)

tokens_val = tokenizer.batch_encode_plus(val_text.tolist(),
                                         max_length=max_seq_len,
                                         pad_to_max_length=True,
                                         truncation=True,
                                         return_token_type_ids=False)

tokens_test = tokenizer.batch_encode_plus(test_text.tolist(),
                                          max_length=max_seq_len,
                                          pad_to_max_length=True,
                                          truncation=True,
                                          return_token_type_ids=False)

# Convert tokens to tensors for the train, validation, and test sets
train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

Step 3: Creating a data loader to prepare the data for training

from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

# define a batch size (you may want to tweak this based on your setup)
batch_size = 32

# wrap tensors, create a random sampler, and prep the dataloader for train data
train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data,
                              sampler=train_sampler,
                              batch_size=batch_size)

# wrap tensors, create a sequential sampler, and prep the dataloader for validation data
val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data,
                            sampler=val_sampler,
                            batch_size=batch_size)

# freeze all BERT parameters so only the new head is trained
for param in bert.parameters():
    param.requires_grad = False

Step 4: Defining the model architecture: Adding 3 layers and 1 binary output layer to the NN

class BERT_Arch(nn.Module):

    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert

        # dropout layer
        self.dropout = nn.Dropout(0.1)

        # relu activation function
        self.relu = nn.ReLU()

        # dense layer 1
        # Note: 768 is fixed by BERT's hidden size; the output dim is your choice
        self.fc1 = nn.Linear(768, 512)

        # dense layer 2
        self.fc2 = nn.Linear(512, 128)

        # dense layer 3
        self.fc3 = nn.Linear(128, 32)

        # dense layer 4 (output layer)
        self.fc4 = nn.Linear(32, 2)

        # softmax activation function (log-probabilities, paired with NLLLoss below)
        self.softmax = nn.LogSoftmax(dim=1)

    # define the forward pass
    def forward(self, sent_id, mask):

        # pass the inputs to the model; cls_hs is the pooled [CLS] representation
        _, cls_hs = self.bert(sent_id, attention_mask=mask, return_dict=False)

        # Execute 1st layer
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)

        # Execute 2nd layer
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)

        # Execute 3rd layer
        x = self.fc3(x)
        x = self.relu(x)
        x = self.dropout(x)

        # output layer
        x = self.fc4(x)

        # apply (log-)softmax activation
        x = self.softmax(x)

        return x

Step 5: Passing the pre-trained BERT to the custom architecture and preparing the loss function

model = BERT_Arch(bert)

# push the model to GPU
model = model.to(device)

# optimizer from hugging face transformers
from transformers import AdamW

# define the optimizer (only the small head trains, so a relatively high lr works)
optimizer = AdamW(model.parameters(), lr=1e-3)

# Find class weights to compensate for class imbalance, and push them to GPU
from sklearn.utils.class_weight import compute_class_weight

class_weights = compute_class_weight(class_weight="balanced",
                                     classes=np.unique(train_labels),
                                     y=train_labels)

# convert class weights to tensor
weights = torch.tensor(class_weights, dtype=torch.float)
weights = weights.to(device)

# negative log-likelihood loss, matching the LogSoftmax output layer
cross_entropy = nn.NLLLoss(weight=weights)

# number of training epochs
epochs = 100

print(class_weights)

Step 6: Train the model

def train():

    # put the model in training mode
    model.train()

    # Initialize the loss and accuracy to zero (updated batch by batch)
    total_loss, total_accuracy = 0, 0

    # empty list to save model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):

        # push the batch to gpu
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch

        # clear previously calculated gradients
        model.zero_grad()

        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)

        # add on to the total loss
        total_loss = total_loss + loss.item()

        # backward pass to calculate the gradients
        loss.backward()

        # clip the gradients to 1.0 to help prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # update parameters
        optimizer.step()

        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()

        # append the model predictions
        total_preds.append(preds)

    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)

    # predictions are in the form (no. of batches, batch size, no. of classes);
    # reshape them into (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    # return the loss and predictions
    return avg_loss, total_preds
Step 7: Evaluate the model

def evaluate():

    print("\nEvaluating...")

    # deactivate dropout layers
    model.eval()

    # Again, start loss and accuracy at zero
    total_loss, total_accuracy = 0, 0

    # empty list to save the model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(val_dataloader):

        # push the batch to gpu
        batch = [t.to(device) for t in batch]
        sent_id, mask, labels = batch

        # deactivate autograd
        with torch.no_grad():

            # model predictions
            preds = model(sent_id, mask)

            # compute the validation loss between actual and predicted values
            loss = cross_entropy(preds, labels)
            total_loss = total_loss + loss.item()

            preds = preds.detach().cpu().numpy()
            total_preds.append(preds)
    # compute the validation loss of the epoch
    avg_loss = total_loss / len(val_dataloader)

    # reshape the predictions into (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    return avg_loss, total_preds

Step 8: Run the pipeline to iteratively train and evaluate the model and get the output for every epoch

# set initial loss to infinite
best_valid_loss = float('inf')

# empty lists to store training and validation loss for each epoch
train_losses = []
valid_losses = []

# for each epoch
for epoch in range(epochs):

    # train model
    train_loss, _ = train()

    # evaluate model
    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')
    # append training and validation loss
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'Epoch {epoch + 1}/{epochs}\tTraining Loss: {train_loss:.3f}\tValidation Loss: {valid_loss:.3f}')

# load the weights of the best model
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

# get predictions for test data
with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

# model's performance
preds = np.argmax(preds, axis=1)
print(classification_report(test_y, preds))
pd.crosstab(test_y, preds)
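As a quick usage check, the restored model can score a new comment in exactly the same way the test set was tokenized and scored; the sample sentence below is, of course, just an illustration.

# Tokenize a new comment with the same settings used for training
sample = ["You are a wonderful person and I appreciate your help!"]
tokens = tokenizer.batch_encode_plus(sample,
                                     max_length=max_seq_len,
                                     pad_to_max_length=True,
                                     truncation=True,
                                     return_token_type_ids=False)
seq = torch.tensor(tokens['input_ids']).to(device)
mask = torch.tensor(tokens['attention_mask']).to(device)

# Run the classifier and read off the predicted class (1 = toxic)
with torch.no_grad():
    logits = model(seq, mask)
pred = int(np.argmax(logits.detach().cpu().numpy(), axis=1)[0])
print("toxic" if pred == 1 else "non-toxic")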
Endnote

As we conclude, it's clear that domain-specific large language models are not just beneficial but necessary for modern organizations. The shift from generic models to tailored language processing models addresses unique industry needs, enhances performance, enables personalized interactions and streamlines operations. Moreover, these custom models provide improved control over data privacy and greater cost efficiency.

In the rapidly evolving world of AI and natural language processing, adopting domain-specific LLMs represents a crucial step towards staying competitive and innovative. These models, fine-tuned to understand specific industry language and context, transform how organizations interact with data and digital interfaces. The need for domain-specific LLMs is undeniable; the question now is how quickly organizations can adapt and integrate this powerful tool into their processes to unlock its full potential.

Leverage the power of customized LLMs! Contact LeewayHertz to create your own domain-specific LLM for highly accurate results and unparalleled performance within your domain.

Author's Bio

Akash Takyar
CEO, LeewayHertz

Akash Takyar is the founder and CEO of LeewayHertz. The experience of building over 100 platforms for startups and enterprises allows Akash to rapidly architect and design solutions that are scalable and beautiful. Akash's ability to build enterprise-grade technology solutions has attracted over 30 Fortune 500 companies, including Siemens, 3M, P&G and Hershey's.
Akash is an early adopter of new technology, a passionate technology enthusiast, and an investor in AI and IoT startups.

Write to Akash