Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
5. H2O.ai Confidential
Some Good Use-Cases
If you haven’t already started exploring
Retrieval Augmented Generation
● Answers are generated with additional context from an external knowledge database.
● Grounding LLM answers to the retrieved knowledge/facts.
● LLMs do not need to be retrained when new knowledge is added or modified.
● H2OGPT / LangChain
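To make the retrieval augmented generation idea concrete, here is a minimal sketch. It assumes a tiny in-memory document list and a toy word-overlap score; the names `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical, and a real setup would more likely use embeddings and a vector store.

```python
import re
from collections import Counter

# Hypothetical in-memory "knowledge base"; a real system would use a vector store.
DOCUMENTS = [
    "H2O LLM Studio is a no-code tool for fine-tuning large language models.",
    "Retrieval augmented generation adds retrieved passages to the prompt.",
    "LoRA trains small low-rank adapter matrices instead of all model weights.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(question, documents, top_k=1):
    """Rank documents by how many question words they share (toy scoring)."""
    question_counts = Counter(tokenize(question))

    def overlap(doc):
        doc_counts = Counter(tokenize(doc))
        return sum(min(n, doc_counts[word]) for word, n in question_counts.items())

    return sorted(documents, key=overlap, reverse=True)[:top_k]

def build_prompt(question, passages):
    """Place the retrieved passages in front of the question to ground the answer."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

question = "What does LoRA train?"
passages = retrieve(question, DOCUMENTS, top_k=1)
prompt = build_prompt(question, passages)
```

Adding or editing an entry in `DOCUMENTS` changes what the model sees at answer time, without any retraining.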
Summarization
● Business-related summaries can be a disruptive use-case. Every meeting of a company can be recorded, transcribed, and then summarized using an LLM.
● Needs long-context models or iterative reduction techniques.
● Intrinsically grounded LLM output. Residual risk of misinterpretations.
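One possible reading of "iterative reduction" is sketched below: split the transcript into chunks, summarize each chunk, then summarize the joined summaries until the text fits into one call. The `summarize` argument stands in for an LLM call and `first_sentence` is only a hypothetical placeholder; the 2000-character limit is an assumption.

```python
def chunk_text(text, max_chars):
    """Split text into pieces of at most max_chars characters, breaking on spaces."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def summarize_iteratively(text, summarize, max_chars=2000):
    """Summarize chunks, then summarize the joined summaries until the text is short."""
    while len(text) > max_chars:
        partial = [summarize(chunk) for chunk in chunk_text(text, max_chars)]
        reduced = " ".join(partial)
        if len(reduced) >= len(text):  # guard: stop if the summaries do not shrink
            break
        text = reduced
    return summarize(text)

# Stand-in for an LLM call (hypothetical): keeps only the first sentence.
def first_sentence(text):
    return text.split(". ")[0].rstrip(".") + "."
```

A long-context model would skip the loop entirely, since the whole transcript fits into a single `summarize` call.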
Creative Writing
● Promising area of interest for short texts.
● Limited in creativity and novelty. Risk of hallucination as it is challenging to ground the generated text.
● LLM as a coworker/copilot.
check out genai.h2o.ai for more inspiration
Writing Helper
From simple spell and grammar checking up to professional rephrasing.
Function calling
Broad applications when LLMs are trained to call APIs. Usage of external tools (simple calculator or browsing the internet). Translate natural language to SQL queries.
Classification / Regression
Use LLMs for classical NLP use-cases such as classification and regression tasks. Can also be retrieval augmented.
Coding
Writing code will be a valuable task for LLMs, especially for small, fine-grained tasks. This will enable a lot more software engineers to be 10x, or the already 10x to be 100x.
Autonomous Agents
Fully automate repetitive workflows. Make use of function calling and natural language interaction. Requires LLM memory management.
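As an illustration of function calling with a simple calculator tool, the sketch below parses a JSON tool call and runs the matching Python function. The JSON layout (`name`, `arguments`) and the tool names are assumptions made for this example, not the format of any particular model or API.

```python
import json

def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

# Registry of tools the model is allowed to call (hypothetical names).
TOOLS = {"add": add, "multiply": multiply}

def dispatch(model_output):
    """Parse a JSON tool call emitted by the model and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# What a model trained for function calling might emit (assumed format).
model_output = '{"name": "multiply", "arguments": {"a": 17, "b": 24}}'
result = dispatch(model_output)  # 408
```

Restricting calls to an explicit registry keeps the model from invoking anything outside the tools it was given.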
… Much More
There is more
Complexity
“How do I …?”
“Summarize!”
“What are …?”
Technical Ways to use LLMs
Best practice Consider
Prompt engineering
Best practice:
● Be very detailed in your instructions.
● Use prompts with all relevant information.
● Ask for a quality answer.
● Refine your prompt.
Consider:
● Bias from prompt and from model.
● Limited to training knowledge.
● Hallucinations.
● Raw prompt engineering may struggle in certain use-cases.
Add context
Best practice:
● Retrieve relevant context.
● Context placement.
● Explore few-shot prompting.
Consider:
● Usually has a sweet spot for amount of context (depends on use case, start with 5 paragraphs / 512 characters each).
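The suggested starting point of 5 paragraphs of 512 characters each can be sketched as follows. The function names are hypothetical, the chunks are assumed to be ranked by relevance already, and placing the context before the question is just one possible choice of context placement.

```python
def split_into_chunks(text, max_chars=512):
    """Split text on blank lines, then cut any paragraph longer than max_chars."""
    chunks = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        for start in range(0, len(paragraph), max_chars):
            chunks.append(paragraph[start:start + max_chars])
    return chunks

def build_context_prompt(question, ranked_chunks, examples=(), max_chunks=5):
    """Combine few-shot examples, at most max_chunks context chunks, and the question."""
    shots = "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in examples)
    context = "\n\n".join(ranked_chunks[:max_chunks])
    return f"{shots}Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Both `max_chars` and `max_chunks` are worth varying per use case, since the sweet spot for the amount of context differs between tasks.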
Fine-tuning
Best practice:
● Build a proper evaluation first.
● Start with small models.
● Data is important.
● Use H2O LLM Studio to quickly iterate.
Consider:
● Needs training data.
● Beware of fine-tuning for factuality.
● Can be great for specific use-cases.
● No prompt engineering needed.
RLHF
Best practice:
● Model must be fine-tuned with in-distribution data first.
● PPO: requires a lot of high quality data to train a good reward model.
● DPO: loss is calculated directly on accept/reject pairs.
Consider:
● Next token loss is replaced by a scalar.
● Very research oriented. Proven to be useful for chat alignment.
● PPO (with reward model) or DPO (without reward model).
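To show what "loss is calculated directly on accept/reject pairs" can look like, here is a sketch of the commonly published DPO loss for a single pair. It assumes the inputs are summed sequence log-probabilities under the policy and under a frozen reference model; `beta=0.1` is an assumed default, not a value from the talk.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written as log(1 + exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy equals the reference model the loss is log(2); it falls as the policy raises the accepted answer relative to the rejected one, with no separate reward model involved.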
Increasing Complexity
Always max out the options above (or already have training data)
Automate Information Flow in Your Company
Meetings, Logs & Documents, Demos & Presentations
LLM summarization
Business Intelligence
Expand Knowledge, Opportunities
● Identify work duplication
● Find new synergies
● Improve staff motivation by providing up-to-date information
● Personalized updates on any related tasks
● Discover trends with summaries on summaries
Solve with Prompt Engineering
In this talk, Andrew Ng discusses how AI can empower businesses of all sizes. He compares the rise of AI to the rise of literacy, explaining that just as literacy was once
limited to a few, AI is currently concentrated in the hands of highly skilled engineers at big tech companies. Ng believes that society can benefit from enabling everyone
to contribute to AI development. He explains that AI projects have been expensive to build, requiring skilled engineers and significant investment. Large tech companies
have been successful in making these investments pay off because they have millions or billions of users to generate revenue from their AI systems. However, this model
does not work for smaller businesses that do not have the same scale or resources. Ng gives examples of how AI can be beneficial to small businesses, such as a pizza
store owner using AI to improve sales by analyzing data and spotting patterns. He emphasizes that AI can work with modest amounts of data and that the real challenge
is the cost of hiring an AI team for small businesses. Ng proposes that the solution lies in new AI development platforms that focus on providing data rather than writing
code. These platforms allow individuals to upload data and train AI systems without extensive coding knowledge. He gives an example of an inspector using a platform to
teach an AI system how to detect defects in fabric. Ng acknowledges that these platforms are still evolving but believes they will become more accessible in the future.
He concludes by stating that democratizing access to AI is crucial for spreading wealth across society. Ng envisions a future where every accountant, store manager,
buyer, and quality inspector can build their own AI systems. He compares the impact of democratizing AI to the impact of widespread literacy, emphasizing the exciting
possibilities it holds for the future.
Solve with Prompt Engineering
Sure! Here is a summary of Andrew Ng's talk "How AI could empower any business" in a single paragraph:
Andrew Ng discusses the current state of AI, where it is primarily controlled by a select few "high priests
and priestesses" in big tech companies, and how this limits the potential benefits of AI for society as a
whole. He argues that AI has the potential to create tremendous wealth, but only if it is democratized and
made accessible to everyone. To achieve this, he suggests that we need to shift the focus from asking
people to write lots of code to providing data, and highlights the emergence of new AI development
platforms that make this possible. He believes that empowering everyone to build AI systems will lead to
an incredibly exciting future, where small businesses and individuals can use AI to improve their operations
and create new opportunities.
Solve with Prompt Engineering
Sure! Here's the summary of Andrew Ng's talk "How AI could empower any business" in a single paragraph:
Andrew Ng discusses the potential of AI to empower any business, not just large tech companies. He
highlights the long-tail of AI projects that are valuable but not being addressed, and the need for a
democratic approach to AI development. Ng introduces a new way to build AI systems that focuses on
providing data rather than writing code, making it easier for small businesses and individuals to participate.
He illustrates the potential of these platforms with examples of T-shirt demand forecasting and quality
control, and emphasizes the importance of empowering everyone to build AI systems for themselves to
create a more equitable distribution of wealth.
What do I Tune First?
So many hyperparameters
Prompt
Maximize baseline score with prompt engineering first. Choose a maximum token length that most or all samples fit into.
Evaluate Often
Evaluate on sub-epochs to see if your model saturates quickly and overfits. Manually inspect the output and test some hand-crafted samples.
Learning Rate
Even with (Q)LoRA, learning rate still has a significant impact on the training. Start with 0.0001 and AdamW.
Modify Prompt
Spend quality time on what the model is seeing. Add context? Name, Title, Text. Experiment with order and delimiters.
LoRA
Start with LoRA rank 4, alpha 16, no dropout. Apply to all linear layers. Decrease/increase LoRA rank & alpha (2-4 * rank), add a bit of dropout.
Scale Up
Only very late, try to scale up to larger models or more (diverse) training data. Also try to scale down!
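The starting values above can be written down as a plain configuration plus a small sweep, as sketched here. The key names are illustrative and not the schema of H2O LLM Studio or any other library; the rank grid and the dropout value of 0.05 are assumptions standing in for "a bit of dropout".

```python
# Starting point taken from the slide; key names are illustrative, not a library schema.
BASELINE = {
    "learning_rate": 1e-4,
    "optimizer": "AdamW",
    "lora_rank": 4,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "lora_target": "all linear layers",
}

def lora_variants(ranks=(2, 4, 8, 16), alpha_factors=(2, 4), dropout=0.05):
    """Yield configs with varied rank, alpha = factor * rank, and a bit of dropout."""
    for rank in ranks:
        for factor in alpha_factors:
            yield {**BASELINE, "lora_rank": rank,
                   "lora_alpha": factor * rank, "lora_dropout": dropout}

variants = list(lora_variants())
```

Each variant changes only the LoRA settings, so differences in the evaluation score can be attributed to rank, alpha, and dropout rather than to the learning rate or optimizer.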
Beyond Current Limits
A glimpse
01 Hallucinations
In standard SFT training, models are rewarded for predicting the next token. Factuality makes up only a minuscule fraction of the training loss.
02 Lack of Reasoning
They can easily fool us with eloquent output.
They are only trained on text and don’t have a deep understanding of nature or math.
They don’t have any deeper goals or desires.
…
Models inherit the bias from the training corpus.
Models may struggle in classes of applications, e.g. order inversion, advanced math.
Models only have the knowledge that they were trained on (knowledge cutoff).
Models are susceptible to prompt injection, “jailbreak” attacks, data poisoning attacks, …
Hallucinations
LLMs are not a database search engine. Hallucinations (factual
incorrectness or made up context that “sounds good”) can be
very hard to spot.
By the architecture of LLMs, hallucinations can never be 100%
avoided but there are a few options to mitigate them:
Provide Context
RAG, Knowledge Graphs. Ground the answers by prompting with high-level abstractions and concepts first.
Larger/Better Models
If a very low rate of hallucinations is important for your task, consider using the best and largest models out there.
Low Temperature
Lower temperature or even greedy decoding will produce more condensed answers, but they are likely to be more factually correct.
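The effect of temperature on decoding can be illustrated with a small sampling sketch. It assumes raw next-token logits are available as a list of floats; `softmax_with_temperature` and `pick_token` are hypothetical helper names, and treating a temperature of 0 as greedy decoding is a common convention rather than something stated in the talk.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature=1.0, rng=random):
    """Greedy decoding when temperature is 0, otherwise sample from the softmax."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]
greedy_index = pick_token(logits, temperature=0)
```

At low temperature almost all probability mass sits on the top token, so unlikely continuations are sampled far less often.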
Recommendations:
● Use in low-stakes applications,
combine with human oversight
● Source of inspiration and suggestions
● Copilots over autonomous agents
Andrej Karpathy, State of GPT
Reasoning
System 1 (98 % of time)
Fast, automatic, frequent, emotional, stereotypic, unconscious.
- determine that an object is at a greater distance than another
- localize the source of a specific sound
- complete the phrase "war and ..."
- display disgust when seeing a gruesome image
- solve 2+2=?
- read text on a billboard
- drive a car on an empty road
- understand simple sentences
System 2 (2 % of time)
Slow, effortful, infrequent, logical, calculating, conscious.
- direct your attention towards the clowns at the circus
- sustain a faster-than-normal walking rate
- count the number of A's in a certain text
- park into a tight parking space
- determine the price/quality ratio of two washing machines
- determine the validity of a complex logical reasoning
- solve 17 × 24
Do LLMs Have an Internal State?
LLMs are stochastic parrots
LLMs do not think, nor have consciousness.
But models can learn internal representations of states, as has been shown with Othello and chess (GPT-4 can play at around Elo 1800).
Trick Models Into System 2
If the task is not solved intuitively, consider giving the model time (tokens) to think about the answer.
Chain-of-thought & Reflection
Instruct the model to go through its thoughts step by step. Let the model reflect on its own answer. While this has been proposed quite a few times, I personally haven’t seen it working well, and the model is very biased towards revoking the original answer.
Decomposition
Decomposition of one task into multiple smaller ones.
Step-back prompting
Retrieve background information and abstract the question first. This grounds the answer in fundamental concepts or principles.
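Step-back prompting can be sketched as two chained model calls: one that abstracts the question, and one that answers it with that abstraction as grounding. The `llm` argument is a hypothetical callable that takes a prompt string and returns text, and the prompt wording is only an assumption.

```python
def step_back_answer(question, llm):
    """Two-stage step-back prompting: abstract first, then answer with that grounding."""
    principle = llm(
        "What general concept or principle is needed to answer the question below? "
        f"State it briefly.\nQuestion: {question}"
    )
    return llm(
        f"Background principle: {principle}\n"
        f"Question: {question}\n"
        "Think step by step, then give the final answer."
    )
```

The second prompt also asks for step-by-step reasoning, which spends extra tokens on the answer in the spirit of giving the model time to think.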