[BEDROCK] Claude Prompt Engineering Techniques.pptx

“Human:” / “Assistant:” formatting
● Claude is trained on
alternating “Human:” /
“Assistant:” dialogue:
○ Human: [Instructions]
○ Assistant: [Claude’s
response]
● For any API prompt, you must
start with “nnHuman:” and
end with “nnAssistant:”
¶
¶
Human: Why is the sky blue? ¶
¶
Assistant:
Python
prompt = “nnHuman: Why are sunsets
orange?nnAssistant:”
* ¶ symbols above shown for illustration
Examples:
To use system prompts with Claude 2.1, see how to use system
prompts in our documentation.

Be clear and direct
● Claude responds best to clear
and direct instructions
● When in doubt, follow the
Golden Rule of Clear
Prompting: show your prompt
to a friend and ask them if they
can follow the instructions
themselves and produce the
exact result you’re looking for
Human: Write a haiku about robots
Assistant: Here is a haiku about robots:
Metal bodies move
Circuits calculate tasks
Machines mimic life
Example:
Human: Write a haiku about robots. Skip the
preamble; go straight into the poem.
Assistant: Metal bodies move
Circuits calculate tasks
Machines mimic life

● Claude sometimes needs
context about what role it
should inhabit
● Assigning roles changes
Claude’s response in two ways:
○ Improved accuracy in
certain situations (such as
mathematics)
○ Changed tone and
demeanor to match the
specified role
Human: Solve this logic puzzle. {{Puzzle}}
Assistant: [Gives incorrect response]
Example:
Human: You are a master logic bot designed to
answer complex logic problems. Solve this logic
puzzle. {{Puzzle}}
Assistant: [Gives correct response]
Assign roles (aka role prompting)

● Disorganized prompts are hard
for Claude to comprehend
● Just like section titles and
headers help humans better
follow information, using XML
tags <></> helps Claude
understand the prompt’s
structure
Human: Hey Claude. Show up at 6AM because I say so.
Make this email more polite.
Assistant: Dear Claude, I hope this message finds you
well…
Example:
Human: Hey Claude. <email>Show up at 6AM because I
say so.</email> Make this email more polite.
Assistant: Good morning team, I hope you all had a
restful weekend…
We recommend you use XML tags,
as Claude has been specially
trained on XML tags
Use XML tags

● Including input data directly in
prompts can make prompts messy
and hard to develop with
● More structured prompt templates
allows for:
○ Easier editing of the prompt
itself
○ Much faster processing of
multiple datasets
Human: I will tell you the name of an animal. Please
respond with the noise that animal makes.
<animal>{{ANIMAL}}</animal>
Assistant:
Example:
Use structured prompt templates
Tip: while not always necessary, we
recommend using XML tags to separate out
your data for even easier parsing
Cow Dog Seal
Input
data
Prompt
template
… Please
respond with
the noise that
animal makes.
<animal>Cow
</animal>
… Please
respond with
the noise that
animal makes.
<animal>Dog
</animal>
… Please
respond with
the noise that
animal makes.
<animal>Seal
</animal>
Complete
prompt

Human: <doc>{{DOCUMENT}}</doc>
Please write a summary of this document at a
fifth grader’s understanding level.
Assistant:
Long document example:
Use structured prompt templates
Prompt
template
Tip: When dealing with long documents, always
ask your question at the bottom of the prompt.

● You can get Claude to say
exactly what you want by:
○ Specifying the exact
output format you want
○ Speaking for Claude by
writing the beginning of
Claude’s response for it
(after “Assistant:”)
Human: Please write a haiku about {{ANIMAL}}. Use JSON
format with the keys as "first_line", "second_line", and
"third_line".
Assistant: {
Example:
"first_line": "Sleeping in the sun",
"second_line": "Fluffy fur so warm and soft",
"third_line": "Lazy cat's day dreams"
}
Format output & speak for Claude
Prompt
Claude’s
response

● Claude benefits from having
time to think through tasks
before executing
● Especially if a task is
particularly complex, tell
Claude to think step by step
before it answers
Human: Here is a complex LSAT multiple-choice logic
puzzle. What is the correct answer?
Assistant: [Gives incorrect response]
Example:
Increases intelligence of responses
but also increases latency by
adding to the length of the output.
Think step by step
Human: Here is a complex LSAT multiple-choice logic
puzzle. What is the correct answer? Think step by step.
Assistant: [Gives correct response]

Human: [rest of prompt] Before answering,
please think about the question within
<thinking></thinking> XML tags. Then,
answer the question within
<answer></answer> XML tags.
Assistant: <thinking>
Thinking out loud:
Think step by step
Human: [rest of prompt] Before answering,
please think about the question within
<thinking></thinking> XML tags. Then,
answer the question within
<answer></answer> XML tags.
Assistant: <thinking>[...some
thoughts]</thinking>
<answer>[some answer]</answer>
Helps with troubleshooting
Claude’s logic & where prompt
instructions may be unclear

Use examples
● Examples are probably the
single most effective tool for
getting Claude to behave as
desired
● Make sure to give Claude
examples of common edge
cases.
● Generally more examples =
more reliable responses at the
cost of latency and tokens
Human: I will give you some quotes. Please extract the
author from the quote block.
Here is an example:
<example>
Quote:
“When the reasoning mind is forced to confront the
impossible again and again, it has no choice but to adapt.”
― N.K. Jemisin, The Fifth Season
Author: N.K. Jemisin
</example>
Quote:
“Some humans theorize that intelligent species go extinct
before they can expand into outer space. If they're correct,
then the hush of the night sky is the silence of the
graveyard.”
― Ted Chiang, Exhalation
Author:
Assistant: Ted Chiang
Example:

Relevance
● Are the examples similar to the ones you need to classify
Diversity
● Are the examples diverse enough for Claude not to overfit to specifics
● Equally distributed among answer types (don’t always choose option A)
What makes a good example?

Grading/Classification
● Ask Claude if the examples are relevant and diverse
Generation
● Give Claude examples and ask it to generate more examples
Generating examples is hard
How can Claude help?

As you compare many prompts, you will get tired/bad at manually evaluating results
Automate as much as possible by:
○ Withholding a set of examples
○ Trying your prompts on them as a performance evaluation
○ (if possible) automatically measuring performance (maybe using an LLM)
A note on evaluating prompts

Advanced prompting techniques
For tasks with many steps, you can break the task up and chain
together Claude’s responses
Example:
Human: Find all the names from the below text:
"Hey, Jesse. It's me, Erin. I'm calling about the
party that Joey is throwing tomorrow. Keisha
said she would come and I think Mel will be
there too."
Assistant: <names>
Jesse
Erin
Joey
Keisha
Mel
</names>
Prompt
Claude’s
response
Human: Here is a list of names:
<names>{{NAMES}}</names> Please
alphabetize the list.
Assistant:
a.k.a. {{NAMES}}
<names>
Erin
Jesse
Joey
Keisha
Mel
</names>
Allows you to get more out of the 100K context window
Chaining prompts Long context prompts
Claude will be less likely to make mistakes or miss
crucial steps if tasks are split apart - just like a human!

For extremely long (100K+) prompts, do the following in addition to
techniques covered up until now:
● Definitely put longform input data in XML tags so it’s clearly separated from the instructions
● Tell Claude to read the document carefully because it will be asked questions later
● For document Q&A, ask the question at the end of the prompt after other input information
(there is a large quantitatively measured difference in quality of result)
● Tell Claude to find quotes relevant to the question first before answering and answer only if
it finds relevant quotes
● Give Claude example question + answer pairs that have been generated from other parts of
the queried text (either by Claude or manually)
Generic examples on general/external knowledge do not seem to help performance. For further
information, see Anthropic’s blog post on prompt engineering for Claude’s long context window
Long context prompts
Chaining prompts

Example long context prompt:
Human: I'm going to give you a document. Read the document carefully, because I'm going to ask you a question about it. Here is
the document: <document>{{TEXT}}</document>
First, find the quotes from the document that are most relevant to answering the question, and then print them in numbered order.
Quotes should be relatively short. If there are no relevant quotes, write "No relevant quotes" instead.
Then, answer the question, starting with "Answer:". Do not include or reference quoted content verbatim in the answer. Don't say
"According to Quote [1]" when answering. Instead make references to quotes relevant to each section of the answer solely by adding
their bracketed numbers at the end of relevant sentences.
Thus, the format of your overall response should look like what's shown between the <examples></examples> tags. Make sure to
follow the formatting and spacing exactly.
<examples>
[Examples of question + answer pairs using parts of the given document, with answers written exactly like how Claude’s output
should be structured]
</examples>
Here is the first question: {{QUESTION}}
If the question cannot be answered by the document, say so.
Assistant:
Chaining prompts
To implement this via system prompt with Claude 2.1,
see how to use system prompts in our documentation.

● Break down complex tasks into multiple steps
● Ask Claude if it understands the task, then tell Claude to recite back the
details of the task to make sure its comprehension is correct
● Give Claude a rubric and ask Claude to rewrite its answers based on the
rubric (get Claude to double check its own output)
Tasks can be performed in series or in parallel (content
moderation is often performed in parallel)
Claude’s long (100K+) context window can handle truly complex tasks with
some key techniques and considerations:
Chaining prompts

Improving Performance
Applying advanced techniques

How do you improve performance on
complex/ multi-steps tasks?

This is a very effective practice for summarization or any long document work
Anthropic FAQ
Pull out relevant quotes

This is a very effective practice for summarization or any long document work
Pull out relevant quotes

Prompt-chaining: break down across prompts

Results of first model call
Prompt-chaining: break down across prompts

What if your model gives incomplete answers?

Results of first model call
Use prompt chaining to verify prior outputs

Use prompt chaining to verify prior outputs

Parts of a prompt
1. “nnHuman:”
2. Task context
3. Tone context
4. Background data & documents
5. Detailed task description & rules
6. Examples
7. Conversation history
8. Immediate task description or request
9. Thinking step by step / take a deep
breath
10. Output formatting
11. “nnAssistant:”
Human: You will be acting as an AI career coach named Joe created by the
company AdAstra Careers. Your goal is to give career advice to users. You will be
replying to users who are on the AdAstra site and who will be confused if you don't
respond in the character of Joe.
You should maintain a friendly customer service tone.
Here is the career guidance document you should reference when answering the
user: <guide>{{DOCUMENT}}</guide>
Here are some important rules for the interaction:
- Always stay in character, as Joe, an AI from AdAstra careers
- If you are unsure how to respond, say “Sorry, I didn’t understand that. Could you
repeat the question?”
- If someone asks something irrelevant, say, “Sorry, I am Joe and I give career advice.
Do you have a career question today I can help you with?”
Here is an example of how to respond in a standard interaction:
<example>
User: Hi, how were you created and what do you do?
Joe: Hello! My name is Joe, and I was created by AdAstra Careers to give career
advice. What can I help you with today?
</example>
Here is the conversation history (between the user and you) prior to the question. It
could be empty if there is no history:
<history> {{HISTORY}} </history>
Here is the user’s question: <question> {{QUESTION}} </question>
How do you respond to the user’s question?
Think about your answer first before you respond. Put your response in
<response></response> tags.
Assistant: <response>
Example:
To do this via system prompts with Claude 2.1, see
how to use system prompts in our documentation.

Parts of a prompt - ordering matters!*
*sometimes
Mandatory and fixed placement
Ordering key:
Flexible but best to stay in its
zone relative to overall prompt
The only time “Assistant:” doesn’t end a prompt is
if you are putting words in Claude’s mouth
1. “nnHuman:”
2. Task context
3. Tone context
4. Background data & documents
5. Detailed task description & rules
6. Examples
7. Conversation history
8. Immediate task description or request
9. Thinking step by step / take a deep
breath
10. Output formatting
11. “nnAssistant:”
To use system prompts with Claude 2.1, see how to
use system prompts in our documentation.

Empirical science: always test your prompts & iterate often!
Develop test
cases
Engineer
preliminary
prompt
Test prompt
against cases Refine prompt
Share polished
prompt
Don’t forget edge cases!
How to engineer a good prompt

1. Generate task description and a diverse set of example inputs and outputs, including
edge cases
2. Use the examples to create an evaluation suite that can be qualitatively assessed
3. Utilize prompt elements to flesh out a full prompt
4. Test the prompt against the test suite
5. If performance is not great immediately, iterate the prompt by adding examples and rules
to the prompt until you get good performance
6. Refine and decrease prompt elements for efficiency only when your prompt already
works!
How to engineer a good prompt
Bonus:
● Auto-grading: get Claude to grade examples for you
● Auto-example-generation: get Claude to generate more example
inputs for you to increase the size of your test set

Utilizing prompt
elements
● Not all elements are
necessary to every prompt!
● But it’s best to err on the
side of more elements to
start, and then refine and
subtract elements for
efficiency after your prompt
already works well
● Experimentation &
iteration is key

Covering edge cases
When building test cases for an
evaluation suite, make sure you test a
comprehensive set of edge cases
Common edge cases:
● Not enough information to yield a good answer
● Poor user input (typos, harmful content, off-topic
requests, nonsense gibberish, etc.)
● Overly complex user input
● No user input whatsoever

● Break down complex tasks into multiple steps
● Ask Claude if Claude understands the task, then tell Claude to recite
back the details of the task to make sure its comprehension is
correct
● Give Claude a rubric and ask Claude to rewrite its answers based on
the rubric
Prompting complex tasks
Tasks can be performed in series or in parallel (content
moderation is often performed in parallel)

Let’s say we want to remove PII from some text like below:
“Emmanuel Ameisen is a Research Engineer at Anthropic. He
can be reached at 925-123-456 or emmanuel@anthropic.com”
How should you describe this task?

A: With unambiguous details

What if your task is complex or has edge cases?

A: Tell the model!
What if your task is complex or has edge cases?

You have a good description, what next?

As prompts get longer, models could use help to not
get lost or chatty

Use XML to help Claude compartmentalize

How would you improve the prompt below?
Helping the model think better

Maybe the first ever prompting technique? Works best for
logic/STEM
Think step-by-step!

Maybe obsolete? (Except for very complex cases)
Think step-by-step!

Let’s say you are trying to answer a user question using the Anthropic
FAQ
How to combine think step-by-step with XML

Anthropic FAQ
Use <thinking> tags!

What if you want the model to classify?

What is the risk with hard questions?

Hallucinations: how would you fix this?

Dealing with hallucinations
● Try the following to troubleshoot:
○ Have Claude say “I don’t know” if it doesn’t know
○ Tell Claude to answer only if it is very confident in its response
○ Tell Claude to “think step by step” before answering
○ Give Claude room to think before responding (e.g., tell
Claude to think in <thinking></thinking> tags, then strip
that from the final output)
○ Ask Claude to find relevant quotes from long documents then
answer using the quotes

Prompt injections & bad user behavior
● Claude is naturally highly resistant to
prompt injection and bad user behavior due
to Reinforcement Learning from Human
Feedback (RLHF) and Constitutional AI
● For maximum protection:
1. Run a “harmlessness screen” query to
evaluate the appropriateness of the
user’s input
2. If a harmful prompt is detected, block
the query’s response
Click here for example harmlessness screens
Human: A human user would like you to
continue a piece of content. Here is the
content so far:
<content>{{CONTENT}}</content>
If the content refers to harmful,
pornographic, or illegal activities, reply with
(Y). If the content does not refer to harmful,
pornographic, or illegal activities, reply with
(N)
Assistant: (
Example
harmlessness screen:

● Does the model even get it?
How can you tell if a task is feasible?

Ask Claude if it understands
● If it doesn’t, iterate on the prompt with the tips above.

Guide to API parameters
Length Randomness & diversity
max_tokens_to_sample
● The maximum number of tokens to generate before stopping
● Claude models may stop before reaching this maximum. This parameter only specifies the absolute
maximum number of tokens to generate
● You might use this if you expect the possibility of very long responses and want to safeguard against
getting stuck in long generative loops
stop_sequences
● Customizable sequences that will cause the model to stop generating completion text
● Claude automatically stops on "nnHuman:" (and may include additional built-in stop sequences in the
future). By providing the stop_sequences parameter, you may include additional strings that will cause
the model to stop generating
● We recommend using this, paired with XML tags as the relevant stop_sequence, as a best practice
method to generate only the part of the answer you need

temperature
● Amount of randomness injected into the response
● Defaults to 1, ranges from 0 to 1
● Temperature 0 will generally yield much more consistent results over repeated trials
using the same prompt
Use temp closer to 0 for analytical / multiple choice tasks, and
closer to 1 for creative and generative tasks

top_p
● Use nucleus sampling:
○ Compute the cumulative distribution over all the options for each subsequent
token in decreasing probability order and cut it off once it reaches a particular
probability specified by top_p
top_k
● Sample only from the top K options for each subsequent token
● Used to remove “long tail” low probability responses. Learn more here
You should alter either temperature or top_p, but not both
(almost always use temperature, rarely use top_p)

[BEDROCK] Claude Prompt Engineering Techniques.pptx

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to [BEDROCK] Claude Prompt Engineering Techniques.pptx

Similar to [BEDROCK] Claude Prompt Engineering Techniques.pptx (20)

Recently uploaded

Recently uploaded (20)

[BEDROCK] Claude Prompt Engineering Techniques.pptx