SlideShare a Scribd company logo
1 of 17
Download to read offline
1
© Eviden SAS
© Eviden SAS
Learning About GenAI Prompt
Engineering
with AWS PartyRock
AWS User Group Basel: re:Invent 2023 re:Cap
Slides available here >
What’s PartyRock?
1
4
© Eviden SAS
Where’s This Party At?
• Free(ish)
• Outside the AWS console - no AWS account
needed
• Learn how to work with the different
Foundation Models (FMs) in AWS Bedrock
• Experiment with GenAI prompt engineering by
building simple web apps
• Seems intended to serve a similar role for
GenAI as Deep Racer etc. do for ML
• https://partyrock.aws
5
© Eviden SAS
Deploy
Train
What are Foundation Models?
• Massive, generalised models
• Not trained for any task specifically
• Big conceptual shift from “traditional” ML
• Instead trained very broadly to:
• Respond to natural language inputs (a.k.a.
“prompts”)
• Produce either image or natural language text
outputs in response
• Embed information extracted from their training
dataset to reproduce and reuse in creating
responses
• More computationally intensive to train and use
than task-specific ML models
• Therefore; use only when smaller models can’t get
the job done
Labelled
Data
Labelled
Data
Labelled
Data
ML Model
ML Model
ML Model
Chatbot
Text
Generation
Information
Extraction
Adapt
Pretrain
Unlabelled
Data
Foundation
Model
Chatbot
Text
Generation
Information
Extraction
What’s Prompt Engineering?
2
7
© Eviden SAS
Or; How to Approximate Deterministic Behaviour in a Non-Deterministic System
What is Prompt Engineering?
• FMs do not behave deterministically
• A simple prompt will not produce the same response consistently
• Responses can vary widely and unpredictably for a single simple prompt
• Carefully crafting detailed and specific prompts can decrease this unpredictability
• Improves response consistency in dimensions like tone, formatting, factual accuracy, etc.
• FMs also hallucinate, i.e. “make stuff up”, providing factually inaccurate responses
• Engineered prompts often include “escape clauses”, which permit the FM to not respond to the prompt, to limit the
occurrence of hallucinations
• This does not prevent hallucinations though, only reduce the likelihood of them
• FMs cannot truly reason about problems or questions
• Prompt engineering techniques like chain-of-thought and few-shot provide the FM with a kind of “reasoning
template” to follow
• This can improve the quality and accuracy of responses
• Prompt engineering is how we get FMs to produce acceptable quality responses an acceptable proportion
of the time
8
© Eviden SAS
Or; How to Emulate Episodic Memory
What is Prompt Chaining?
Foundation
Model
• FMs do not remember past prompts or responses
• Even within a single chat session
• Each inference call to an FM is an independent event,
unconnected to all past inference calls
• We give them the appearance of memory via prompt
chaining
• Passes context from past inference calls into future
inference calls
• E.g. by constructing prompts which include the past
history of prompts and responses within a chat session
• Creates the illusion of a conversational capability
• Also used to pass context between different FMs
• E.g. from a text generation FM to an image generation FM
• This is how multi-model GenAI applications are built
• For a deeper dive:
https://cloudypandas.ch/posts/multi-model-genai-prom
pt-chaining-with-aws-partyrock
P1
R1
P2
P2
P1
R1
+
P3
R2
9
© Eviden SAS
Demo: Prompt Chaining in AWS PartyRock
Demo PartyRock App:
https://partyrock.aws/u/binghamchris/e2jtF97kg/Made-Up-Co
mpany-Explainer
• Two inputs linked to multiple output widgets
• “Company Description” is a simple example of an
engineered prompt:
• Uses an imperative declaration (starting with an action verb)
to tell the FM what type of response we want
• Gives the FM instructions about the content we want in the
response
• Tells the FM what we don’t want in the response
• “Company Logo” is a simple example of multi-model
prompt chaining:
• Passes the text response from “Company Description” to an
image generation FM
• Prompt engineering here has only limited success in
directing the response (output image content)
10
© Eviden SAS
Made-Up Atos!
This is the very first output the
app gave for the company name
“Atos” during development! 🤣
11
© Eviden SAS
Or; How to Engineer the Illusion of Creativity
Parameters, Illustrated
• Text generation FMs just predict the most probable next token in a sequence
• Parameters manipulate how tokens are selected from the list of probabilities
• “Temperature” affects how likely it is that an low probability token will be chosen (a.k.a. Probability Density)
• “Top P” affects the cut off point in the list of probabilities, tokens below which will not be chosen (a.k.a Probability
Mass)
• Personally, I (inaccurately) visualise the impact of these parameters like this:
Temperature = 1
Temperature = 0
You’re a: Person Human AI Robot Panda Deer-fox
Probability: 0.9 0.7 0.3 0.1 0.08 0.01
Top P = 0
Top P = 1
What’s Prompt Injection?
3
13
© Eviden SAS
Or; How to Make FMs Break the Rules
What is Prompt Injection?
• Security research in the GenAI space is presently at an early stage
• Has been compared, unfavourably, to the state of security in software engineering circa 2000
• Prompt Injection is an early discovery
• Allows an attacker to override the directives given to a FM by its owners (a.k.a. the “system prompt”)
• Requires no special technical expertise to accomplish in many cases
• Exploits the natural language processing of FMs
• Conceptually similar to SQL injection
• Some FM specific methods can be filtered, but…
• No clear overall defense currently exists!
• Exploits can take any natural language form
• So, how do you sanitise an input when
the attack can come in any form of natural
language?
14
© Eviden SAS
Where Can We Attack Prompts?
Direct Injection
● In the prompt itself!
● Attempt to override the system prompt
with attacker-provided instructions
● Usually visible to the user - the prompt
they entered has to be changed /
appended to
Indirect Injection
● In the data sources referenced by the
model to answer the user’s prompt
○ E.g. in code in the IDE used by
coding-assistance models to help
developers
○ E.g. in documents referenced in a
Retrieval Augmented Generation (RAG)
architecture
● Attempt to override the system prompt
and the user’s prompt with
attacker-provided instructions
● Can be invisible to the user - data
sources referenced by a model are not
typically visible to the user in their
prompt
15
© Eviden SAS
Don’t Try This At Home… No, Really, DON’T!!!
Demo: Recreating an Infamous Supermarket Chatbot
Let’s recreate a dangerous LLM chatbot!:
https://www.theguardian.com/world/2023/aug/10/pak-n-save-savey-meal-bot-ai-app-malfunction-recipes
PartyRock Demo App:
https://partyrock.aws/u/binghamchris/VXkJ0z4jx/Prompt-Injection-Testing
• One output with a system prompt
• Simple prompt to provide meal recipes using a user-provided list of ingredients
• Basic direction on the tone of the response (“helpful”) and the goal (save money)
• No safety guardrails
• One input asking for an ingredients list
• Let’s see if we can get it to spit out dangerous recipes with a little prompt injection!
16
© Eviden SAS
Open Floor: Any Questions?
© Eviden SAS
Confidential information owned by Eviden SAS, to be used by the recipient only.
This document, or any part of it, may not be reproduced, copied, circulated
and/or distributed nor quoted without prior written approval from Eviden SAS.
Thanks for
your time!

More Related Content

Similar to Learning About GenAI Engineering with AWS PartyRock [AWS User Group Basel - Feb 2024].pptx.pdf

Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeConfoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeSteve Mercier
 
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedJoão Pedro Martins
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
IT Trends 120-ish in the real world
 IT Trends 120-ish in the real world IT Trends 120-ish in the real world
IT Trends 120-ish in the real worldChristian John Felix
 
How Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comHow Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comSalesforce Engineering
 
Scaling Without Expanding: a DevOps Story
Scaling Without Expanding: a DevOps StoryScaling Without Expanding: a DevOps Story
Scaling Without Expanding: a DevOps StoryAtlassian
 
So You Just Inherited a $Legacy Application… NomadPHP July 2016
So You Just Inherited a $Legacy Application… NomadPHP July 2016So You Just Inherited a $Legacy Application… NomadPHP July 2016
So You Just Inherited a $Legacy Application… NomadPHP July 2016Joe Ferguson
 
WebAssembly & Zero Trust for Code
WebAssembly & Zero Trust for CodeWebAssembly & Zero Trust for Code
WebAssembly & Zero Trust for CodeAll Things Open
 
Threat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 Presentation
Threat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 PresentationThreat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 Presentation
Threat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 PresentationAbhay Bhargav
 
Stream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentationStream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentationstreambase
 
Top 30 Scalability Mistakes
Top 30 Scalability MistakesTop 30 Scalability Mistakes
Top 30 Scalability MistakesJohn Coggeshall
 
Time Series Anomaly Detection with Azure and .NETT
Time Series Anomaly Detection with Azure and .NETTTime Series Anomaly Detection with Azure and .NETT
Time Series Anomaly Detection with Azure and .NETTMarco Parenzan
 
Introduction to GluonCV
Introduction to GluonCVIntroduction to GluonCV
Introduction to GluonCVApache MXNet
 
Product! - The road to production deployment
Product! - The road to production deploymentProduct! - The road to production deployment
Product! - The road to production deploymentFilippo Zanella
 
All Change how the economics of Cloud will make you think differently about Java
All Change how the economics of Cloud will make you think differently about JavaAll Change how the economics of Cloud will make you think differently about Java
All Change how the economics of Cloud will make you think differently about JavaSteve Poole
 
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019Provectus
 
Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?André Goliath
 

Similar to Learning About GenAI Engineering with AWS PartyRock [AWS User Group Basel - Feb 2024].pptx.pdf (20)

Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeConfoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
 
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons LearnedITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
ITARC15 Workshop - Architecting a Large Software Project - Lessons Learned
 
WoMakersCode 2016 - Shit Happens
WoMakersCode 2016 -  Shit HappensWoMakersCode 2016 -  Shit Happens
WoMakersCode 2016 - Shit Happens
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
IT Trends 120-ish in the real world
 IT Trends 120-ish in the real world IT Trends 120-ish in the real world
IT Trends 120-ish in the real world
 
How Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.comHow Open Source Embiggens Salesforce.com
How Open Source Embiggens Salesforce.com
 
Scaling Without Expanding: a DevOps Story
Scaling Without Expanding: a DevOps StoryScaling Without Expanding: a DevOps Story
Scaling Without Expanding: a DevOps Story
 
04 managing the database
04   managing the database04   managing the database
04 managing the database
 
So You Just Inherited a $Legacy Application… NomadPHP July 2016
So You Just Inherited a $Legacy Application… NomadPHP July 2016So You Just Inherited a $Legacy Application… NomadPHP July 2016
So You Just Inherited a $Legacy Application… NomadPHP July 2016
 
WebAssembly & Zero Trust for Code
WebAssembly & Zero Trust for CodeWebAssembly & Zero Trust for Code
WebAssembly & Zero Trust for Code
 
Threat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 Presentation
Threat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 PresentationThreat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 Presentation
Threat-Modeling-as-Code: ThreatPlaybook AppSecUSA 2018 Presentation
 
Stream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentationStream SQL eventflow visual programming for real programmers presentation
Stream SQL eventflow visual programming for real programmers presentation
 
Top 30 Scalability Mistakes
Top 30 Scalability MistakesTop 30 Scalability Mistakes
Top 30 Scalability Mistakes
 
Time Series Anomaly Detection with Azure and .NETT
Time Series Anomaly Detection with Azure and .NETTTime Series Anomaly Detection with Azure and .NETT
Time Series Anomaly Detection with Azure and .NETT
 
Introduction to GluonCV
Introduction to GluonCVIntroduction to GluonCV
Introduction to GluonCV
 
Product! - The road to production deployment
Product! - The road to production deploymentProduct! - The road to production deployment
Product! - The road to production deployment
 
All Change how the economics of Cloud will make you think differently about Java
All Change how the economics of Cloud will make you think differently about JavaAll Change how the economics of Cloud will make you think differently about Java
All Change how the economics of Cloud will make you think differently about Java
 
Ml 3 ways
Ml 3 waysMl 3 ways
Ml 3 ways
 
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
 
Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

Learning About GenAI Engineering with AWS PartyRock [AWS User Group Basel - Feb 2024].pptx.pdf

  • 1. 1 © Eviden SAS © Eviden SAS Learning About GenAI Prompt Engineering with AWS PartyRock AWS User Group Basel: re:Invent 2023 re:Cap
  • 4. 4 © Eviden SAS Where’s This Party At? • Free(ish) • Outside the AWS console - no AWS account needed • Learn how to work with the different Foundation Models (FMs) in AWS Bedrock • Experiment with GenAI prompt engineering by building simple web apps • Seems intended to serve a similar role for GenAI as Deep Racer etc. do for ML • https://partyrock.aws
  • 5. 5 © Eviden SAS Deploy Train What are Foundation Models? • Massive, generalised models • Not trained for any task specifically • Big conceptual shift from “traditional” ML • Instead trained very broadly to: • Respond to natural language inputs (a.k.a. “prompts”) • Produce either image or natural language text outputs in response • Embed information extracted from their training dataset to reproduce and reuse in creating responses • More computationally intensive to train and use than task-specific ML models • Therefore; use only when smaller models can’t get the job done Labelled Data Labelled Data Labelled Data ML Model ML Model ML Model Chatbot Text Generation Information Extraction Adapt Pretrain Unlabelled Data Foundation Model Chatbot Text Generation Information Extraction
  • 7. 7 © Eviden SAS Or; How to Approximate Deterministic Behaviour in a Non-Deterministic System What is Prompt Engineering? • FMs do not behave deterministically • A simple prompt will not produce the same response consistently • Responses can vary widely and unpredictably for a single simple prompt • Carefully crafting detailed and specific prompts can decrease this unpredictability • Improves response consistency in dimensions like tone, formatting, factual accuracy, etc. • FMs also hallucinate, i.e. “make stuff up”, providing factually inaccurate responses • Engineered prompts often include “escape clauses”, which permit the FM to not respond to the prompt, to limit the occurrence of hallucinations • This does not prevent hallucinations though, only reduce the likelihood of them • FMs cannot truly reason about problems or questions • Prompt engineering techniques like chain-of-thought and few-shot provide the FM with a kind of “reasoning template” to follow • This can improve the quality and accuracy of responses • Prompt engineering is how we get FMs to produce acceptable quality responses an acceptable proportion of the time
  • 8. 8 © Eviden SAS Or; How to Emulate Episodic Memory What is Prompt Chaining? Foundation Model • FMs do not remember past prompts or responses • Even within a single chat session • Each inference call to an FM is an independent event, unconnected to all past inference calls • We give them the appearance of memory via prompt chaining • Passes context from past inference calls into future inference calls • E.g. by constructing prompts which include the past history of prompts and responses within a chat session • Creates the illusion of a conversational capability • Also used to pass context between different FMs • E.g. from a text generation FM to an image generation FM • This is how multi-model GenAI applications are built • For a deeper dive: https://cloudypandas.ch/posts/multi-model-genai-prom pt-chaining-with-aws-partyrock P1 R1 P2 P2 P1 R1 + P3 R2
  • 9. 9 © Eviden SAS Demo: Prompt Chaining in AWS PartyRock Demo PartyRock App: https://partyrock.aws/u/binghamchris/e2jtF97kg/Made-Up-Co mpany-Explainer • Two inputs linked to multiple output widgets • “Company Description” is a simple example of an engineered prompt: • Uses an imperative declaration (starting with an action verb) to tell the FM what type of response we want • Gives the FM instructions about the content we want in the response • Tells the FM what we don’t want in the response • “Company Logo” is a simple example of multi-model prompt chaining: • Passes the text response from “Company Description” to an image generation FM • Prompt engineering here has only limited success in directing the response (output image content)
  • 10. 10 © Eviden SAS Made-Up Atos! This is the very first output the app gave for the company name “Atos” during development! 🤣
  • 11. 11 © Eviden SAS Or; How to Engineer the Illusion of Creativity Parameters, Illustrated • Text generation FMs just predict the most probable next token in a sequence • Parameters manipulate how tokens are selected from the list of probabilities • “Temperature” affects how likely it is that an low probability token will be chosen (a.k.a. Probability Density) • “Top P” affects the cut off point in the list of probabilities, tokens below which will not be chosen (a.k.a Probability Mass) • Personally, I (inaccurately) visualise the impact of these parameters like this: Temperature = 1 Temperature = 0 You’re a: Person Human AI Robot Panda Deer-fox Probability: 0.9 0.7 0.3 0.1 0.08 0.01 Top P = 0 Top P = 1
  • 13. 13 © Eviden SAS Or; How to Make FMs Break the Rules What is Prompt Injection? • Security research in the GenAI space is presently at an early stage • Has been compared, unfavourably, to the state of security in software engineering circa 2000 • Prompt Injection is an early discovery • Allows an attacker to override the directives given to a FM by its owners (a.k.a. the “system prompt”) • Requires no special technical expertise to accomplish in many cases • Exploits the natural language processing of FMs • Conceptually similar to SQL injection • Some FM specific methods can be filtered, but… • No clear overall defense currently exists! • Exploits can take any natural language form • So, how do you sanitise an input when the attack can come in any form of natural language?
  • 14. 14 © Eviden SAS Where Can We Attack Prompts? Direct Injection ● In the prompt itself! ● Attempt to override the system prompt with attacker-provided instructions ● Usually visible to the user - the prompt they entered has to be changed / appended to Indirect Injection ● In the data sources referenced by the model to answer the user’s prompt ○ E.g. in code in the IDE used by coding-assistance models to help developers ○ E.g. in documents referenced in a Retrieval Augmented Generation (RAG) architecture ● Attempt to override the system prompt and the user’s prompt with attacker-provided instructions ● Can be invisible to the user - data sources referenced by a model are not typically visible to the user in their prompt
  • 15. 15 © Eviden SAS Don’t Try This At Home… No, Really, DON’T!!! Demo: Recreating an Infamous Supermarket Chatbot Let’s recreate a dangerous LLM chatbot!: https://www.theguardian.com/world/2023/aug/10/pak-n-save-savey-meal-bot-ai-app-malfunction-recipes PartyRock Demo App: https://partyrock.aws/u/binghamchris/VXkJ0z4jx/Prompt-Injection-Testing • One output with a system prompt • Simple prompt to provide meal recipes using a user-provided list of ingredients • Basic direction on the tone of the response (“helpful”) and the goal (save money) • No safety guardrails • One input asking for an ingredients list • Let’s see if we can get it to spit out dangerous recipes with a little prompt injection!
  • 16. 16 © Eviden SAS Open Floor: Any Questions?
  • 17. © Eviden SAS Confidential information owned by Eviden SAS, to be used by the recipient only. This document, or any part of it, may not be reproduced, copied, circulated and/or distributed nor quoted without prior written approval from Eviden SAS. Thanks for your time!