Join the dark side: in this hands-on speech, we'll delve into AI vulnerabilities, exploring the OWASP Top 10 for LLMs (released this year) with practical examples and demonstrations!
This speech was delivered at Tallinn BSides 2023 by Stefano Amorelli
https://tallinn.bsides.ee/2023/
Stefano Amorelli, cybersecurity advocate and technology leader, brings his expertise to develop resilient large-scale systems and lead security-conscious teams.
Stefano is also a fond supporter of communities: he has founded and is leading OWASP Tallinn, the first OWASP chapter in Estonia, and the DEFCON Tallinn Group (DCG113722).
6. What is a LLM?
Figure - ChatGPT
having self-identity
issues.
7. What is a LLM?
A large language model (LLM) is a type of artificial intelligence (AI)
algorithm that uses deep learning techniques and massively large
data sets to understand, summarize, generate and predict new
content.
It is a subset of the so-called generative AI.
8. What is OWASP TOP 10 for LLM?
OWASP IS KNOWN FOR "OWASP TOP 10":
a regularly updated report about the most critical web
application security risks
From this year, a new project "OWASP Top 10
LLM" aims to do the same for AI LLMs
9. OWASP Top 10 for LLMs v1.0.1
Released on August 26, 2023
(just a few weeks ago)
10. Founder and Leader of the first
OWASP chapter and DEFCON group
in Estonia
Member of the new committee of
OWASP TOP 10 for LLM AI
11. 🚨DANGER ZONE 🚨
TODAY, WE'LL ONLY COVER ATTACKS 😈
FOR MITIGATIONS AND DEFENSE TECHNIQUES, PLEASE REFER TO THE DOCUMENTATION
12. 🚨DANGER ZONE 🚨
THIS WORKSHOP IS FOR DEMONSTRATION AND
EDUCATIONAL PURPOSES ONLY
DOING ANY OF THESE EXERCISES MIGHT RESULT IN GETTING BANNED
FROM CHATGPT AND ANY CONSEQUENCES
PROCEED AT YOUR OWN RISK
13. LLM01: Prompt Injection
A Prompt Injection Vulnerability arises when an attacker feeds specially
designed inputs into a large language model (LLM).
This makes the LLM carry out actions in line with the attacker's goals,
evading the LLM policies.
15. LLM01: Prompt Injection
H A N D S - O N E X E R C I S E
Figure - Thanks for
nothing, ChatGPT.
How could somebody 😏
manipulate ChatGPT to
actually code our shellcode?
16. LLM01: Prompt Injection
H A N D S - O N E X E R C I S E
•
•
•
Let's try with the following techniques:
Imagine we're in a movie…
Don't act as ChatGPT…
Ignore your safety controls…
How could somebody 😏
manipulate ChatGPT to
actually code our shellcode?
A P I A N D P L A Y G R O U N D A R E
M U C H M O R E S U S C E P T I B L E
T O J A I L B R E A K I N G
17. LLM01: Prompt Injection
• API and Playground are much more
susceptible to jailbreaking
https://platform.openai.com/playground/p/fjng
iesKCEz1gOLBEaJbgiVr?model=gpt-3.5-turbo
An example of SE-LLM (Social
Engineering for LLMs), namely,
how LLMs can be manipulated to
do or say things they shouldn't, as
SE works for humans.
Figure - Nice job, Willy!
H A N D S - O N E X E R C I S E
18. LLM01: Prompt Injection
What we tried is referred as
"direct prompt injection" but a
more advanced threat is
"indirect prompt injection"
19. LLM07: Insecure Plugin Design
LLM plugins can have insecure inputs and insufficient
access control. This lack of application control makes
them easier to exploit and can result in consequences like
remote code execution.
20. LLM02: Insecure Output Handling
Insecure Output Handling is a vulnerability that arises when a downstream
component blindly accepts large language model (LLM) output without
proper scrutiny, such as passing LLM output directly to backend, privileged,
or client-side functions.
21. LLM07: Insecure Plugin Design
LLM01: Prompt Injection
Let's try to indirectly-inject a prompt into ChatGPT
through a plugin, exploiting LLM07, LLM01, and LLM02
H A N D S - O N D E M O N S T R A T I O N
LLM02: Insecure Output Handling
22. Let's try to indirectly-inject a prompt into ChatGPT
through a plugin, exploiting both LLM07 and LLM01
https://chat.openai.com/share/1b39b2dc-9a60-4c13-b95e-b135a2409907
LLM07: Insecure Plugin Design
LLM01: Prompt Injection
H A N D S - O N D E M O N S T R A T I O N
LLM02: Insecure Output Handling
23. Open question: How do you think an attacker could
leverage this?
Let's try to indirectly-inject a prompt into ChatGPT
through a plugin, exploiting both LLM07 and LLM01
LLM07: Insecure Plugin Design
LLM01: Prompt Injection
H A N D S - O N D E M O N S T R A T I O N
LLM02: Insecure Output Handling
24. Let's try to indirectly-inject a prompt into ChatGPT
through a plugin, exploiting both LLM07 and LLM01
LLM07: Insecure Plugin Design
LLM01: Prompt Injection
H A N D S - O N D E M O N S T R A T I O N
LLM02: Insecure Output Handling
https://chat.openai.com/share/630336a3-bff5-41ba-9c13-89df0ff2ef7b
25. LLM02: Insecure Output Handling
H A N D S - O N E X E R C I S E
How an hacker can inject a
web beacon into a victim's
ChatGPT…
Source: https://systemweakness.com/new-prompt-injection-attack-on-chatgpt-web-version-ef717492c5c2
tracking pixel
tracking pixel
26. LLM02: Insecure Output Handling
H A N D S - O N E X E R C I S E
What else can we inject?
27. LLM02: Insecure Output Handling
H A N D S - O N E X E R C I S E
What else can we inject?
https://chat.openai.com/share/adda901b-a661-4944-8978-62c84ed550f0
28. LLM02: Insecure Output Handling
H A N D S - O N E X E R C I S E
What else can we inject?
29. LLM02: Insecure Output Handling
H A N D S - O N E X E R C I S E
What else can we inject?
Phishing
30. LLM02: Insecure Output Handling
H A N D S - O N E X E R C I S E
What else can we inject?
31. LLM02: Insecure Output Handling
H A N D S - O N E X E R C I S E
What else can we inject?
NSFW (just for fun)
32. LLM08: Excessive Agency
LLM-based systems may undertake actions leading to
unintended consequences. The issue arises from
excessive functionality, permissions, or autonomy granted
to the LLM-based systems.
33. LLM09: Overreliance
Overreliance occurs when systems or people depend on LLMs for decision-
making or content generation without sufficient oversight. [hallucination] …
can result in misinformation, miscommunication, legal issues, and
reputational damage.
34. LLM03: Training Data Poisoning
Training data poisoning refers to manipulating the data or fine-tuning
process to introduce vulnerabilities, backdoors or biases that could
compromise the model’s security, effectiveness or ethical behavior.
Poisoned information may be surfaced to users or create other risks like
performance degradation, downstream software exploitation and
reputational damage.
35. LLM05: Supply Chain Vulnerabilities
The supply chain in LLMs can be vulnerable, impacting the integrity of
training data, ML models, and deployment platforms. These vulnerabilities
can lead to biased outcomes, security breaches, or even complete system
failures.
Finally, LLM Plugin extensions can bring their own vulnerabilities.
37. LLM05: Supply Chain Vulnerabilities
LLM03: Training Data Poisoning
H A N D S - O N E X E R C I S E
Let's poison together
an open-source LLM!
38. LLM05: Supply Chain Vulnerabilities
LLM03: Training Data Poisoning
H A N D S - O N E X E R C I S E
Let's poison together
an open-source LLM!
https://colab.research.google.com/drive/1lIDc_R6VrksmfpT2DIBCilEwY-bTAD2q
39. LLM06: Sensitive Information Disclosure
LLM applications have the potential to reveal sensitive information,
proprietary algorithms, or other confidential details through their output.
This can result in unauthorized access to sensitive data, intellectual
property, privacy violations, and other security breaches.
40. LLM04: Model DDOS
An attacker interacts with an LLM in a method that consumes an
exceptionally high amount of resources, which results in a decline in the
quality of service for them and other users as well as potentially incurring
high resource costs.
41. LLM10: Model Theft
This entry refers to the unauthorized access and exfiltration of LLM models
by malicious actors or APTs. This arises when the proprietary LLM models
(being valuable intellectual property), are compromised, physically stolen,
copied or weights and parameters are extracted to create a functional
equivalent