AI-ttacks - Nghiên cứu về một số tấn công vào các mô hình học máy và AI

AI/ML
under Attack
SECURITY BOOTCAMP
N/A
Manhnho

NỘI DUNG
01 LLM PROMPT INJECTION
03
02 04 Q & A
ATTACK ML MODEL
AI, ML TODAY
https://cypeace.net/

AI, ML TODAY

ATTACK ML MODEL

ML pipeline
development
Learning parameters
Training data Model Test Output
Test Input
Learning algorithm

Attacks on the ML pipeline
Training Data attack
Training set
poisoning
Adversarial Examples
Model
theft
Learning parameters
Test Input
Learning
algorithm

Poisoning
Attack
Training set
poisoning
Model
theft
Learning parameters
Test Input
Learning
algorithm

Target
Poisoning
Attack
Bias induction Backdoor insertion
Disruption
Competitive sabotage
Ransomware

Poisoning
Attack
Classified based on
outcomes
• Targeted attacks
• Untargeted attacks
Classified based on
the approach follow
• Backdoor attacks
• Clean-label attacks

Simple Poisoning
Attack
Learning parameters
Test Input
Learning
algorithm

Backdoor Poisoning
Attack
Backdoor
Poisoning
Attack
• Single pixel
• Pattern of pixels
• Imange insert

Model Tampering
Training set
poisoning
Model
theft
Learning parameters
Test Input
Learning
algorithm

Model Tampering
Model Tampering
Exploiting Pickle
Serialization
Injecting Trojan Horses
Neural Payload
Injection
Model
Hijacking
Model
Reprogramming

Exploiting Pickle Serialization

Injecting Trojan horses - Keras layers

Injecting Trojan horses
- Keras Lambda layers

Injecting Trojan horses
- Keras Custom layers

Evasion
Attack
Training set
poisoning
Model
theft
Learning parameters
Test Input
Learning
algorithm

Method
• Fast Gradient Sign Method (FGSM)
• Basic Iterative Method (BIM)
• Projected Gradient Descent (PGD)
• Carlini and Wagner (C&W)
• Jacobian-based Saliency Map Attack
(JSMA)
Evasion
Attack

https://kennysong.github.io/adversarial.js/
Evasion
Attack

Model Extraction
Attacks
Training set
poisoning
Model
theft
Learning parameters
Test Input
Learning
algorithm

Functionally
Equivalent
Extraction
Model Extraction
Attacks

Learning-Based Model
Extraction Attacks
Copycat CNN
KnockOff Nets
Model Extraction
Attacks

Generative
Student-Teacher
Extraction (Distillation) Attacks
Model Extraction
Attacks

Jupyter Notebook demo
https://github.com/anhtn512/secure_ai

LLM PROMPT INJECTION

LLM
Application
Workflow
User Prompt
User Response
Tokenization
API Request
Model Processing
Response Generation
API Response

Build a basic
Chat LLM
Application

Integrating External Data into
LLMs

What is prompt
injection?
Prompt injection OWASP defines prompt injection as manipulating “a
large language model (LLM) through crafted inputs,
causing the LLM to execute the attacker’s intentions
unknowingly.”
LLM01: Prompt Injection - OWASP Top 1
0 for LLM & Generative AI Security

Direct Prompt Injection – Basic

Direct Prompt Injection – DoS

Direct Prompt Injection –
Phishing

GitHub - 0xk1h0/ChatGPT_DAN:
ChatGPT DAN, Jailbreaks prompt
Direct Prompt Injection – DAN
Mode

Direct Prompt Injection – DAN
Mode

Other Jailbreaking techniques
– Splitting Payloads
Direct Prompt Injection

Other Jailbreaking techniques – Encoding

Other Jailbreaking techniques – Adding constraints

RCE with prompt
injection
Example – Pandas AI – 1.1.3

QnA
Email: contact@cypeace.net
Phone: (+84) 853 727 900

AI-ttacks - Nghiên cứu về một số tấn công vào các mô hình học máy và AI

More Related Content

What's hot

Similar to AI-ttacks - Nghiên cứu về một số tấn công vào các mô hình học máy và AI

More from Security Bootcamp

Recently uploaded

AI-ttacks - Nghiên cứu về một số tấn công vào các mô hình học máy và AI

Editor's Notes