Code at https://github.com/anmold-07/Model-Extraction-with-RL
https://www.usenix.org/conference/usenixsecurity20/presentation/chandrasekaran
This paper formalizes model extraction and draws parallels between it and the established area of active learning. In particular, the authors show that recent advances in active learning can be used to implement powerful model extraction attacks, and they investigate possible defense strategies.
Connections between active learning and model extraction
1. Exploring Connections between Active Learning and Model Extraction
Anmol Dwivedi
With credit to the authors' original presentation at the 2020 USENIX conference
2. Introduction
• Paper: Exploring Connections between Active Learning and Model Extraction
• Conference: 29th USENIX Security Symposium
• Dates: August 12th-14th, 2020
• Authors: Varun Chandrasekaran, Kamalika Chaudhuri, Irene Giacomelli, Somesh Jha, Songbai Yan
3. Overview
1. Model Extraction from MLaaS
• Motivation
• Definition
2. Machine Learning
• Passive Learning
• Active Learning
3. Evaluate Performance
• Linear Models
• Non-linear Models
4. Defense Strategies
• Data-Dependent Defense Strategies
• Data-Independent Defense Strategies
5. Summary & Open Questions
4. Machine Learning as a Service (MLaaS)
[Diagram: the user's data lives on a local server; the MLaaS system exposes only a query interface (oracle access), answering one query at a time.]
Advantages:
• Scalability
• Availability
• Monetizability of the model (pay-per-query regime)
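As a minimal sketch of what such a query interface looks like from the client side (the class and parameter names below are illustrative, not from any real MLaaS API):

```python
import numpy as np

class MLaaSOracle:
    """Illustrative pay-per-query MLaaS interface: the client only ever
    sees (query, answer) pairs; the model itself stays on the server."""

    def __init__(self, model_fn, price_per_query=0.001):
        self._model_fn = model_fn            # hidden server-side model
        self.price_per_query = price_per_query
        self.queries_served = 0

    def query(self, x):
        """Oracle access: one answer per (billed) query."""
        self.queries_served += 1
        return self._model_fn(np.asarray(x))

    def bill(self):
        return self.queries_served * self.price_per_query
```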
5. Model Extraction
Adversary goals:
• Enable white-box attacks such as membership inference and model inversion
• Undermine the pay-per-query regime
Objectives:
1. Learn an approximation f̂ of the target model f
2. Use as few queries as possible
6. Example: Equation Solving (ES) Attack for Linear Regression
• Strategy for the adversary: query d + 1 affinely independent points xᵢ and solve the resulting system of linear equations wᵀxᵢ + b = yᵢ for the d + 1 unknowns (w, b)
• Experiment outcome: the linear model is recovered exactly
Use machine learning (ML) to derive attack strategies for more difficult hypothesis classes!
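A minimal sketch of the ES attack, assuming a linear model f(x) = wᵀx + b whose real-valued output the oracle returns; the specific query points (the origin plus the standard basis vectors) are an illustrative choice:

```python
import numpy as np

def equation_solving_attack(oracle_query, d):
    """Recover (w, b) of f(x) = w.x + b exactly with d + 1 queries.

    oracle_query: callable returning the real-valued output f(x).
    """
    # d + 1 affinely independent query points: the origin and the
    # standard basis vectors (an illustrative choice).
    X = np.vstack([np.zeros(d), np.eye(d)])
    y = np.array([oracle_query(x) for x in X])
    # Augment with a column of ones and solve [X 1][w; b] = y.
    A = np.hstack([X, np.ones((d + 1, 1))])
    wb = np.linalg.solve(A, y)
    return wb[:d], wb[d]          # (w, b)

# Example: extract a secret linear model with d + 1 = 4 queries.
w_true, b_true = np.array([2.0, -1.0, 0.5]), 0.3
w_hat, b_hat = equation_solving_attack(lambda x: w_true @ x + b_true, d=3)
assert np.allclose(w_hat, w_true) and np.isclose(b_hat, b_true)
```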
8. Passive Learning Setting
• The learner (adversary) has access to a large labeled dataset D in its entirety
• Typically, the Probably Approximately Correct (PAC) framework is used to learn f: an algorithm A outputs a function with risk at most ε, with confidence at least 1 − δ, using n(ε, δ) i.i.d. data-points
• The standard approach is Empirical Risk Minimization (ERM)
• Problem: well-known inequalities such as Hoeffding's bound tell us that as ε → 0, the sample complexity n(ε, δ) grows rapidly!
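A quick numerical illustration of this blow-up, using the textbook Hoeffding-based sample size n ≥ (1/2ε²)·ln(2/δ) for estimating the risk of a single fixed hypothesis (the constants come from that standard bound, not from the paper):

```python
import math

def hoeffding_sample_size(eps, delta=0.05):
    """Samples needed so the empirical risk is within eps of the true
    risk with probability >= 1 - delta (single fixed hypothesis)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

for eps in (0.1, 0.01, 0.001):
    print(f"eps = {eps}: n >= {hoeffding_sample_size(eps)}")
# eps = 0.1:   n >= 185
# eps = 0.01:  n >= 18445
# eps = 0.001: n >= 1844440
```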
9. Active Learning Setting
• The learner (adversary) has access to a smaller set of labeled instances (lower sample-complexity regime)
• The learner can actively choose the data-points x that benefit its query strategy
• By intelligently choosing queries, the learner can drastically reduce sample complexity!
[Diagram: the learner adaptively queries the oracle; trade-off between error and query complexity, with lower query complexity than passive learning.]
Model extraction is similar to active learning.
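The classic example of this trade-off is a 1-D threshold classifier h(x) = 1[x ≥ t] on [0, 1]: a passive learner needs on the order of 1/ε i.i.d. samples, while an active learner can binary-search for t with only about log₂(1/ε) label queries. A minimal sketch (illustrative, not from the paper):

```python
def active_learn_threshold(label_query, eps, lo=0.0, hi=1.0):
    """Binary-search a threshold classifier h(x) = 1[x >= t] on [0, 1]
    to precision eps using ~log2(1/eps) label queries."""
    n_queries = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        n_queries += 1
        if label_query(mid):      # label 1 => threshold is to the left
            hi = mid
        else:                     # label 0 => threshold is to the right
            lo = mid
    return (lo + hi) / 2, n_queries

t_true = 0.3721
t_hat, n = active_learn_threshold(lambda x: x >= t_true, eps=1e-4)
print(f"recovered t ~= {t_hat:.4f} with {n} queries")
# ~14 queries, versus on the order of 10,000 passive samples
```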
11. Active Learning
Two scenarios: PAC active learning and Query Synthesis (QS) active learning.
PAC scenario:
• Assumes access to the data distribution on (X, Y).
• Given a data set, the learner decides whether to query a given data-point x.
QS scenario:
• Assumes no access to the data distribution on (X, Y).
• Instead, query instances are synthesized by the learner (even instances that have zero probability of being generated by the distribution).
• Query Synthesis (QS) active learning is more suitable for model extraction because it requires less prior knowledge about the data distribution.
• Hence, any active learning algorithm in the QS scenario can be used for model extraction (see the sketch below)!
Advancement in active learning ⇒ threat to MLaaS systems.
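As a concrete illustration, here is a minimal query-synthesis sketch that extracts a halfspace sign(wᵀx + b) from hard-label oracle access only, by synthesizing points on the decision boundary via binary line search; the anchor points, perturbation scale, and recovery-by-SVD step are illustrative choices, not the paper's exact algorithm:

```python
import numpy as np

def boundary_point(label_query, x0, x1, tol=1e-8):
    """Binary-search the segment [x0, x1] (labels 0 and 1 at the
    endpoints) for a point on the decision boundary."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if label_query(x0 + mid * (x1 - x0)):
            hi = mid                     # boundary lies left of mid
        else:
            lo = mid                     # boundary lies right of mid
    return x0 + 0.5 * (lo + hi) * (x1 - x0)

def extract_halfspace(label_query, x_neg, x_pos, d, rng=None):
    """Recover (w, b) of sign(w.x + b), up to positive scaling, from
    d synthesized boundary points plus the positively labeled anchor."""
    rng = rng or np.random.default_rng(0)
    pts = []
    while len(pts) < d:
        x = x_pos + 0.5 * rng.standard_normal(d)   # synthesized query
        if label_query(x):                         # keep positive endpoints
            pts.append(boundary_point(label_query, x_neg, x))
    # Boundary points satisfy w.x + b = 0; solve the homogeneous system
    # [B 1][w; b] = 0 via the smallest right singular vector.
    A = np.hstack([np.array(pts), np.ones((d, 1))])
    wb = np.linalg.svd(A)[2][-1]
    if wb[:d] @ x_pos + wb[d] < 0:                 # fix the orientation
        wb = -wb
    return wb[:d], wb[d]

# Demo: the anchors would come from any two differently labeled points.
w, b = np.array([1.0, -2.0, 0.5]), 0.2
oracle = lambda x: bool(w @ x + b >= 0)
w_hat, b_hat = extract_halfspace(oracle, x_neg=-5 * w, x_pos=5 * w, d=3)
print(np.allclose(w_hat / np.linalg.norm(w_hat), w / np.linalg.norm(w), atol=1e-4))
```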
13. Evaluation: Non-Linear Models (kernel SVMs & Decision Trees)
[Figures: kernel SVM (RBF kernel) extraction via Adaptive Retraining (prior work) and the proposed EAT active learning algorithm; decision tree extraction via Path Finding (prior work) and the proposed IWAL active learning algorithm.]
14. Defense Strategies
Link between ML in a noisy setting and model extraction: the server implements a randomized defense strategy D, so the client gets noisy answers from the server instead of the true labels f(x).
Consequences:
• More queries are required than usual
• The extracted model is less accurate
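A minimal sketch of such a randomized server-side wrapper, with a constant label-flip probability as the simplest (data-independent) instance of D; the class and parameter names are illustrative:

```python
import numpy as np

class NoisyDefense:
    """Server-side wrapper: answer queries with the true label y = f(x)
    flipped according to a randomized defense strategy D."""

    def __init__(self, model_fn, flip_prob, rng=None):
        self._model_fn = model_fn
        self.flip_prob = flip_prob            # D's noise level
        self._rng = rng or np.random.default_rng()

    def query(self, x):
        y = int(self._model_fn(x))            # true binary label
        if self._rng.random() < self.flip_prob:
            return 1 - y                      # noisy answer to the client
        return y
```

A data-dependent strategy would instead make the flip probability a function of the query x itself (see slide 19).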
16. Data-Independent Randomization
• If the flip probability p < 1/2: defense D is not secure, since the adversary can still learn under random classification noise
• Else (p ≥ 1/2): the server is useless, since it outputs incorrect labels most of the time
• A bound on the number of samples required grows by roughly a factor of 1/(1 − 2p)² over the noiseless case
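One way to see why any constant flip probability p < 1/2 fails: the adversary can repeat each query and take a majority vote, recovering clean labels at the cost of roughly ln(1/δ)/(1 − 2p)² times more queries (a standard Hoeffding-bound argument; the sketch assumes a per-query-randomized server such as the wrapper on slide 14):

```python
import math
from collections import Counter

def denoised_query(noisy_query, x, p, delta=1e-3):
    """Recover the clean label with prob >= 1 - delta by majority vote
    over repeated queries; requires p < 1/2 (Hoeffding bound)."""
    k = math.ceil(2 * math.log(1 / delta) / (1 - 2 * p) ** 2)
    k += 1 - k % 2                         # make k odd to avoid ties
    votes = Counter(noisy_query(x) for _ in range(k))
    return votes.most_common(1)[0][0]
```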
17. Evaluation of Defense Strategies: Data-Independent noise
[Plots for d = 64 and d = 13.]
Model extraction is possible despite the data-independent noise strategy D.
19. Evaluation of Defense Strategies: Data-Dependent noise
Model extraction is NOT possible here: this particular data-dependent defense is secure against the active learning extraction strategy!
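For contrast, a minimal sketch of a data-dependent noise strategy in this spirit: perturb answers only for queries that land near the decision boundary, which is precisely where an active learner concentrates its queries (the margin test and parameters are illustrative, not the paper's exact mechanism):

```python
import numpy as np

class DataDependentDefense:
    """Flip labels only for queries near the decision boundary, where
    active-learning queries concentrate; points far from the boundary
    (typical i.i.d. data) are answered truthfully."""

    def __init__(self, w, b, margin=0.1, flip_prob=0.4, rng=None):
        self.w, self.b = w, b                 # server's halfspace
        self.margin, self.flip_prob = margin, flip_prob
        self._rng = rng or np.random.default_rng()

    def query(self, x):
        score = self.w @ x + self.b
        y = int(score >= 0)
        # Distance-to-boundary test: only near-boundary queries are noised.
        if abs(score) / np.linalg.norm(self.w) < self.margin \
                and self._rng.random() < self.flip_prob:
            return 1 - y
        return y
```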
20. No “free lunch” for defense
Model extraction is inevitable:
• Data-independent defense mechanisms fail
• Data-dependent defense mechanisms fail against passive learning approaches
22. Summary
• Established a connection between active learning and model extraction
• Provided attacks under more realistic scenarios
• No free lunch, i.e., model extraction is inevitable
23. Open Questions
• Query Synthesis Active Learning (QSAL) algorithms for DNNs
• Determining the model type hosted on the server through "hard label" query interactions
• Re-using the labeled data to learn a different hypothesis space
• Data-dependent defense mechanisms for real-valued target functions f