Being Intentional: Privacy Engineering and AB Testing
Matt Gershoff, Conductrics
Matt and Some of His Biases
Conductrics (CX Optimization Software)
• Inference is hard – it is not a technology or a magic box.
• Solutions live in problems, so be explicit about what problem you are solving
• Complexity is a cost - do the simplest thing that will solve the problem
• The value of customer analytics is to be a better customer advocate.
• Be Intentional and Have Empathy
INTENTIONAL DESIGN
To take actions with awareness: deliberately, voluntarily, with
conscious purpose.
What I am going to talk about*
* Plus show a bunch of gratuitous pictures of my dog
Privacy Engineering Concepts
• K-Anonymity
• Cardinality
• Equivalence Classes
• Local vs Global Privacy
Experimentation Example
• What is AB Testing?
• Tasks
Primary: AB, MVT, and Bandits
Secondary (Helper): SRM and Test Interference Checks
Analytics on Minimized Data
• Example Calculation for T-Test
• Extensions to Multivariate Cases
Difficulty Level
Privacy Engineering
[Diagram: Legal, Product, Engineering/Data]
Engineering methodologies, tools, and techniques
to ensure systems provide acceptable levels
of privacy.
Privacy by Design Principles
Developed by Dr Ann Cavoukian
https://privacy.ucsc.edu/resources/privacy-by-design---foundational-principles.pdf
• Privacy must be incorporated into networked
data systems and technologies, by default.
• Privacy must become integral to organizational
priorities, project objectives, design processes,
and planning operations.
• Privacy must be embedded into every standard,
protocol and process that touches our lives.
Privacy by Design: 7 Principles
1. Proactive not reactive (anticipates)
2. Privacy as the default setting.
3. Privacy embedded into design.
4. Full functionality (doesn’t impair)
5. End-to-end security.
6. Visibility and transparency.
7. Respect for user privacy.
Principle 2: Privacy as the Default
Data Minimization:
1. Personally identifiable information should be
kept to a strict minimum.
2. The design of … technologies, … should begin
with non-identifiable interactions and
transactions, as the default.
3. Wherever possible link-ability of personal
information should be minimized.
Just Enough: Data Minimization
JUST Enough
DATA for THIS Question/Task
By Default Technology Should:
● Minimize Individual Information
● Minimize Granularity
● Minimize Linkability of Personal Info
Explicit Objective: Solve THIS Task (e.g. AB Test)
See: Privacy by Design, Dr Ann Cavoukian
GDPR Article 5(1)(c) and Article 25
JUST ENOUGH vs JUST IN CASE
JUST Enough
DATA for THIS Question/Task
By Default Technology Should:
● Minimize Individual Information
● Minimize Granularity
● Minimize Linkability of Personal Info
Explicit Objective: Solve THIS Task (eg AB Test)
JUST IN CASE
DATA for all Possible FUTURE Questions
By Default Technology Should:
● Maximize Individual Information
● Maximize Granularity
● Maximize Linkability of Personal Info
Shadow Objective: Maximize Optionality
Privacy Engineering Methods
1. Pseudonymize / de-identify
2. Don’t Link All the Data
(No Big Table of all the data - have separate unlinked tables)
3. Enumeration - impose limited values/bins for less granularity
4. Aggregation
   a. K-Anonymization - Quasi-Identifiers
   b. L-Diversity (won’t cover)
5. Differential Privacy - Make it noisy using Laplace or Gaussian Noise (won’t cover)
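Method 3 (enumeration) can be sketched in a few lines. This is only an illustration: the function name `tenure_bin` and the bin edges are hypothetical and would be chosen for the task at hand.

```python
def tenure_bin(years):
    """Map a raw tenure value to one of a few coarse, pre-declared bins."""
    # Bin edges are hypothetical; fewer bins means lower granularity.
    if years is None:
        return "missing"
    if years < 1:
        return "0"
    if years < 3:
        return "1-2"
    return "3+"


binned = [tenure_bin(y) for y in [0, 1, 2, 6, None]]
print(binned)
```

Storing only the bin label, never the raw value, caps the field's cardinality at four.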
Example: Conductrics and Experimentation with Equivalence Classes
What is AB Testing and why bother me about it?
Why AB Testing?
Helps discover Causal relationships
Examples:
• Does a new drug have greater efficacy than a placebo or current treatment?
• Does a new marketing campaign increase memberships?
• Does a new layout on a travel search results page improve travel bookings?
What is AB Testing?
A procedure that:
1. Assigns Experiences Randomly (Block Confounding)
2. Applies methods of Statistical Inference to draw conclusions *
*Much arguing over WHAT methods – often it is second order stuff IMO.
[Diagram: Age, Vaccine, Hospitalization; RANDOM SELECTION]
AB Testing for Causal Inference
Primary Tasks: AB Tests | MVT | Contextual Bandits
(ordered from Univariate to Multivariate)
Task                | AB Tests | Factorial Designs / MVTs | Contextual Bandits*
Required Statistics | t-tests  | ANOVA/F-tests            | Regression/Prediction
Supporting Tasks: SRM & AB Test Interference Checks
(ordered from Univariate to Multivariate)
Task                | Sample Ratio Mismatch (SRM) | AB Test Interactions
Required Statistics | Chi-Square Test             | Nested Partial F-tests
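As a rough illustration of the SRM check, here is a minimal sketch of a two-arm chi-square goodness-of-fit test that needs only the per-arm counts. The function name `srm_check`, the 50/50 expected split, and the counts are assumptions for the example, not part of the talk.

```python
import math


def srm_check(count_a, count_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit test for a two-arm sample ratio mismatch."""
    total = count_a + count_b
    expected = [total * expected_share_a, total * (1 - expected_share_a)]
    observed = [count_a, count_b]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Two arms give 1 degree of freedom, so the upper-tail probability
    # is P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = srm_check(5300, 4700)  # hypothetical counts
print(f"chi2={chi2:.1f}, p={p:.2g}")
```

A very small p-value suggests the observed split is unlikely under the intended allocation, which is worth investigating before reading the test results. With more than two arms the same statistic applies, but the p-value needs the chi-square distribution with more degrees of freedom.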
Example: Conductrics and Experimentation with Equivalence Classes
… but what are Equivalence Classes?
Id    | Phone     | Email         | Other Stuff | Sales | Status | Tenure | ABTest1 | ABTest2 | ABTest3
C1001 | 555555555 | xyz@email.com | …           | $0    | None   | 0      | A       | B       | c
C1002 | -         | abc@email.com | …           | $4    | Silver | 1      | A       | A       | -
C1003 | -         | def@email.com | …           | $16   | Plat   | 3      | B       | A       | b
C1004 | 555555555 | ghi@email.com | …           | $15   | Plat   | 2      | A       | B       | -
C1005 | -         | jkl@email.com | …           | $3    | Silver | 6      | B       | B       | a
C1006 | 555555555 | mno@email.com | …           | $5    | Gold   | 4      | A       | A       | -
…
Standard BIG TABLE (Collect AMAP and Link AMAP)
Experimentation data is stored with an id and appended
to other customer data.
But for AB Tests we only need Sales and test assignments - none of the other data
Standard Collection / Storage
Just Enough: Task Level Data Storage
1. Each AB Test has its own separate data structure
2. Collect aggregate counts, sums of conversion values, and sums of conversion^2 by treatment
Aggregate data into Equivalence Classes (think Pivot Tables)
Equivalence Class for a simple AB Test
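The pivot-table idea can be sketched as follows. This is a minimal illustration, not Conductrics' implementation: the function name `to_equivalence_classes` is hypothetical, and the (treatment, sales) pairs are modeled on the ABTest1 and Sales values in the big-table example.

```python
from collections import defaultdict

# Illustrative user-level observations: (treatment, sales).
observations = [("A", 0), ("A", 4), ("B", 16), ("A", 15), ("B", 3), ("A", 5)]


def to_equivalence_classes(rows):
    """Collapse (treatment, value) rows into one aggregate row per treatment."""
    table = defaultdict(lambda: {"count": 0, "sum": 0.0, "sum_sq": 0.0})
    for treatment, value in rows:
        cell = table[treatment]
        cell["count"] += 1
        cell["sum"] += value
        cell["sum_sq"] += value ** 2
    return dict(table)


ab_table = to_equivalence_classes(observations)
for treatment, cell in sorted(ab_table.items()):
    print(treatment, cell)
```

Six user rows collapse to two rows, one per treatment, and nothing in the result identifies an individual customer.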
Global vs Local Privacy
• Global – A central aggregator/curator collects the detailed/raw data,
then applies these methods AFTER collection to any data it shares.
• Local – The data minimization/privacy methods are applied BEFORE
the data are collected and stored.
Global vs Local Privacy
Global Approach
Secure Curator - Stores Nonprivate Data
Release Minimized/Anonymized data
- Use for Internal Data or Product Teams
Global vs Local Privacy
Local Approach
No Secure Curator
Only Collect and Store Minimized/Anonymized Data
Just Enough: Local Collection and Task Level Data Storage
How can I collect data only in summary form?
Implementation Example
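One way to collect data only in summary form is to update the aggregates at the moment an event arrives and discard the event itself. The sketch below assumes a hypothetical `LocalAggregator` class; it is an illustration of the local approach, not the product's actual code.

```python
class LocalAggregator:
    """Keeps only per-treatment aggregates; raw events are never stored."""

    def __init__(self):
        self.cells = {}  # treatment -> [count, sum, sum of squares]

    def record(self, treatment, value):
        cell = self.cells.setdefault(treatment, [0, 0.0, 0.0])
        cell[0] += 1
        cell[1] += value
        cell[2] += value * value
        # Nothing else is retained: no user id, no timestamp, no raw value.


agg = LocalAggregator()
for treatment, sales in [("A", 10.0), ("B", 12.0), ("A", 0.0)]:  # hypothetical events
    agg.record(treatment, sales)
print(agg.cells)
```

Because the minimization happens before storage, there is no raw table for a curator to secure.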
Data Minimization with Equivalence Classes
Benefits
• K-Anonymity
• Efficient Data Storage
• Efficient Computation
• Encourages Intentional Thinking and Design
Cons
• Limits Methods
• No Formal Privacy Guarantees
• Loss of Optionality
Data Minimization with Equivalence Classes
What is K-anonymity?
K-Anonymity
Hide in a crowd of K other ‘equivalent’ people.
Easy to monitor and report on K
Search for Min(Count) in each table
E.g. Here K=1925
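Monitoring K amounts to taking the minimum count over the rows of an aggregate table. A minimal sketch, with a hypothetical helper `k_of` and made-up counts (not the values from the slide's table):

```python
def k_of(table):
    """K for an aggregate table is the smallest equivalence-class count."""
    return min(row["count"] for row in table.values())


# Hypothetical counts for a 2 x 2 table of equivalence classes.
table = {
    ("A", "new"): {"count": 120},
    ("A", "returning"): {"count": 95},
    ("B", "new"): {"count": 130},
    ("B", "returning"): {"count": 101},
}
k = k_of(table)
print(k)
```

Every individual in this table shares a row with at least `k - 1` others, so a low K flags a table that may need coarser bins.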
Efficient Data Storage
Known Max Size for Each Data Set
Size is bounded by Joint Cardinality regardless of number of individuals
E.g. Here the Max Rows = 4 even though N = 8,000
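The storage bound is simply the product of the cardinalities of the stored fields. A small sketch, with hypothetical field names:

```python
import math


def max_rows(cardinalities):
    """Upper bound on table rows: the product of each field's cardinality."""
    return math.prod(cardinalities.values())


# Hypothetical fields: two fields with two values each give at most 4 rows,
# no matter how many individuals contribute to the counts.
bound = max_rows({"treatment": 2, "segment": 2})
print(bound)
```

Adding a field multiplies the bound by that field's cardinality, which is one reason to keep both the number of fields and the number of bins small.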
Why did you store the sum of sales and the sum of the
squared values of sales?
Efficient Computations: Why Counts, Sums, and Sums of Squares?
Simple AB Tests only need:
1. Counts
2. Sum of Conversion values
3. Sum of Conversion^2
AB Test Data
Standard T-Test formula
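The formula image from the slide is not reproduced in this text. As a stand-in, one common form is the unpooled (Welch) t statistic, written here for treatments A and B; whether the slide used this form or the pooled-variance form is an assumption.

```latex
t = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}},
\qquad
\bar{x}_g = \frac{1}{n_g}\sum_{i=1}^{n_g} x_{g,i},
\qquad
s_g^2 = \frac{1}{n_g - 1}\sum_{i=1}^{n_g}\left(x_{g,i} - \bar{x}_g\right)^2
```

Here $n_g$ is the count, $\bar{x}_g$ the mean, and $s_g^2$ the sample variance of conversion values in treatment $g$.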
T-Test formula rewritten in terms of just these aggregate values
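A sketch of that rewriting, assuming the unpooled (Welch) form of the t statistic. It relies on the identity sum((x - mean)^2) = sum(x^2) - sum(x)^2 / n, so each treatment contributes only its count, sum, and sum of squares. The function name and the sample values are hypothetical.

```python
import math
import statistics


def t_from_aggregates(n_a, sum_a, sumsq_a, n_b, sum_b, sumsq_b):
    """Unpooled (Welch) t statistic from counts, sums, and sums of squares."""
    mean_a, mean_b = sum_a / n_a, sum_b / n_b
    # Sample variance via sum((x - mean)^2) = sum(x^2) - sum(x)^2 / n.
    var_a = (sumsq_a - sum_a ** 2 / n_a) / (n_a - 1)
    var_b = (sumsq_b - sum_b ** 2 / n_b) / (n_b - 1)
    return (mean_b - mean_a) / math.sqrt(var_a / n_a + var_b / n_b)


# Hypothetical raw values, used only to check the aggregate version.
a, b = [1, 2, 3, 4], [2, 4, 6, 8]
t_raw = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(
    statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
)
t_agg = t_from_aggregates(
    len(a), sum(a), sum(x * x for x in a),
    len(b), sum(b), sum(x * x for x in b),
)
print(t_raw, t_agg)
```

Both routes give the same statistic. One caveat: the sum-of-squares identity can lose precision when values are very large relative to their spread, so a production version may want a more numerically careful update.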
Efficient Computations: Why Counts, Sums, and Sums of Squares?
MVT/ Factorial ANOVA Problems only need:
1. Conditional Counts
2. Conditional Sum of Conversion values
3. Conditional Sum of Conversion^2
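OLS Regression Too!
Equation for OLS Regression: $\hat{\beta} = (X^\top X)^{-1} X^\top y$
where the entries of $X^\top X$ and $X^\top y$ are sums over the $N$ observations ($i = 1, \dots, N$) of products of the covariates $x_1, \dots, x_k$ and $y$.
* is not shown for brevity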
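To illustrate why regression works on equivalence classes, the sketch below builds the normal equations twice: once from user-level rows and once from one row per unique covariate combination holding a count and a sum. The data are randomly generated and entirely hypothetical, and the helper names `solve` and `ols` are not from the talk.

```python
import random
from collections import defaultdict


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows):
    """OLS coefficients from rows of (x, n, sum_y): n users share design row x."""
    k = len(rows[0][0])
    xtx = [[sum(n * x[i] * x[j] for x, n, _ in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(x[i] * sum_y for x, _, sum_y in rows) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical users: design row (intercept, treatment, segment), integer sales.
random.seed(0)
users = [((1, random.randint(0, 1), random.randint(0, 1)), random.randint(0, 20))
         for _ in range(413)]

# Equivalence classes: one row per unique design row, with count and sum of sales.
cells = defaultdict(lambda: [0, 0])
for x, y in users:
    cells[x][0] += 1
    cells[x][1] += y

beta_user = ols([(x, 1, y) for x, y in users])
beta_class = ols([(x, n, s) for x, (n, s) in cells.items()])
print(len(users), len(cells), beta_user, beta_class)
```

The coefficients match because both versions produce the same $X^\top X$ and $X^\top y$. Standard errors additionally need the sum of squared outcomes per class, since the residual sum of squares is $y^\top y - \hat{\beta}^\top X^\top y$.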
[Figure: User Level Data (413 rows) and Regression Output from User Level (413 rows)]
[Figure: Equivalence Class Level Data (5 rows: covariate data with 4 values + missing flag) and Regression Output from Equivalence Class Level (5 rows)]
[Figure: Equivalence Class Level Analysis (5 rows) vs. User Level Analysis (413 rows)]
Value of AB Testing
The main value of experimentation/AB Testing programs is that they provide
a principled framework for organizations to act and learn intentionally.
• Well Defined Problems
• Explicit Objectives
• Make Decisions at the Margin
Value of Data Minimization
Data Minimization provides a principled framework for organizations to
think about and collect data intentionally.
• Defined Problems
• Explicit Objectives for the collection of Data
• Consider the marginal value of the next additional bit with respect to solving the problem
Thank you!
Questions?
Just Enough: Task Level Data Storage
Multivariate Equivalence Classes
Store at the unique combination of desired data elements
Equivalence Class for a Multivariate Test
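For the multivariate case, the only change is that the key becomes the unique combination of the chosen fields. A minimal sketch with a hypothetical `aggregate` helper and made-up rows for two concurrent tests:

```python
from collections import defaultdict


def aggregate(rows, key_fields, value_field):
    """One aggregate row per unique combination of key_fields."""
    table = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum, sum of squares
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        cell = table[key]
        cell[0] += 1
        cell[1] += row[value_field]
        cell[2] += row[value_field] ** 2
    return dict(table)


# Hypothetical rows for two concurrent tests.
rows = [
    {"test1": "A", "test2": "B", "sales": 0},
    {"test1": "A", "test2": "A", "sales": 4},
    {"test1": "B", "test2": "A", "sales": 16},
    {"test1": "A", "test2": "B", "sales": 15},
]
mvt_table = aggregate(rows, ["test1", "test2"], "sales")
for key, cell in sorted(mvt_table.items()):
    print(key, cell)
```

These conditional counts, sums, and sums of squares are the inputs that factorial and interaction analyses need, and the table can never have more rows than the joint cardinality of the key fields.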
What is AB Testing?
A procedure that:
1. Assigns Experiences Randomly (Block Confounding)
[Diagram: Confounder, Treatment, Outcome; RANDOM SELECTION]
Learning from Observations via AB Testing
Right to Left = Turkey! Left to Right= Turkey!
Inference: Franklin Prefers Turkey to Pupperoni!
