Building Machine Learning Models
with Strict Privacy Boundaries
Renaud Bourassa
rbourassa@slack-corp.com
March 29, 2019
Agenda
1. Data at Slack and how it applies
to Machine Learning.
2. Building a privacy-preserving
search ranking model.
What is Slack?
At its core, Slack is a communication platform.
Data at Slack
● Two interesting characteristics that differentiate Slack from other
communication platforms:
1. Within an organization, data is public by default.
2. Across organizations, data is strictly private by default.
● In many traditional communication platforms, including email,
data within an organization is private by default.
[Diagram: “Hello!” passes directly from sender to recipient.]
Public by Default
● Data in Slack is (mostly) public by default and available to all users
within the organization.
[Diagram: the sender posts “Hello!” to #channel (mostly public), where the recipient can read it.]
Public by Default
● What does this mean in the context of Machine Learning?
Lots of public data at the organization level.
○ Gives us a huge source of data to build Machine Learning
models.
○ Makes Machine Learning a valuable tool to help users sift
through the data.
Strict Privacy Boundaries
● Data in Slack should not leak across organizations.
[Diagram: Organization A and Organization B, each with its own channels (#cats, #dogs, #pizza, #burgers).]
Strict Privacy Boundaries
● Models in Slack should not leak data across organizations.
[Diagram: #cats, #dogs, #pizza, #burgers → Training → Topic Model. The message “Company B is planning layoffs” leads the model to link “Layoffs” with “Company B”. Bad!]
Strict Privacy Boundaries
● What does this mean in the context of Machine Learning?
Models should respect the privacy boundaries between
organizations.
○ Models should not leak data explicitly.
○ Models should not leak data implicitly.
Search
Problem
Given a query, return the most
relevant documents (e.g.
messages, files).
Learn to Rank
[Diagram: query q → Solr → candidates D = {d_1, d_2, …, d_n} → Model f(q,d) → scores f(q,d_i), f(q,d_j), f(q,d_k), … → ranked documents d_i, d_j, d_k, …]
● Sort documents by scores in a way that maximizes utility.
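The scoring-and-sorting step can be sketched in a few lines of Python. This is an illustration only: the `rank` helper and the `term_overlap` scorer are hypothetical stand-ins for the learned model f(q,d), and the candidate list stands in for the set D returned by Solr.

```python
from typing import Callable, List, Sequence, Tuple


def rank(query: str, docs: Sequence[str],
         f: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score every candidate with f(q, d) and sort by descending score."""
    scored = [(d, f(query, d)) for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def term_overlap(query: str, doc: str) -> float:
    """Toy scorer (an assumption): fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


# Hypothetical candidate set D for one query.
candidates = [
    "lunch plans for friday",
    "notes from the MLConf keynote talk",
    "MLConf talk slides",
]
ranked = rank("MLConf talk slides", candidates, term_overlap)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

In a real system the scorer would be the trained model; only the sort-by-score structure is the point here.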
Learn to Rank
● How do we train this model?
[Diagram: DW (Query Logs, Click Logs) → training examples (q_1, {d_{1,1}, d_{1,2}, …, d_{1,n}}), (q_2, {d_{2,1}, d_{2,2}, …, d_{2,m}}), … → Model Training → Model f(q,d)]
Learn to Rank
● How do we train this model in a privacy-preserving way?
[Diagram: DW (Query Logs, Click Logs) holds data from every organization's channels: #cats, #dogs, #pizza, #burgers, …]
Individual Models
● Why not build one model per organization?
○ Sparsity
High-dimensional inputs with low coverage within a single
organization.
○ Complexity
Over 500,000 organizations ranging from a few users to
Fortune 500 companies.
Global Model
● How can we train a global privacy-preserving model?
○ Attribute Parameterization
Feature transformation technique that factors out private
information and reduces sparsity.
Learning from User Interactions in Personal Search via Attribute
Parameterization (Bendersky et al. 2017)
Attribute Parameterization
Query: “MLConf”
Query attributes:
user_id:U123
terms:[“MLConf”]
Document attributes:
user_id:U456
channel_id:C789
terms:[“Hey”,…]
[Diagram: query and document attributes → One Hot Encoding → Model f(q,d)]
Attribute Parameterization
[Diagram: document terms → Parameterization g(d_terms) → Model f(q,d)]
Examples:
● num_terms(d_terms)
● num_emojis(d_terms)
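As a rough sketch, the two example parameterizations could look like the Python below. The function names come from the slide, but their bodies are assumptions: in particular, emojis are detected here as `:shortcode:` tokens, which may not match how the real feature is computed.

```python
import re
from typing import List

# Assumption: an emoji appears in the term list as a ":shortcode:" token.
EMOJI_SHORTCODE = re.compile(r":[a-z0-9_+\-]+:")


def num_terms(d_terms: List[str]) -> int:
    """g(d_terms): the number of terms, independent of what the terms say."""
    return len(d_terms)


def num_emojis(d_terms: List[str]) -> int:
    """g(d_terms): the number of emoji tokens among the terms."""
    return sum(1 for term in d_terms if EMOJI_SHORTCODE.fullmatch(term))


# Hypothetical document terms.
d_terms = ["Hey", "see", "you", "at", "MLConf", ":tada:", ":+1:"]
features = {
    "num_terms": num_terms(d_terms),
    "num_emojis": num_emojis(d_terms),
}
print(features)
```

Both functions map a sparse, private list of terms to a single dense number, so the model never sees the terms themselves.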
Attribute Parameterization
[Diagram: document channel_id → Parameterization ctr(d_channel_id) → Model f(q,d)]
Definition:
ctr(d_x) = clicks(d_x) / impressions(d_x)
Definition:
ctr(q_x, d_y) = clicks(q_x AND d_y) / impressions(q_x AND d_y)
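Both definitions reduce to counting clicks and impressions per key. The Python sketch below assumes a hypothetical log format of (q_user_id, d_channel_id, clicked) rows; the `ctr_table` helper is illustrative and not taken from the talk.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def ctr_table(events: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, float]:
    """Return clicks(key) / impressions(key) for every key seen in the events."""
    impressions = Counter()
    clicks = Counter()
    for key, clicked in events:
        impressions[key] += 1
        if clicked:
            clicks[key] += 1
    return {key: clicks[key] / n for key, n in impressions.items()}


# Hypothetical impression log: (q_user_id, d_channel_id, clicked).
log = [
    ("U123", "C789", True),
    ("U123", "C789", False),
    ("U123", "C111", False),
    ("U456", "C789", True),
]

# ctr(d_x) with x = channel_id: the key is a document attribute only.
ctr_d_channel = ctr_table((channel, clicked) for _, channel, clicked in log)

# ctr(q_x, d_y) with x = user_id, y = channel_id: the key is the attribute pair.
ctr_q_user_d_channel = ctr_table(
    ((user, channel), clicked) for user, channel, clicked in log
)

print(ctr_d_channel)
print(ctr_q_user_d_channel)
```

Keys that never appear in the log have no entry, so a caller has to choose a default for unseen attribute values.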
Attribute Parameterization
[Diagram: query user_id and document channel_id → Parameterization ctr(q_user_id, d_channel_id) → Model f(q,d)]
Examples:
● ctr(q_user_id, d_user_id)
● ctr(q_user_id, d_reactor_id)
● ctr(q_team_id, d_term)
Attribute Parameterization
[Diagram: query terms and document terms → ctr(q_terms, d_terms) → Model f(q,d)]
Could leak private data between organizations!
Attribute Parameterization
[Diagram: query user_id, query terms, and document terms → ctr(q_user_id, q_terms, d_terms) → Model f(q,d)]
Safe!
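One way to read the difference between the leaky and the safe feature is to look at which log rows each key aggregates over. The Python sketch below uses an invented three-row log and assumes that every user_id belongs to exactly one organization; both the data and that assumption are illustrative, not taken from the talk.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical log row: (team_id, q_user_id, q_term, d_term, clicked).
Row = Tuple[str, str, str, str, bool]

LOG = [
    ("T_A", "U1", "layoffs", "company_b", True),
    ("T_A", "U1", "layoffs", "company_b", True),
    ("T_B", "U2", "layoffs", "company_b", False),
]


def ctr_by(key_fn: Callable[[Row], Hashable]) -> Dict[Hashable, float]:
    """Compute clicks / impressions for whatever key key_fn extracts."""
    clicks: Dict[Hashable, int] = defaultdict(int)
    impressions: Dict[Hashable, int] = defaultdict(int)
    for row in LOG:
        key = key_fn(row)
        impressions[key] += 1
        clicks[key] += int(row[4])
    return {key: clicks[key] / impressions[key] for key in impressions}


# ctr(q_terms, d_terms): rows from both organizations share one key.
leaky = ctr_by(lambda r: (r[2], r[3]))

# ctr(q_user_id, q_terms, d_terms): each key covers a single user's own history.
scoped = ctr_by(lambda r: (r[1], r[2], r[3]))

print(leaky)
print(scoped)
```

With the term-only key, the value served to U2 in organization B is raised by clicks made in organization A. Once user_id is part of the key, each value depends only on rows from that user, and under the one-organization-per-user assumption it cannot carry signal across the boundary.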
Attribute Parameterization
● Precompute and index CTR features in a feature store.
[Diagram: offline, DW (Query Logs, Click Logs) → Attribute Parameterization → Feature Store. Online, q → Solr → D → Model f(q,d), reading features from the Feature Store → d_i, d_j, d_k, …]
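The split between offline precomputation and online lookup might be sketched as follows. The `FeatureStore` class, the feature names, and the default of 0.0 for unseen keys are all assumptions made for illustration; an in-memory dictionary stands in for whatever store is actually used.

```python
from typing import Dict, List, Tuple

# (feature name, query attribute value, document attribute value)
FeatureKey = Tuple[str, str, str]


class FeatureStore:
    """Toy in-memory stand-in for a precomputed CTR feature index."""

    def __init__(self) -> None:
        self._table: Dict[FeatureKey, float] = {}

    def put(self, name: str, q_value: str, d_value: str, value: float) -> None:
        self._table[(name, q_value, d_value)] = value

    def get(self, name: str, q_value: str, d_value: str,
            default: float = 0.0) -> float:
        # Assumption: attribute pairs never seen in the logs fall back to a default.
        return self._table.get((name, q_value, d_value), default)


def featurize(store: FeatureStore, query: Dict[str, str],
              doc: Dict[str, str]) -> List[float]:
    """Build the dense model input for one (q, d) pair from stored features."""
    return [
        store.get("ctr_user_channel", query["user_id"], doc["channel_id"]),
        store.get("ctr_user_author", query["user_id"], doc["user_id"]),
    ]


# Offline: write precomputed values (hypothetical numbers).
store = FeatureStore()
store.put("ctr_user_channel", "U123", "C789", 0.5)
store.put("ctr_user_author", "U123", "U456", 0.25)

# Online: look the features up for a candidate document.
query = {"user_id": "U123"}
doc = {"user_id": "U456", "channel_id": "C789"}
vector = featurize(store, query, doc)
print(vector)
```

The model then scores the dense vector, so raw identifiers and terms are never its direct inputs.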
Learn to Rank
● How do we train this model in a privacy-preserving way?
By learning from carefully crafted functions of the high-dimensional
attributes of the query and documents, we are able to
factor out the private data and reduce the sparsity of our training
set before it reaches the model.
Thank You!
We’re hiring!
https://goo.gl/FqzD6U
