5. Data at Slack
● Two interesting characteristics that differentiate Slack from other
communication platforms:
1. Within an organization, data is public by default.
2. Across organizations, data is strictly private by default.
6. Public by Default
● In many traditional communication platforms, including email, data
within an organization is private by default.
[Diagram: a "Hello!" message travels from Sender to Recipient only.]
7. Public by Default
● Data in Slack is (mostly) public by default and available to all users
within the organization.
[Diagram: Sender posts "Hello!" to #channel; the message is (mostly) public.]
8. Public by Default
● Data in Slack is (mostly) public by default and available to all users
within the organization.
[Diagram: Sender posts "Hello!" to #channel; any Recipient in the
organization can read it.]
9. Public by Default
● What does this mean in the context of Machine Learning?
Lots of public data at the organization level.
○ Gives us a huge source of data to build Machine Learning
models.
○ Makes Machine Learning a valuable tool to help users sift
through the data.
10. Data at Slack
● Two interesting characteristics that differentiate Slack from other
communication platforms:
1. Within an organization, data is public by default.
2. Across organizations, data is strictly private by default.
11. Strict Privacy Boundaries
● Data in Slack should not leak across organizations.
[Diagram: Organization A (#pizza, #burgers) and Organization B (#cats,
#dogs), with a hard privacy boundary between them.]
12. Strict Privacy Boundaries
● Models in Slack should not leak data across organizations.
[Diagram: a topic model trained across organizations' channels (#cats,
#dogs, #pizza, #burgers) learns topics such as "Layoffs" and "Company B"
from the message "Company B is planning layoffs" and could surface them
to another organization. Bad!]
13. Strict Privacy Boundaries
● What does this mean in the context of Machine Learning?
Models should respect the privacy boundaries between
organizations.
○ Models should not leak data explicitly.
○ Models should not leak data implicitly.
16. Learn to Rank
● How do we train this model?
[Diagram: query logs and click logs in the data warehouse (DW) are
joined into training examples (q1, {d1,1, d1,2, …, d1,n}),
(q2, {d2,1, d2,2, …, d2,m}), …, which are fed into model training to
produce a scoring model f(q, d).]
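The pipeline above can be sketched as a pairwise learning-to-rank loop: join query and click logs into (query, clicked, skipped) triples and fit a scoring function f(q, d). A minimal Python sketch with invented features and toy data (the deck does not specify the actual model or feature set):

```python
import math

# Toy, hypothetical sketch of pairwise learning-to-rank from click logs.
# The feature functions and training data are invented for illustration;
# the real pipeline joins query logs and click logs from the data warehouse.

def features(query, doc):
    """Simple query-document features: term overlap and a length prior."""
    q_terms, d_terms = set(query.split()), set(doc.split())
    overlap = len(q_terms & d_terms) / max(len(q_terms), 1)
    return [overlap, 1.0 / (1 + len(d_terms))]

def train_pairwise(pairs, lr=0.1, epochs=50):
    """SGD on a logistic pairwise loss: a clicked doc should outscore a skipped one."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for query, clicked, skipped in pairs:
            x = [a - b for a, b in zip(features(query, clicked),
                                       features(query, skipped))]
            margin = sum(wi * xi for wi, xi in zip(w, x))
            grad = -1.0 / (1.0 + math.exp(margin))  # d/dm of log(1 + e^-m)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    return w

def score(w, query, doc):
    """The learned scoring function f(q, d)."""
    return sum(wi * xi for wi, xi in zip(w, features(query, doc)))

# (query, clicked document, skipped document) triples mined from the logs
pairs = [
    ("quarterly report", "quarterly report q3", "lunch menu"),
    ("deploy checklist", "deploy checklist prod", "random notes"),
]
w = train_pairwise(pairs)
```

After training, the clicked document outranks the skipped one for each query; a production system would use richer features and a stronger model, but the joined-log structure is the same.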
17. Learn to Rank
● How do we train this model in a privacy-preserving way?
[Diagram: the same pipeline, but the query logs and click logs in the
data warehouse (DW) now mix channels from multiple organizations
(#cats, #dogs, #pizza, #burgers).]
18. Individual Models
● Why not build one model per organization?
○ Sparsity
High dimensional inputs with low coverage within a single
organization.
○ Complexity
Over 500,000 organizations ranging from a few users to
Fortune 500 companies.
19. Global Model
● How can we train a global privacy-preserving model?
○ Attribute Parameterization
Feature transformation technique that factors out private
information and reduces sparsity.
Learning from User Interactions in Personal Search via Attribute
Parameterization (Bendersky et al. 2017)
28. Learn to Rank
● How do we train this model in a privacy-preserving way?
By learning from carefully crafted functions of the high-dimensional
attributes of the query and documents, we can factor out the private
data and reduce the sparsity of our training set before it reaches the
model.
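The idea can be sketched as a feature transformation applied before training: each raw example is mapped onto aggregate, organization-independent attributes, so neither channel identities nor message text ever reach the global model. A minimal Python sketch; the attribute names and bucketing scheme here are invented for illustration, not taken from the paper:

```python
import math

# Hypothetical sketch of attribute parameterization (in the spirit of
# Bendersky et al. 2017): organization-specific identifiers are replaced
# by dense, cross-organization-comparable attributes. All attribute names
# below are invented for illustration.

def parameterize(raw_example):
    """Map a raw (query, document) example onto private-data-free attributes.

    Instead of feeding the model a channel name or message text
    (e.g. "#pizza" or "Company B is planning layoffs"), we keep only
    aggregate statistics that are meaningful across organizations.
    """
    return {
        # bucketed channel size instead of the channel's identity
        "channel_size_bucket": min(
            int(math.log2(raw_example["channel_members"] + 1)), 10),
        # fraction of query terms found in the document, not the terms themselves
        "term_overlap": len(set(raw_example["query_terms"])
                            & set(raw_example["doc_terms"]))
                        / max(len(raw_example["query_terms"]), 1),
        # document recency in days, an organization-independent signal
        "age_days": raw_example["doc_age_days"],
    }

raw = {
    "channel_members": 37,
    "query_terms": ["roadmap", "q3"],
    "doc_terms": ["q3", "roadmap", "draft"],
    "doc_age_days": 4,
}
attrs = parameterize(raw)
```

Because the transformed attributes carry no raw identifiers or text, a single global model can be trained on them without explicitly memorizing any one organization's data, and the dense attributes also address the sparsity problem from slide 18.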