3. 3
Recipient recommendation
• Given a sender, an email, and all possible recipients (in an enterprise);
• Predict which recipient(s) are most likely to receive the email
4. 4
Why?
• Understanding communication in, and the structure of, an enterprise
• Applications in:
  • enterprise search
  • expert finding
  • community detection
  • spam classification
  • anomaly detection
5. 5
How?
• Gmail
  • Who do you frequently “co-address”?
  • Ego network
• Related work
  • Social Network Analysis (SNA)
  • Email content
• Us
  • SNA + Email content
6. 6
Part 1: Social Network Analysis?
d.p.graus@uva.nl z.ren@uva.nl
derijke@uva.nl
8. 8
SNA for predicting recipients?
1. Importance of a node in the network
More important people are more likely to be the
recipient of an email
2. Strength of connection between two nodes
Given the sender of an email, the people that sender addresses
frequently are more likely to be recipients
9. 9
SNA for predicting recipients?
1. Importance of a node in the network
1. Number of received emails
2. PageRank score of node
2. Strength of connection between two nodes
1. Number of emails sent between nodes
2. Number of times two nodes are addressed together
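The four SNA signals above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy `emails` list, the variable names, and the `pagerank` helper (a plain power iteration with an assumed damping factor of 0.85) are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (sender, [recipients]) pairs.
emails = [
    ("alice", ["bob", "carol"]),
    ("alice", ["bob"]),
    ("bob", ["alice"]),
    ("carol", ["bob", "alice"]),
]

received = Counter()      # importance 1: number of received emails
sent_between = Counter()  # strength 1: emails sent from sender to recipient
co_addressed = Counter()  # strength 2: times two nodes are addressed together

for sender, recipients in emails:
    for r in recipients:
        received[r] += 1
        sent_between[(sender, r)] += 1
    for a, b in combinations(sorted(set(recipients)), 2):
        co_addressed[(a, b)] += 1


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over weighted directed edges (importance 2)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = Counter()
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        new_rank = {
            n: (1 - damping + damping * dangling) / len(nodes) for n in nodes
        }
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


scores = pagerank(sent_between)
```

Here `received["bob"]` counts how often bob was addressed, `sent_between[("alice", "bob")]` counts alice's emails to bob, and `co_addressed[("alice", "bob")]` counts emails that list both as recipients.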
10. 10
Part 2: Email content
• Statistical Language Models (LMs)
  • Assign a probability to a sequence of words;
  • Compute models for different corpora;
• Used in lots of places;
  • Information Retrieval
  • Machine Translation
  • Speech Recognition
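As an illustration of "assign a probability to a sequence of words", here is a minimal sketch of a unigram language model. The slides do not say which model order or smoothing is used, so the unigram assumption, the add-one smoothing, and the names `train_unigram` and `log_prob` are all hypothetical choices for this example.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count word occurrences over a corpus (a list of strings)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts, sum(counts.values())


def log_prob(words, counts, total, vocab_size):
    """Log-probability of a word sequence under an add-one smoothed unigram LM."""
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )


# Two hypothetical corpora, each with its own model.
meetings = ["meeting at noon", "budget meeting tomorrow"]
sports = ["game at noon", "great game tomorrow"]

m_counts, m_total = train_unigram(meetings)
s_counts, s_total = train_unigram(sports)
vocab_size = len(set(m_counts) | set(s_counts))

query = "budget meeting".split()
lp_meetings = log_prob(query, m_counts, m_total, vocab_size)
lp_sports = log_prob(query, s_counts, s_total, vocab_size)
```

The same word sequence gets a higher probability under the model trained on the corpus it resembles, which is what makes per-corpus models comparable.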
13. 13
Language Models
• Language models as communication “profiles”
1. Incoming LM (how people talk to user)
2. Outgoing LM (how user talks to people)
14. 14
Language Models
• Language models as communication “profiles”
1. Incoming LM (how people talk to user)
2. Outgoing LM (how user talks to people)
3. Interpersonal LM (how node1 talks with node2)
16. 16
Language Models
• Language models as communication “profiles”
1. Incoming LM (how people talk to user)
2. Outgoing LM (how user talks to people)
3. Interpersonal LM (how node1 talks with node2)
4. Corpus LM (how everyone talks)
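The four profiles can be collected as plain word counts in a single pass over the emails. This is a sketch under assumptions: the toy `emails` data is hypothetical, and treating the interpersonal profile as an unordered pair (so both directions of a conversation share one profile) is one reading of "how node1 talks with node2", not something the slides confirm.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (sender, [recipients], text) triples.
emails = [
    ("alice", ["bob"], "budget meeting tomorrow"),
    ("bob", ["alice"], "budget approved"),
    ("alice", ["carol"], "lunch tomorrow"),
]

incoming = defaultdict(Counter)       # 1. how people talk to the user
outgoing = defaultdict(Counter)       # 2. how the user talks to people
interpersonal = defaultdict(Counter)  # 3. how two nodes talk with each other
corpus = Counter()                    # 4. how everyone talks

for sender, recipients, text in emails:
    words = text.lower().split()
    corpus.update(words)
    outgoing[sender].update(words)
    for r in recipients:
        incoming[r].update(words)
        # Assumption: the pair is unordered, so alice->bob and bob->alice
        # feed the same interpersonal profile.
        interpersonal[frozenset((sender, r))].update(words)
```

Each counter can then be normalized (and smoothed) into a probability distribution over words.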
17. 17
Why language models?
• Comparisons between communication profiles:
  • Find nodes with most similar communication
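One way to compare two communication profiles is a divergence between their word distributions. The slides do not name the similarity measure, so the Jensen-Shannon divergence used below is an assumption chosen because it is symmetric and needs no smoothing; the profiles and the names `to_distribution` and `js_divergence` are hypothetical.

```python
import math
from collections import Counter


def to_distribution(counts):
    """Normalize word counts into a probability distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if disjoint."""
    def kl(a, m):
        return sum(pa * math.log2(pa / m[w]) for w, pa in a.items() if pa > 0)

    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in set(p) | set(q)}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical profiles: one sender and two candidate recipients.
sender = to_distribution(Counter("budget meeting budget report".split()))
candidates = {
    "bob": to_distribution(Counter("budget report meeting".split())),
    "carol": to_distribution(Counter("lunch football weekend".split())),
}

# Lower divergence means more similar communication.
most_similar = min(candidates, key=lambda c: js_divergence(sender, candidates[c]))
```

In this toy example the candidate whose vocabulary overlaps with the sender's has the lower divergence and is selected as the most similar node.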
18. 18
SNA
1. Importance of a node in the network
2. Strength of connection between nodes
Email Content
1. Incoming LM
2. Outgoing LM
3. Interpersonal LM
4. Corpus-based LM
20. 20
At some time interval t
• Given the email, the sender, and the network
• Remove the recipients from the email
• Rank all nodes in the network
• By computing for each candidate (recipient) node:
1. Importance of candidate
2. Strength of connection between sender and candidate
3. Similarity between sender and candidate LMs
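The ranking step above can be sketched as follows. The slides do not say how the three signals are combined, so the weighted sum, the default weights, the exclusion of the sender from the candidates, and the function name `rank_candidates` are all assumptions made for illustration; the scores are made-up toy values.

```python
def rank_candidates(sender, nodes, importance, strength, lm_similarity,
                    weights=(1.0, 1.0, 1.0)):
    """Rank candidate recipients by a weighted sum of the three signals."""
    w_imp, w_str, w_sim = weights
    scored = []
    for candidate in nodes:
        if candidate == sender:
            continue  # assumption: the sender is not a candidate recipient
        score = (
            w_imp * importance.get(candidate, 0.0)
            + w_str * strength.get((sender, candidate), 0.0)
            + w_sim * lm_similarity.get((sender, candidate), 0.0)
        )
        scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored]


# Hypothetical, pre-computed signals for one email sent by alice.
nodes = ["alice", "bob", "carol", "dave"]
importance = {"bob": 0.5, "carol": 0.3, "dave": 0.2}
strength = {("alice", "bob"): 0.6, ("alice", "carol"): 0.1}
lm_similarity = {
    ("alice", "bob"): 0.4,
    ("alice", "carol"): 0.7,
    ("alice", "dave"): 0.1,
}

ranking = rank_candidates("alice", nodes, importance, strength, lm_similarity)
```

The true recipients, which were removed from the email beforehand, can then be looked up in `ranking` to evaluate the prediction.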
22. 22
Findings: what works for predicting
recipients?
• Importance of node:
  Number of received emails of node
• Strength of connection:
  Number of emails between nodes
• LM Similarity:
  Interpersonal LM is most important
23. 23
Findings: SNA vs email content
• SNA:
  • SNA signals deteriorate over time
  • SNA signals are most informative on highly active users
• Email content:
  • LM signal improves over time
  • LM signal does worse with highly active users
25. 25
Why for E-Discovery?
• Anomaly detection
  • Given a working prediction model, identify “unexpected” communication
• Language models for communication
  • For a node, find the most different interpersonal communication
    • Friends/family vs. colleagues?
  • Find communication that differs from the corpus-based communication
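A very small sketch of the anomaly-detection idea: given the ranking a prediction model produces for an email, flag actual recipients that the model did not expect. The function name `unexpected_recipients` and the `top_k` cutoff are hypothetical; the slides do not define how "unexpected" is measured.

```python
def unexpected_recipients(ranking, actual_recipients, top_k=3):
    """Return the actual recipients that are missing from the model's top_k."""
    expected = set(ranking[:top_k])
    return [r for r in actual_recipients if r not in expected]


# Hypothetical model output for one email, best candidate first.
ranking = ["bob", "carol", "dave", "eve"]
flagged = unexpected_recipients(ranking, ["bob", "eve"], top_k=2)
```

Emails with flagged recipients would be candidates for closer review.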