Understanding Email Traffic (talk @ E-Discovery NL Symposium)

Understanding email traffic
David Graus, University of Amsterdam
d.p.graus@uva.nl
@dvdgrs
2
3
Recipient recommendation
Ò Given a sender, an email, all possible recipients
(in an enterprise);
Ò Predict which recipient(s) are most likely to
receive the email
4
Why?
Ò Understanding communication in/structure of an
enterprise
Ò Applications in:
Ò enterprise search
Ò expert finding
Ò community detection
Ò spam classification
Ò anomaly detection
5
How?
Ò Gmail
Ò Who do you frequently “co-address”
Ò egonetwork
Ò Related work
Ò Social Network Analysis (SNA)
Ò Email content
Ò Us
Ò SNA + Email content
6
Part 1: Social Network Analysis?
d.p.graus@uva.nl z.ren@uva.nl
derijke@uva.nl
7
image by Calvinius - Creative Commons Attribution-Share Alike 3.0
8
SNA for predicting recipients?
1. Importance of a node in the network
   More important people are more likely to be the recipient of an email.
2. Strength of connection between two nodes
   Given the sender of an email, the people that sender addresses frequently are more likely to be the recipient.
9
SNA for predicting recipients?
1. Importance of a node in the network
   a. Number of received emails
   b. PageRank score of the node
2. Strength of connection between two nodes
   a. Number of emails sent between the nodes
   b. Number of times the two nodes are addressed together
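The four signals above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the talk: the toy `EMAILS` list, the function names, the undirected treatment of sender/recipient pairs, and the PageRank settings (damping 0.85, 50 iterations, edge weights equal to email counts) are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (sender, [recipients]) pairs.
EMAILS = [
    ("alice", ["bob", "carol"]),
    ("alice", ["bob"]),
    ("bob", ["alice"]),
    ("carol", ["bob", "dave"]),
]

def received_counts(emails):
    """Importance signal: number of emails each node received."""
    counts = Counter()
    for _, recipients in emails:
        counts.update(recipients)
    return counts

def sent_between(emails):
    """Connection signal: emails exchanged between each unordered pair."""
    counts = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            counts[frozenset((sender, recipient))] += 1
    return counts

def co_addressed(emails):
    """Connection signal: times two nodes are recipients of the same email."""
    counts = Counter()
    for _, recipients in emails:
        for a, b in combinations(sorted(set(recipients)), 2):
            counts[frozenset((a, b))] += 1
    return counts

def pagerank(emails, damping=0.85, iterations=50):
    """Importance signal: PageRank on the directed sender -> recipient graph."""
    out_links = {}
    nodes = set()
    for sender, recipients in emails:
        nodes.add(sender)
        nodes.update(recipients)
        out_links.setdefault(sender, Counter()).update(recipients)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            links = out_links.get(node)
            if links:
                total = sum(links.values())
                for target, weight in links.items():
                    new_rank[target] += damping * rank[node] * weight / total
            else:
                # Dangling node (never sent mail): spread its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

print(received_counts(EMAILS).most_common())
print(pagerank(EMAILS))
```

Counting pairs as unordered sets keeps the "emails between nodes" signal symmetric; a directed variant would key on `(sender, recipient)` tuples instead.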
10
Part 2: Email content
Ò Statistical Language Models (LMs)
!
Ò Assign a probability to a sequence of words;
Ò Compute models for different corpora;
!
Ò Used in lots of places;
Ò Information Retrieval
Ò Machine Translation
Ò Speech Recognition
16
Language Models
Ò Language models as communication “profiles”
1. Incoming LM (how people talk to user)
2. Outgoing LM (how user talks to people)
3. Interpersonal LM (how node1 

talks with node2)
4. Corpus LM (how everyone 

talks)
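One way to picture the four profiles is as word-count tables filled from the same email stream. The sketch below assumes whitespace tokenization, a toy `EMAILS` list, and an unordered node pair for the interpersonal profile; none of these details come from the slides.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (sender, [recipients], text).
EMAILS = [
    ("alice", ["bob"], "budget report attached"),
    ("bob", ["alice", "carol"], "thanks for the report"),
    ("carol", ["alice"], "lunch tomorrow"),
]

def build_profiles(emails):
    """Collect word counts for the four profile types."""
    incoming = defaultdict(Counter)       # how people talk to a user
    outgoing = defaultdict(Counter)       # how a user talks to people
    interpersonal = defaultdict(Counter)  # how two nodes talk with each other
    corpus = Counter()                    # how everyone talks
    for sender, recipients, text in emails:
        words = text.lower().split()
        corpus.update(words)
        outgoing[sender].update(words)
        for recipient in recipients:
            incoming[recipient].update(words)
            interpersonal[frozenset((sender, recipient))].update(words)
    return incoming, outgoing, interpersonal, corpus

incoming, outgoing, interpersonal, corpus = build_profiles(EMAILS)
print(interpersonal[frozenset(("alice", "bob"))])
```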
17
Why language models?
Ò Comparisons between communication profiles:
Ò Find nodes with most similar communication
18
SNA
1. Importance of a node in the network
2. Strength of connection between nodes

Email Content
1. Incoming LM
2. Outgoing LM
3. Interpersonal LM
4. Corpus-based LM
19
Approach: time-based
t=0: 1 email, 2 addresses
t=1: 2 emails, 2 addresses
t=2: 3 emails, 4 addresses
t=3: 4 emails, 5 addresses
etc.
t=n: 607,011 emails, 2,068 addresses
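The time-based setup can be read as replaying the emails in chronological order, with the network growing one email at a time. A minimal sketch, assuming one email per time step and a hypothetical four-email stream chosen to reproduce the first rows of the slide:

```python
def replay(emails):
    """Yield (t, emails_seen, addresses_seen) as the network grows email by email."""
    addresses = set()
    for t, (sender, recipients) in enumerate(emails):
        addresses.add(sender)
        addresses.update(recipients)
        yield t, t + 1, len(addresses)

# Hypothetical stream, already sorted by send time.
STREAM = [
    ("a", ["b"]),
    ("b", ["a"]),
    ("a", ["c", "d"]),
    ("c", ["e"]),
]

for t, n_emails, n_addresses in replay(STREAM):
    print(f"t={t}: {n_emails} emails, {n_addresses} addresses")
```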
20
At some time interval t
Ò Given the email, sender, and network
Ò Remove recipients from email
Ò Rank all nodes in the network
Ò By computing for each candidate (recipient)
node:
1. Importance of candidate
2. Strength of connection between sender and
candidate
3. Similarity between sender and candidate LMs
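The ranking step can be sketched as scoring every candidate on the three signals and sorting. How the signals are combined is not stated on the slide; the weighted sum, the `weights` parameter, the exclusion of the sender, and the pre-normalized toy scores below are all assumptions.

```python
def rank_candidates(sender, candidates, importance, connection, similarity,
                    weights=(1.0, 1.0, 1.0)):
    """Rank candidate recipients by a weighted sum of the three signals."""
    w_imp, w_con, w_sim = weights
    scored = []
    for candidate in candidates:
        if candidate == sender:
            continue  # assumption: the sender is not a candidate recipient
        score = (w_imp * importance.get(candidate, 0.0)
                 + w_con * connection.get((sender, candidate), 0.0)
                 + w_sim * similarity.get((sender, candidate), 0.0))
        scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored]

# Hypothetical, pre-normalized scores in [0, 1].
importance = {"bob": 0.9, "carol": 0.4, "dave": 0.1}
connection = {("alice", "bob"): 0.2, ("alice", "carol"): 0.8}
similarity = {("alice", "carol"): 0.7, ("alice", "dave"): 0.3}

ranking = rank_candidates("alice", ["alice", "bob", "carol", "dave"],
                          importance, connection, similarity)
print(ranking)  # ['carol', 'bob', 'dave']
```

Evaluation then amounts to checking where the removed recipients land in this ranking.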
21
22
Findings: what works for predicting recipients?
• Importance of node: number of received emails of node
• Strength of connection: number of emails between nodes
• LM similarity: interpersonal LM is most important
23
Findings: SNA vs email content
Ò SNA:
Ò SNA signals deteriorate over time
Ò SNA signals are most informative on highly
active users
!
Ò Email content:
Ò LM signal improves over time
Ò LM signal does worse with highly active users
24
Finally
Ò Combining Social Network Analysis with
Language Modeling is better than doing either.
25
Why for E-Discovery?
• Anomaly detection
  • Given a working prediction model, identify “unexpected” communication
• Language models for communication
  • For a node, find the most different interpersonal communication
    • Friends/family vs. colleagues?
  • Find communication that differs from the corpus-based communication
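As a rough illustration of the anomaly-detection idea: once a model ranks likely recipients, an email whose actual recipients fall far down that ranking can be flagged as “unexpected”. The function name, the `top_k` cut-off, and the toy ranking are hypothetical; the slides do not define how “unexpected” is measured.

```python
def unexpected_recipients(ranking, actual_recipients, top_k=2):
    """Return actual recipients that the model did not rank within the top_k."""
    position = {node: index for index, node in enumerate(ranking)}
    return [
        recipient
        for recipient in actual_recipients
        # Recipients missing from the ranking count as maximally unexpected.
        if position.get(recipient, len(ranking)) >= top_k
    ]

# Hypothetical model output for one email, best candidate first.
ranking = ["carol", "bob", "dave", "erin"]
flagged = unexpected_recipients(ranking, ["bob", "erin", "zed"])
print(flagged)  # ['erin', 'zed']
```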