Understanding email traffic
David Graus, University of Amsterdam
d.p.graus@uva.nl
@dvdgrs
2
3
Recipient recommendation
Ò Given a sender, an email, all possible recipients
(in an enterprise);
Ò Predict which recipient(s) are most likely to
receive the email
4
Why?
Ò Understanding communication in/structure of an
enterprise
Ò Applications in:
Ò enterprise search
Ò expert finding
Ò community detection
Ò spam classification
Ò anomaly detection
5
How?
Ò Gmail
Ò Who do you frequently “co-address”
Ò egonetwork
Ò Related work
Ò Social Network Analysis (SNA)
Ò Email content
Ò Us
Ò SNA + Email content
6
Part 1: Social Network Analysis?
d.p.graus@uva.nl z.ren@uva.nl
derijke@uva.nl
7
image by Calvinius - Creative Commons Attribution-Share Alike 3.0
8
SNA for predicting recipients?
1. Importance of a node in the network
More important people are more likely to be the
recipient of an email
2. Strength of connection between two nodes
Given the sender of an email, people the sender frequently addresses are more likely to be recipients
9
SNA for predicting recipients?
1. Importance of a node in the network
1. Number of received emails
2. PageRank score of node
2. Strength of connection between two nodes
1. Number of emails sent between nodes
2. Number of times two nodes are addressed together
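The four signals above can all be computed from a plain email log. The following is a minimal Python sketch, not the implementation behind the talk: it assumes a hypothetical toy log of (sender, recipients) pairs, and the function names (`received_counts`, `pagerank`, `emails_between`, `co_address_counts`) are illustrative. PageRank here is a simple power iteration on the unweighted sender-to-recipient graph with the common damping factor 0.85.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy email log: (sender, recipients) per email.
EMAILS = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("carol", ["alice", "bob"]),
    ("alice", ["bob"]),
    ("dave", ["alice"]),
]


def received_counts(emails):
    """Importance 1: number of emails each node has received."""
    counts = Counter()
    for _, recipients in emails:
        counts.update(set(recipients))
    return counts


def pagerank(emails, damping=0.85, iterations=50):
    """Importance 2: PageRank on the unweighted sender -> recipient graph."""
    nodes = sorted({s for s, _ in emails} | {r for _, rs in emails for r in rs})
    out_links = {node: set() for node in nodes}
    for sender, recipients in emails:
        out_links[sender].update(recipients)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes without outgoing edges spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


def emails_between(emails, a, b):
    """Strength 1: number of emails sent between a and b, in either direction."""
    return sum(1 for s, rs in emails if (s == a and b in rs) or (s == b and a in rs))


def co_address_counts(emails):
    """Strength 2: how often each pair of nodes shares a recipient list."""
    pairs = Counter()
    for _, recipients in emails:
        for pair in combinations(sorted(set(recipients)), 2):
            pairs[pair] += 1
    return pairs


print(received_counts(EMAILS)["bob"])          # 3
print(emails_between(EMAILS, "alice", "bob"))  # 3
```

Weighting graph edges by email counts, or using a library PageRank such as NetworkX's, would be equally reasonable variants.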
10
Part 2: Email content
Ò Statistical Language Models (LMs)
!
Ò Assign a probability to a sequence of words;
Ò Compute models for different corpora;
!
Ò Used in lots of places;
Ò Information Retrieval
Ò Machine Translation
Ò Speech Recognition
16
Language Models
Ò Language models as communication “profiles”
1. Incoming LM (how people talk to user)
2. Outgoing LM (how user talks to people)
3. Interpersonal LM (how node1 

talks with node2)
4. Corpus LM (how everyone 

talks)
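One way to build these four profiles is to pool word counts per node, per node pair, and over the whole corpus. This Python sketch assumes hypothetical (sender, recipients, body) tuples and a naive whitespace tokenizer. Treating the interpersonal LM as symmetric (one model per unordered pair, covering both directions) is also an assumption; a directed version is equally plausible.

```python
from collections import Counter, defaultdict

# Hypothetical toy emails: (sender, recipients, body).
EMAILS = [
    ("alice", ["bob"], "budget meeting tomorrow"),
    ("bob", ["alice", "carol"], "budget draft attached"),
    ("carol", ["alice"], "lunch on friday"),
]


def tokens(text):
    """Naive whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def build_profiles(emails):
    """Word counts for the four kinds of communication profile."""
    incoming = defaultdict(Counter)       # 1. how people talk to a user
    outgoing = defaultdict(Counter)       # 2. how a user talks to people
    interpersonal = defaultdict(Counter)  # 3. how two nodes talk with each other
    corpus = Counter()                    # 4. how everyone talks
    for sender, recipients, body in emails:
        words = tokens(body)
        outgoing[sender].update(words)
        corpus.update(words)
        for r in set(recipients):
            incoming[r].update(words)
            # Unordered pair key, so both directions share one model.
            interpersonal[frozenset((sender, r))].update(words)
    return incoming, outgoing, interpersonal, corpus


def to_lm(counts):
    """Maximum-likelihood LM: normalize counts into probabilities."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


incoming, outgoing, interpersonal, corpus = build_profiles(EMAILS)
print(to_lm(interpersonal[frozenset(("alice", "bob"))])["budget"])  # 0.3333333333333333
```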
17
Why language models?
Ò Comparisons between communication profiles:
Ò Find nodes with most similar communication
18
SNA
1. Importance of a node in the network
2. Strength of connection between nodes
Email Content
1. Incoming LM
2. Outgoing LM
3. Interpersonal LM
4. Corpus-based LM
19
Approach: time-based
t=0 1 email, 2 addresses
t=1 2 emails, 2 addresses
t=2 3 emails, 4 addresses
t=3 4 emails, 5 addresses
etc…
t=n 607,011 emails, 2,068 addresses
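The time-based setup can be simulated by replaying emails in chronological order, so that any prediction for an email uses only the emails that precede it. This Python sketch uses a hypothetical four-email toy log constructed to reproduce the first four steps of the illustration above; the `Email` type and the `replay` and `growth` names are assumptions.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Email(NamedTuple):
    timestamp: int
    sender: str
    recipients: Tuple[str, ...]


def replay(emails: List[Email]) -> Iterator[Tuple[List[Email], Email]]:
    """Yield (history, email) in time order; `history` holds only the emails
    that come before `email`, so a predictor never sees the future."""
    ordered = sorted(emails, key=lambda e: e.timestamp)
    for t, email in enumerate(ordered):
        yield ordered[:t], email


def growth(emails: List[Email]) -> List[Tuple[int, int, int]]:
    """Network size after each step t: (t, number of emails, number of addresses)."""
    seen = set()
    stats = []
    for t, (history, email) in enumerate(replay(emails)):
        seen.add(email.sender)
        seen.update(email.recipients)
        stats.append((t, len(history) + 1, len(seen)))
    return stats


# Hypothetical toy log (addresses a..e), deliberately out of order.
EMAILS = [
    Email(3, "e", ("a",)),
    Email(0, "a", ("b",)),
    Email(2, "a", ("c", "d")),
    Email(1, "b", ("a",)),
]
print(growth(EMAILS))  # [(0, 1, 2), (1, 2, 2), (2, 3, 4), (3, 4, 5)]
```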
20
At some time interval t
Ò Given the email, sender, and network
Ò Remove recipients from email
Ò Rank all nodes in the network
Ò By computing for each candidate (recipient)
node:
1. Importance of candidate
2. Strength of connection between sender and
candidate
3. Similarity between sender and candidate LMs
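The ranking step can be sketched by scoring every candidate on the three signals and combining them. How the signals are combined is not stated on the slide, so this Python sketch assumes a weighted sum of signals that are each scaled to [0, 1]. It uses the number of received emails for importance, the number of emails exchanged with the sender for connection strength, and, as a simple stand-in for LM similarity, the cosine similarity between the sender's outgoing word counts and the candidate's incoming word counts. The toy history and all names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical history of (sender, recipients, body) observed before time t.
HISTORY = [
    ("alice", ["bob"], "budget review meeting"),
    ("bob", ["alice"], "budget numbers attached"),
    ("carol", ["dave", "bob"], "team lunch friday"),
    ("dave", ["carol"], "friday lunch works"),
]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def normalize(scores, keys):
    """Scale one signal to [0, 1] by its maximum over the candidates."""
    top = max((scores.get(k, 0.0) for k in keys), default=0.0)
    return {k: scores.get(k, 0.0) / top if top else 0.0 for k in keys}


def rank_candidates(sender, history, weights=(1.0, 1.0, 1.0)):
    """Rank every node except the sender by a weighted sum of three signals."""
    received = Counter()  # 1. importance: emails received per node
    strength = Counter()  # 2. emails exchanged with the sender
    incoming = {}         # word counts of emails each node received
    outgoing = Counter()  # word counts of emails the sender wrote
    nodes = set()
    for s, rs, text in history:
        words = text.lower().split()
        nodes.add(s)
        nodes.update(rs)
        if s == sender:
            outgoing.update(words)
        for r in set(rs):
            received[r] += 1
            incoming.setdefault(r, Counter()).update(words)
            if s == sender:
                strength[r] += 1
            elif r == sender:
                strength[s] += 1
    candidates = sorted(nodes - {sender})
    # 3. stand-in for LM similarity: sender's outgoing vs candidate's incoming text
    similarity = {c: cosine(outgoing, incoming.get(c, Counter())) for c in candidates}
    signals = [normalize(sig, candidates) for sig in (received, strength, similarity)]
    score = {c: sum(w * sig[c] for w, sig in zip(weights, signals)) for c in candidates}
    return sorted(candidates, key=lambda c: -score[c])


print(rank_candidates("carol", HISTORY))  # ['dave', 'bob', 'alice']
```

The `weights` tuple is where the balance between the network signals and the content signal would be tuned.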
21
22
Findings: what works for predicting recipients?
• Importance of node: number of received emails of node
• Strength of connection: number of emails between nodes
• LM similarity: interpersonal LM is most important
23
Findings: SNA vs email content
Ò SNA:
Ò SNA signals deteriorate over time
Ò SNA signals are most informative on highly
active users
!
Ò Email content:
Ò LM signal improves over time
Ò LM signal does worse with highly active users
24
Finally
Ò Combining Social Network Analysis with
Language Modeling is better than doing either.
25
Why this matters for E-Discovery
• Anomaly detection
  • Given a working prediction model, identify “unexpected” communication
• Language models for communication
  • For a node, find its most dissimilar interpersonal communication
    • Friends/family vs. colleagues?
  • Find communication that differs from the corpus-based communication
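Anomaly detection on top of a prediction model can be sketched as: replay the emails in order and flag every actual recipient the model did not expect. In this Python sketch the “working prediction model” is replaced by a deliberately simple stand-in (the sender's k most frequently addressed past recipients); the toy log, `k=2`, and the function name are hypothetical, and a real system would plug in the full recipient ranker instead.

```python
from collections import Counter, defaultdict


def flag_unexpected(emails, k=2):
    """Replay (sender, recipients) emails in order and return (index, recipient)
    pairs where the recipient was outside the stand-in model's top-k guess."""
    contacts = defaultdict(Counter)  # sender -> how often each recipient was addressed
    flags = []
    for i, (sender, recipients) in enumerate(emails):
        history = contacts[sender]
        # A sender's first email sets no expectation, so it is never flagged.
        if history:
            # most_common breaks ties by first occurrence.
            expected = {r for r, _ in history.most_common(k)}
            flags.extend((i, r) for r in recipients if r not in expected)
        history.update(recipients)
    return flags


# Hypothetical chronological log.
EMAILS = [
    ("alice", ["bob"]),
    ("alice", ["bob", "carol"]),
    ("alice", ["bob"]),
    ("alice", ["carol"]),
    ("alice", ["mallory"]),
]
print(flag_unexpected(EMAILS))  # [(1, 'carol'), (4, 'mallory')]
```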

Understanding Email Traffic (talk @ E-Discovery NL Symposium)
