The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation

1 KYOTO UNIVERSITY
The Million Domain Challenge: Broadcast Email
Prioritization by Cross-domain Recommendation
Daiki Tanaka
Kashima lab., Kyoto University
Research Seminar, 2017/6/12(Mon)

2 KYOTO UNIVERSITY
Today’s paper:
• Title: The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation
• Venue: KDD 2016
• Authors: Beidou Wang, Yikang Liao, Martin Ester, Yu Zhu, Deng Cai, Jiajun Bu, Ziyu Guan
• Affiliations: Zhejiang University; Simon Fraser University; Northwest University of China

3 KYOTO UNIVERSITY
Overview:
• Background
• Related work
• Problem definition
• Proposed method: CBEP
• Experiment
• Conclusion

4 KYOTO UNIVERSITY
Overview:
• Background

5 KYOTO UNIVERSITY
Background:
• Email overload is a serious problem.
• A person wastes 1 hour per day handling unimportant emails.
• A large body of work addresses personalized email prioritization (e.g., by Google).
  • These methods predict importance labels for emails.
• However, broadcast email has been overlooked in the previous personalized email prioritization literature.

6 KYOTO UNIVERSITY
Background:
Challenges of Broadcast Email
• Same-sender problem
  • A receiver may get many emails of varying importance from the same sender.
• Limited types of user feedback
  • Users usually do not reply to a broadcast email.

7 KYOTO UNIVERSITY
Background: Key idea
Collaborative filtering problem
• Each broadcast email is sent to all users of a mailing list.
  • So other users' feedback (viewed or not) can be very helpful in predicting the priority for a target user.
• If other users with similar interests have viewed an email, the target user is likely to view it too.

8 KYOTO UNIVERSITY
Background: Key idea
Cross-Domain Recommendation
• Cross-domain recommendation transfers knowledge from source domains to the target domain.
• In this work, each mailing list is treated as a domain.
• There are millions of domains in an email system.
[Figure: knowledge transfer from the source domain to the target domain]

9 KYOTO UNIVERSITY
Overview:
• Related work

10 KYOTO UNIVERSITY
Related work:
• Prioritization for emails
  • Using a linear logistic regression model
  • Using social networks to capture user groups
  • Using SVM
• Cross-domain recommendation
  • Previous cross-domain recommendation work focused on a relatively small set of domains (2 or 3 domains).
  • Source domains are selected manually.
These approaches cannot be applied to broadcast email.

11 KYOTO UNIVERSITY
Overview:
• Problem definition

12 KYOTO UNIVERSITY
Problem Definition:
Variables
• User set: $U$
• Email set: $E$
• Email importance matrix: $I$, where
  $I_{u,e} = \begin{cases} 1 & \text{if user } u \text{ has viewed email } e \\ 0 & \text{if user } u \text{ has not viewed email } e \end{cases}$
• Mailing list: $M_i \subset U$, with $\mathbf{M} = \{M_1, \dots, M_n\}$
• Email set sent to $M_i$: $E_i \subset E$
• A new email $e_{new}$ will be sent to a mailing list $M_t$ (the target mailing list).
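A minimal sketch of how the importance matrix defined above could be built from a view log. The function name `importance_matrix`, the `(user, email)` tuple format of the log, and the sample identifiers are assumptions made for illustration; they do not come from the paper.

```python
def importance_matrix(users, emails, view_log):
    """Return I as a list of rows: I[u][e] = 1 if user u viewed email e, else 0.

    view_log is assumed to be an iterable of (user, email) pairs.
    """
    viewed = set(view_log)
    return [[1 if (u, e) in viewed else 0 for e in emails] for u in users]


# Hypothetical toy data: three users, two emails.
users = ["u1", "u2", "u3"]
emails = ["e1", "e2"]
view_log = [("u1", "e1"), ("u3", "e1"), ("u3", "e2")]

I = importance_matrix(users, emails, view_log)
```

Each row corresponds to a user and each column to an email, so a mailing list's feedback is simply the sub-matrix over its members and its emails.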

13 KYOTO UNIVERSITY
Problem Definition:
Goal
• Predict whether a broadcast email is important for a given user.
  • Input: the user set and the email set
  • Output: a predicted label for the email (important or not)

14 KYOTO UNIVERSITY
Problem Definition:
Dividing into 3 sub-problems
• The broadcast email prioritization problem can be divided into the following three sub-problems.
1. Sample feedback from a small portion of users, since each broadcast email waiting for prioritization is completely cold, with no user interaction.
2. Find the optimal set of source mailing lists whose extra information can help with priority prediction.
3. Predict the priority of the broadcast email with the help of the feedback from the sampled users and the extra information from the source mailing lists.

15 KYOTO UNIVERSITY
Overview:
• Proposed method: CBEP framework

16 KYOTO UNIVERSITY
Proposed method:
CBEP framework
• CBEP addresses the three sub-problems of broadcast email prioritization:
1. User feedback sampling
2. Optimal source domain set selection (the major contribution of this paper)
3. Priority prediction

17 KYOTO UNIVERSITY
CBEP framework (1/3):
1. User feedback sampling
• A new email is sent to all users without priority labels, followed by a short waiting period.
• Sampled user set: $S \subset M_t$
• Feedback is then collected from the users:
  • Positive feedback: the email is viewed
  • Negative feedback: the email is not viewed

18 KYOTO UNIVERSITY
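2. Optimal Source Domain Set Selection
• Given the target mailing list $M_t$, we define a binary vector $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_n)^T$ as follows:
  $\alpha_i = \begin{cases} 1 & \text{if the source mailing list } M_i \text{ is selected} \\ 0 & \text{otherwise} \end{cases}$
• Our goal is to find the $\boldsymbol{\alpha}$ that maximizes the objective function.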

19 KYOTO UNIVERSITY
• We consider three factors when selecting the optimal source domains:
  • Overlap of users
  • Feedback pattern similarity
  • Coverage of users

20 KYOTO UNIVERSITY
CBEP framework (2/3): 2. Optimal Source Domain Set Selection
Overlap of users
• For a source mailing list and a target mailing list, we define the overlap of users as:
• $M_i$: source mailing list
• $M_t$: target mailing list

21 KYOTO UNIVERSITY
Similar feedback pattern
• Next, we define the similarity of the feedback patterns between two mailing lists $M_t$ and $M_i$ as follows:
  $sim_i(t) = 1 - \frac{1}{2|C_{t,i}|^2} \sum_{u,w \in C_{t,i}} \left| \cos(v_{t,u}, v_{t,w}) - \cos(v_{i,u}, v_{i,w}) \right|$
  • $C_{t,i}$: the set of users shared by the two mailing lists $M_t$ and $M_i$.
  • $v_{i,u}$: a binary vector in which each entry indicates whether user $u$ has read the corresponding email in $E_i$ (the emails sent to mailing list $M_i$).

22 KYOTO UNIVERSITY
Coverage of Users
• We want the number of users shared between $M'$ (the set of source mailing lists) and $M_t$ (the target mailing list) to be as large as possible.
• That is, we want to choose a size-$k$ set of mailing lists $M'$:
  $\max_{M'} \left| \bigcup_{M_i \in M'} C_{i,t} \right|$
• This problem is NP-hard (it is the maximum coverage problem).
• Instead, we define the overlap percentage between source mailing lists $M_i$, $M_j$ and the target mailing list $M_t$ as follows:
  $overlap_{i,j}(t) = \frac{|M_i \cap M_j \cap M_t|}{|M_t|}$
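As a small illustration, the overlap percentage above is a one-line set computation. The function name `overlap` and the sample member sets are assumptions for this sketch.

```python
def overlap(m_i, m_j, m_t):
    """Fraction of target-list users who belong to both source lists m_i and m_j."""
    return len(m_i & m_j & m_t) / len(m_t)


# Hypothetical mailing lists represented as sets of user ids.
m_t = {1, 2, 3, 4}
m_i = {1, 2, 5}
m_j = {1, 2, 3}
value = overlap(m_i, m_j, m_t)
```

A high value means the two source lists cover largely the same target users, so selecting both adds little coverage.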

23 KYOTO UNIVERSITY
CBEP framework (2/3): 2. Optimal Source Domain Set Selection
Objective function
• Objective function:
  (Annotation on the equation: one term is a normalizer that prevents the function from selecting too many source mailing lists.)
• This is a difficult problem (it has both a quadratic term and a fraction).
• So we propose two approximate solutions.

24 KYOTO UNIVERSITY
• Approximate solution 1 (used as CBEP-A1 in the experiments)
• Relax the constraint (to make it a quadratic linear programming problem)
• Set a threshold $\gamma$
  • Source domains with $\alpha_i \ge \gamma$ are selected
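The thresholding step can be sketched as follows, assuming the relaxed problem has already been solved and returned one real-valued weight per source mailing list. The function name `select_sources` and the sample values are hypothetical; the solver itself is not shown because the objective is not reproduced in this transcript.

```python
def select_sources(alpha, gamma):
    """Return the indices of source mailing lists whose relaxed weight reaches gamma."""
    return [i for i, a in enumerate(alpha) if a >= gamma]


# Hypothetical relaxed solution for four candidate source lists.
selected = select_sources([0.9, 0.1, 0.5, 0.49], gamma=0.5)
```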

25 KYOTO UNIVERSITY
• Approximate solution 2 (used as CBEP-A2 in the experiments)
• We solve this $z_{max}$ times, once for each $z \in \{1, 2, \dots, z_{max}\}$
• $z_{max}$: upper bound on the number of source domains

26 KYOTO UNIVERSITY
3. Priority Prediction
• Feedback set $I' = \{I_{M_t,E_t}, I_{M',E_{M'}}, I_{S,e_{new}}\}$
  • Matrix $I_{M,E}$ is the feedback from user set $M$ on email set $E$.
• We use a weighted low-rank approximation method (matrix factorization).
• $I \simeq P Q^T$
[Figure: user-item rating matrix]

27 KYOTO UNIVERSITY
3. Priority Prediction: Matrix factorization problem
• Our objective is to minimize the following loss function.
  $\mathcal{L}(P, Q) = \sum_{i,j} W_{ij} \left( I'_{ij} - P_{i\cdot} Q_{j\cdot}^T \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$
• $P$ and $Q$ stand for the latent vectors for users $\{M', M_t\}$ and items $\{E_{M'}, E_t, e_{new}\}$.
• Alternating Least Squares (ALS) is used to solve this.

28 KYOTO UNIVERSITY
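$\min \mathcal{L}(P, Q) = \sum_{i,j} W_{ij} \left( I'_{ij} - P_{i\cdot} Q_{j\cdot}^T \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$
  • Fixing $Q$ and solving $\frac{\partial \mathcal{L}(P, Q)}{\partial P_{i\cdot}} = 0$:
    $P_{i\cdot} = I'_{i\cdot} \widetilde{W}_{i\cdot} Q \left( Q^T \widetilde{W}_{i\cdot} Q + \lambda \left( \sum_j W_{ij} \right) I_D \right)^{-1}$
  • Fixing $P$ and solving $\frac{\partial \mathcal{L}(P, Q)}{\partial Q_{j\cdot}} = 0$:
    $Q_{j\cdot} = I'^{T}_{\cdot j} \widetilde{W}_{\cdot j} P \left( P^T \widetilde{W}_{\cdot j} P + \lambda \left( \sum_i W_{ij} \right) I_D \right)^{-1}$
• For each remaining user $u_i \in (M_t - S)$, the priority of $e_{new}$ is predicted as:
  $I_{i,e_{new}} = P_i Q_{e_{new}}^T$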
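The alternating updates can be sketched in plain Python. This is a hedged illustration rather than the authors' implementation. It follows the update rules as written on this slide, where the regularizer for each row is scaled by the sum of that row's weights, so the quantity it decreases is the weighted-regularization objective computed by the hypothetical `objective` helper below. All function names, the toy feedback matrix, the weights, the rank `D`, and `lam` are assumptions.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[k][n] / M[k][k] for k in range(n)]


def update_rows(R, W, F, lam):
    """One ALS half-step: recompute every row factor while the factors F stay fixed."""
    D = len(F[0])
    rows = []
    for r_i, w_i in zip(R, W):
        reg = lam * sum(w_i)  # lambda * (sum of this row's weights)
        A = [[sum(w * f[a] * f[c] for w, f in zip(w_i, F)) + (reg if a == c else 0.0)
              for c in range(D)] for a in range(D)]
        rhs = [sum(w * r * f[a] for w, r, f in zip(w_i, r_i, F)) for a in range(D)]
        rows.append(solve(A, rhs))
    return rows


def transpose(X):
    return [list(col) for col in zip(*X)]


def objective(R, W, P, Q, lam):
    """Weighted squared error plus weight-scaled L2 penalty (assumed objective)."""
    total = 0.0
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            err = R[i][j] - sum(x * y for x, y in zip(p, q))
            penalty = sum(x * x for x in p) + sum(y * y for y in q)
            total += W[i][j] * (err * err + lam * penalty)
    return total


def weighted_als(R, W, D=2, lam=0.1, iters=15, seed=0):
    """Alternate between updating user factors P and item factors Q."""
    rng = random.Random(seed)
    Q = [[rng.random() for _ in range(D)] for _ in R[0]]
    P = [[rng.random() for _ in range(D)] for _ in R]
    Rt, Wt = transpose(R), transpose(W)
    for _ in range(iters):
        P = update_rows(R, W, Q, lam)    # fix Q, solve for each P_i
        Q = update_rows(Rt, Wt, P, lam)  # fix P, solve for each Q_j
    return P, Q


# Toy feedback matrix; unviewed entries get a lower confidence weight (assumption).
R = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
W = [[1.0 if x else 0.2 for x in row] for row in R]
P, Q = weighted_als(R, W)
score = sum(x * y for x, y in zip(P[0], Q[2]))  # predicted priority of item 2 for user 0
```

In the CBEP setting the last column would play the role of $e_{new}$: only the sampled users have observed entries there, and the dot product of a remaining user's factor with that column's factor gives the predicted priority.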

29 KYOTO UNIVERSITY
• We define the percentage of users considering email $e_{new}$ important as:
  $H(e_{new}) = \frac{pos(e_{new})}{pos_{avg}(M_t)} \cdot \frac{tr(I_{M_t}^T I_{M_t})}{|M_t| \cdot |E_t|}$
  • $pos(e_{new})$: total number of viewed-email behaviors observed in the waiting time for $e_{new}$.
  • $pos_{avg}(M_t)$: average number of viewed-email behaviors observed in the waiting time for all the emails from $M_t$.
• "For the top $H(e_{new})$ percent of users according to $y_{i,e_{new}}$, we predict $e_{new}$ as important, while for others as unimportant."
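A rough sketch of how the cutoff could be applied, under stated assumptions: the fraction is clamped to at most 1, the number of users labeled important is the rounded product of the fraction and the number of scored users, and ties are broken by sort order. The function names and toy numbers are hypothetical.

```python
def important_fraction(pos_new, pos_avg, I_t):
    """H(e_new): early-view ratio times the historical view rate of the target list.

    I_t is the binary feedback matrix of the target list (rows: users, columns: emails),
    so tr(I_t^T I_t) equals the sum of squared entries.
    """
    n_users, n_emails = len(I_t), len(I_t[0])
    trace = sum(x * x for row in I_t for x in row)
    h = (pos_new / pos_avg) * trace / (n_users * n_emails)
    return min(1.0, h)  # assumption: cap at 100 percent of users


def label_top(scores, h):
    """Mark the top h fraction of users (by predicted score) as important."""
    k = round(h * len(scores))
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:k])
    return {user: user in top for user in scores}


I_t = [[1, 0], [1, 1], [0, 0], [0, 1]]             # 4 users, 2 past emails, 4 views
h = important_fraction(pos_new=3, pos_avg=2, I_t=I_t)
labels = label_top({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}, h)
```

Here the new email drew 1.5 times the usual early views and the list's historical view rate is 0.5, so three of the four scored users are labeled important.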

30 KYOTO UNIVERSITY
Overview:
• Experiments

31 KYOTO UNIVERSITY
Experiments:
Dataset
• Emails and their view logs from a large business mailing list within Samsung.
  • 6,506 broadcast emails
  • 333,979 view records
  • 490 mailing lists
    • Training set: 5,475 emails and their view records
    • Testing set: 1,031 emails and their view records

32 KYOTO UNIVERSITY
Experiment 1:
Evaluation Metrics
• In the experiment, we evaluate precision, recall, and F-score at two levels.
  • Mail level
    • Average over all emails in the test set
  • Mailing list level
    • Average over all mailing lists in the test set
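The two averaging levels can be illustrated with a short sketch. It assumes that the mailing-list level first averages the per-email scores within each list and then averages across lists; the slides do not spell out this detail. The function names and the toy results are hypothetical.

```python
from collections import defaultdict


def prf(predicted, actual):
    """Precision, recall and F-score for sets of users labeled important."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def mean_rows(rows):
    """Column-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def mail_level(results):
    """Average over all test emails; results holds (list_id, predicted, actual) per email."""
    return mean_rows([prf(p, a) for _, p, a in results])


def list_level(results):
    """Average within each mailing list first, then across mailing lists (assumption)."""
    by_list = defaultdict(list)
    for list_id, p, a in results:
        by_list[list_id].append(prf(p, a))
    return mean_rows([mean_rows(rows) for rows in by_list.values()])


# Two emails from list "A" predicted perfectly, one email from list "B" predicted wrongly.
results = [("A", {1, 2}, {1, 2}), ("A", {1}, {1}), ("B", {1, 2}, {3})]
per_mail = mail_level(results)
per_list = list_level(results)
```

The mail-level average is dominated by lists that send many emails, while the mailing-list level weighs every list equally, which is why the two can differ.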

33 KYOTO UNIVERSITY
Experiment 1:
Baselines
• Single Mailing List (SML)
  • Considers only the information from the target mailing list.
• All Mailing Lists (AML)
  • Considers all the source mailing lists.
• Overlapping Mailing Lists (OML)
  • Selects the top-k source mailing lists with the largest overlap with the target domain.
• Feedback Similar Mailing Lists (FSML)
  • Selects the top-k source mailing lists with the highest feedback similarity to the target domain.
• CBEP Without Weight (CBEP-SVD): uses SVD in prediction

34 KYOTO UNIVERSITY
Experiment 1: Results
• The proposed methods (CBEP-A1 and CBEP-A2) outperform all the baselines on all the evaluation metrics.

35 KYOTO UNIVERSITY
Experiment 2:
• We consider three factors when selecting the optimal source domains.
• In this experiment, we remove these three factors one at a time.
• In this way, we evaluate how much each factor affects the prediction precision.
• Mailing-list-level results for CBEP-A1

36 KYOTO UNIVERSITY
Experiment 2: Result
The coverage-of-users criterion is the most important.
[Figure: plot with axis "precision"]

37 KYOTO UNIVERSITY
Overview:
• Conclusion

38 KYOTO UNIVERSITY
Conclusion
• We introduce the problem of personalized broadcast email prioritization considering a large number of mailing lists.
• We propose a novel cross-domain recommendation framework, CBEP.
• We show that CBEP outperforms all the baselines.
