1 KYOTO UNIVERSITY
The Million Domain Challenge: Broadcast Email
Prioritization by Cross-domain Recommendation
Daiki Tanaka
Kashima Lab., Kyoto University
Research Seminar, 2017/6/12 (Mon)
2 KYOTO UNIVERSITY
Today’s paper:
■ Title: The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation
■ Venue: KDD 2016
■ Authors: Beidou Wang, Yikang Liao, Martin Ester, Yu Zhu, Deng Cai, Jiajun Bu, Ziyu Guan
■ Affiliations: Zhejiang University, Simon Fraser University, Northwest University of China
3 KYOTO UNIVERSITY
Overview:
■ Background
■ Related work
■ Problem definition
■ Proposed method - CBEP
■ Experiment
■ Conclusion
4 KYOTO UNIVERSITY
Overview:
■ Background
5 KYOTO UNIVERSITY
Background:
• Email overload is causing serious problems.
• A person wastes 1 hour per day handling unimportant emails.
■ Various prior work addresses personalized email prioritization (e.g., Google).
● These methods predict importance labels for emails.
■ However, broadcast email has been overlooked in the previous personalized email prioritization literature.
6 KYOTO UNIVERSITY
Background:
Challenges of Broadcast Email
■ Same-sender problem
● A receiver may get many emails of varying importance from the same sender.
■ Limited types of user feedback
● We usually don’t reply to a broadcast email.
7 KYOTO UNIVERSITY
Background: Key idea
Collaborative filtering problem
■ Each broadcast email is sent to all users of a mailing list.
● So other users' feedback (viewed or not) can be very helpful in predicting the priority for a target user.
■ If other users with similar interests have viewed an email, the target user is likely to view it too.
8 KYOTO UNIVERSITY
Background: Key idea
Cross-domain Recommendation
■ Cross-domain recommendation transfers knowledge from source domains to the target domain.
■ In our research, we treat each mailing list as a domain.
■ There are millions of domains in an email system.
[Figure: knowledge transfer from the source domain to the target domain]
9 KYOTO UNIVERSITY
Overview:
■ Related work
10 KYOTO UNIVERSITY
Related work:
■ Email prioritization
● Linear logistic regression models
● Social networks used to capture user groups
● SVM
■ Cross-domain recommendation
● Previous cross-domain recommendation work focused on a relatively small set of domains (2 or 3 domains).
● Source domains are selected manually.
Cannot be applied to broadcast email.
11 KYOTO UNIVERSITY
Overview:
■ Problem definition
12 KYOTO UNIVERSITY
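Problem Definition:
Variables
■ User set: $\mathbf{U}$
■ Email set: $\mathbf{E}$
■ Email importance matrix: $\mathbf{I}$
$I_{u,e} = \begin{cases} 1 & \text{if user } u \text{ has viewed email } e \\ 0 & \text{if user } u \text{ hasn't viewed email } e \end{cases}$
■ Mailing list: $M_i \subset \mathbf{U}$, $\mathbf{M} = \{M_1, \dots, M_n\}$
■ Email set sent to $M_i$: $E_i \subset \mathbf{E}$
■ A new email $e_{new}$ will be sent to a mailing list $M_t$ (the target mailing list).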
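As a concrete reading of this notation, the sketch below builds the importance matrix from a list of view records. The function name `build_importance` and the toy `view_log` are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def build_importance(users, emails, view_log):
    """Return I as nested dicts: I[u][e] = 1 if user u viewed email e, else 0."""
    viewed = set(view_log)
    return {u: {e: 1 if (u, e) in viewed else 0 for e in emails} for u in users}


# Toy data (hypothetical): three users, two emails, two view records.
users = ["u1", "u2", "u3"]
emails = ["e1", "e2"]
view_log = [("u1", "e1"), ("u3", "e2")]

I = build_importance(users, emails, view_log)
```

Here `I["u1"]["e1"]` is 1 because the pair appears in the view log, and every pair that does not appear is 0.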
13 KYOTO UNIVERSITY
Problem Definition:
Goal
■ We want to predict whether a broadcast email is important or not for a given user.
● Input: user set and email set
● Output: predicted label for the email (important or not)
14 KYOTO UNIVERSITY
Problem Definition:
Dividing into 3 subproblems
■ The broadcast email prioritization problem can be divided into the following three subproblems.
1. Sample feedback from a small portion of users, since each broadcast email waiting for prioritization is completely cold, with no user interaction.
2. Find the optimal set of source mailing lists whose extra information can help with priority prediction.
3. Predict the priority of the broadcast email with the help of the feedback from the sampled users and the extra information from the source mailing lists.
15 KYOTO UNIVERSITY
Overview:
■ Proposed method - CBEP framework
16 KYOTO UNIVERSITY
Proposed method:
CBEP framework
■ We introduce CBEP to solve the three subproblems of broadcast email prioritization:
1. User feedback sampling
2. Optimal source domain set selection (the major contribution of this paper)
3. Priority prediction
17 KYOTO UNIVERSITY
CBEP framework (1/3):
1. User feedback sampling
■ We send the new email to all users without priority labels and wait for a short period of time.
■ Sampled user set: $S \subset M_t$
■ We then collect feedback from the users.
● Positive feedback: the email is viewed
● Negative feedback: the email isn’t viewed
18 KYOTO UNIVERSITY
CBEP framework (2/3):
2. Optimal Source Domain Set Selection
■ Given the target mailing list $M_t$, we define a binary vector $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_n)^T$ as follows:
$\alpha_i = \begin{cases} 1 & \text{if the source mailing list } M_i \text{ is selected} \\ 0 & \text{otherwise} \end{cases}$
■ Our goal is to find the $\boldsymbol{\alpha}$ that maximizes the objective function.
19 KYOTO UNIVERSITY
CBEP framework (2/3):
2. Optimal Source Domain Set Selection
■ We consider three factors to select the optimal source domains:
● Overlap of users
● Feedback pattern similarity
● Coverage of users
20 KYOTO UNIVERSITY
CBEP framework (2/3): 2. Optimal Source Domain Set Selection
Overlap of users
■ For a source mailing list and a target mailing list, we define the overlap of users as:
[Equation shown as an image on the slide]
■ $M_i$: source mailing list
■ $M_t$: target mailing list
21 KYOTO UNIVERSITY
CBEP framework (2/3): 2. Optimal Source Domain Set Selection
Similar feedback pattern
■ Next, we define the similarity of the feedback patterns between two mailing lists $M_t$ and $M_i$ as follows:
$sim_i(t) = 1 - \frac{1}{2|C_{t,i}|^2} \sum_{u,w \in C_{t,i}} \left| \cos(\mathbf{v}_{t,u}, \mathbf{v}_{t,w}) - \cos(\mathbf{v}_{i,u}, \mathbf{v}_{i,w}) \right|$
● $C_{t,i}$: the shared user set between the two mailing lists $M_t$ and $M_i$.
● $\mathbf{v}_{i,u}$: binary vector with each entry indicating whether user $u$ has read each email in $E_i$ (the emails sent to mailing list $M_i$).
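The similarity above can be sketched in a few lines. This is only an illustration under stated assumptions: the absolute difference of the two cosines is summed over all ordered pairs of shared users, an all-zero view vector is given cosine 0, and an empty shared user set gives similarity 0. The function names and the dictionary layout are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0  # assumption: undefined cosine is treated as 0
    return dot / (na * nb)


def feedback_similarity(v_t, v_i):
    """v_t, v_i: dict user -> binary view vector in the target / source list."""
    shared = sorted(set(v_t) & set(v_i))  # C_{t,i}
    if not shared:
        return 0.0  # assumption: no shared users means no evidence of similarity
    total = 0.0
    for u in shared:
        for w in shared:
            total += abs(cosine(v_t[u], v_t[w]) - cosine(v_i[u], v_i[w]))
    return 1.0 - total / (2 * len(shared) ** 2)


# Toy data (hypothetical): users a and b disagree in the target list
# but behave identically in the source list.
v_target = {"a": [1, 0], "b": [0, 1]}
v_source = {"a": [1], "b": [1]}
sim = feedback_similarity(v_target, v_source)
```

In this toy case the two off-diagonal pairs each contribute 1, so the sum is 2 and the similarity is 1 - 2/8 = 0.75.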
22 KYOTO UNIVERSITY
CBEP framework (2/3): 2. Optimal Source Domain Set Selection
Coverage of Users
■ We want the number of shared users between $M'$ (the source mailing list set) and $M_t$ (the target mailing list) to be as large as possible.
■ That is, we want to choose a size-$k$ mailing list set $M'$:
$\max \left| \bigcup_{M_i \in M'} C_{i,t} \right|$
■ This problem is NP-hard (maximum coverage problem).
■ Instead, we define the overlap percentage between source mailing lists $M_i$, $M_j$ and the target mailing list $M_t$ as follows:
$overlap_{i,j}(t) = \frac{|M_i \cap M_j \cap M_t|}{|M_t|}$
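Both quantities on this slide reduce to set operations. The sketch below, with hypothetical function names and toy user IDs, computes the coverage of a candidate source set and the pairwise overlap percentage, treating each mailing list as a Python set of users.

```python
def coverage(sources, m_t):
    """Number of target users covered by the union of the shared user sets C_{i,t}."""
    covered = set()
    for m_i in sources:
        covered |= m_i & m_t  # C_{i,t}: users shared by M_i and M_t
    return len(covered)


def pairwise_overlap(m_i, m_j, m_t):
    """overlap_{i,j}(t) = |M_i ∩ M_j ∩ M_t| / |M_t|."""
    return len(m_i & m_j & m_t) / len(m_t)


# Toy data (hypothetical user IDs).
m_t = {1, 2, 3, 4}
m_i = {1, 2, 3}
m_j = {2, 3, 9}

cov = coverage([m_i, m_j], m_t)
ovl = pairwise_overlap(m_i, m_j, m_t)
```

A high pairwise overlap means the two source lists largely cover the same target users, so selecting both adds little coverage.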
23 KYOTO UNIVERSITY
CBEP framework (2/3): 2. Optimal Source Domain Set Selection
Objective function
■ Objective function:
[Equation shown as an image on the slide; an annotation marks one term as a normalizer that prevents the function from selecting too many source mailing lists]
■ This is a difficult problem (it has both a quadratic term and a fraction).
■ So we propose two approximate solutions.
24 KYOTO UNIVERSITY
CBEP framework (2/3):
2. Optimal Source Domain Set Selection
■ Approximate solution 1 (used as CBEP-A1 in the experiments)
■ Relax the constraint (to make it a quadratic linear programming problem)
■ Set a threshold $\gamma$
● Source domains with $\alpha_i \geq \gamma$ are selected
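The final thresholding step can be sketched as follows. The relaxed values in `alpha` are hypothetical numbers standing in for the output of the relaxed optimization, which is not reproduced here.

```python
def select_sources(alpha, gamma):
    """Return indices i of source domains whose relaxed weight satisfies alpha_i >= gamma."""
    return [i for i, a in enumerate(alpha) if a >= gamma]


# Hypothetical relaxed solution for four candidate source mailing lists.
alpha = [0.9, 0.1, 0.5, 0.49]
selected = select_sources(alpha, gamma=0.5)
```

With these values the first and third source domains are kept; raising `gamma` selects fewer source domains.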
25 KYOTO UNIVERSITY
CBEP framework (2/3):
2. Optimal Source Domain Set Selection
■ Approximate solution 2 (used as CBEP-A2 in the experiments)
■ We solve this $z_{max}$ times, once for each candidate number of source domains in $\{1, 2, \dots, z_{max}\}$.
■ $z_{max}$: upper bound on the number of source domains
26 KYOTO UNIVERSITY
CBEP framework (3/3):
3. Priority Prediction
■ Feedback set $I' = \{I_{M_t,E_t},\ I_{M',E_{M'}},\ I_{S,e_{new}}\}$
● Matrix $I_{M,E}$ is the feedback from user set $M$ on email set $E$.
■ We use a weighted low-rank approximation method (matrix factorization).
■ $\mathbf{I} \simeq \mathbf{P}\mathbf{Q}^T$
[Figure: user-item rating matrix]
27 KYOTO UNIVERSITY
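CBEP framework (3/3):
3. Priority Prediction: Matrix factorization problem
■ Our objective is to minimize the following loss function.
$\mathcal{L}(\mathbf{P}, \mathbf{Q}) = \sum_{i,j} W_{ij} \left( I'_{ij} - \mathbf{P}_{i\cdot} \mathbf{Q}_{j\cdot}^T \right)^2 + \lambda \left( \|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2 \right)$
■ $\mathbf{P}$ and $\mathbf{Q}$ stand for the latent vectors for users $\{M', M_t\}$ and items $\{E_{M'}, E_t, e_{new}\}$.
■ Alternating Least Squares (ALS) is used to solve this.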
28 KYOTO UNIVERSITY
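CBEP framework (3/3):
3. Priority Prediction
$\min \mathcal{L}(\mathbf{P}, \mathbf{Q}) = \sum_{i,j} W_{ij} \left( I'_{ij} - \mathbf{P}_{i\cdot} \mathbf{Q}_{j\cdot}^T \right)^2 + \lambda \left( \|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2 \right)$
● Fixing $\mathbf{Q}$ and solving $\frac{\partial \mathcal{L}(\mathbf{P},\mathbf{Q})}{\partial \mathbf{P}_{i\cdot}} = 0$:
$\mathbf{P}_{i\cdot} = I'_{i\cdot} \widetilde{W}_{i\cdot} \mathbf{Q} \left( \mathbf{Q}^T \widetilde{W}_{i\cdot} \mathbf{Q} + \lambda \Big( \sum_j W_{ij} \Big) \mathbf{I}_D \right)^{-1}$
● Fixing $\mathbf{P}$ and solving $\frac{\partial \mathcal{L}(\mathbf{P},\mathbf{Q})}{\partial \mathbf{Q}_{j\cdot}} = 0$:
$\mathbf{Q}_{j\cdot} = I'^{T}_{\cdot j} \widetilde{W}_{\cdot j} \mathbf{P} \left( \mathbf{P}^T \widetilde{W}_{\cdot j} \mathbf{P} + \lambda \Big( \sum_i W_{ij} \Big) \mathbf{I}_D \right)^{-1}$
■ For each remaining user $u_i \in (M_t - S)$, the priority of $e_{new}$ is predicted as:
$I_{i,e_{new}} = \mathbf{P}_i \mathbf{Q}_{e_{new}}^T$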
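The alternating updates can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes that $\widetilde{W}_{i\cdot}$ denotes the diagonal matrix built from row $i$ of $W$, that $\mathbf{I}_D$ is the $D \times D$ identity, and that every row and column has positive total weight so each linear system is solvable. The helper names, the toy matrix, and the hyperparameters are all hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[r][n] / M[r][r] for r in range(n)]


def update_rows(R, W, F, lam):
    """Half-step of ALS: recompute every row factor while the other factor F is fixed.

    For each row: x = (F^T diag(w) F + lam * sum(w) * I)^-1 F^T diag(w) r.
    """
    D = len(F[0])
    out = []
    for r_row, w_row in zip(R, W):
        s = sum(w_row)  # total weight of this row, assumed > 0
        A = [[sum(w * f[a] * f[c] for w, f in zip(w_row, F))
              + (lam * s if a == c else 0.0)
              for c in range(D)] for a in range(D)]
        rhs = [sum(w * r * f[a] for w, r, f in zip(w_row, r_row, F))
               for a in range(D)]
        out.append(solve(A, rhs))
    return out


def transpose(X):
    return [list(col) for col in zip(*X)]


def weighted_als(R, W, D=2, lam=0.1, iters=30, seed=0):
    """Alternate the two closed-form updates for a fixed number of iterations."""
    rng = random.Random(seed)
    n = len(R[0])
    Q = [[rng.random() for _ in range(D)] for _ in range(n)]
    Rt, Wt = transpose(R), transpose(W)
    P = []
    for _ in range(iters):
        P = update_rows(R, W, Q, lam)    # fix Q, update P
        Q = update_rows(Rt, Wt, P, lam)  # fix P, update Q
    return P, Q


def predict(P, Q, i, j):
    """Predicted priority of item j for user i: P_i Q_j^T."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


# Toy feedback matrix (hypothetical): users 0 and 1 view emails 0 and 1, user 2 views email 2.
R = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
W = [[1.0] * 3 for _ in range(3)]  # uniform weights, for illustration only
P, Q = weighted_als(R, W)
```

On this toy matrix `predict(P, Q, 0, 0)` should come out clearly larger than `predict(P, Q, 0, 2)`, reflecting that user 0 viewed email 0 but not email 2. A real implementation would typically use a linear algebra library instead of the hand-written solver.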
29 KYOTO UNIVERSITY
CBEP framework (3/3):
3. Priority Prediction
■ We define the percentage of users considering email $e_{new}$ important as:
$H(e_{new}) = \frac{pos(e_{new})}{pos_{avg}(M_t)} \cdot \frac{tr(\mathbf{I}_{M_t}^T \mathbf{I}_{M_t})}{|M_t| \cdot |E_t|}$
• $pos(e_{new})$: total number of viewed-email behaviors observed in the waiting time for $e_{new}$.
• $pos_{avg}(M_t)$: average number of viewed-email behaviors observed in the waiting time for all the emails from $M_t$.
■ "For the top $H(e_{new})$ percent of users according to $y_{i,e_{new}}$, we predict $e_{new}$ as important, while for others as unimportant."
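The thresholding rule can be sketched as below. Two assumptions are made for illustration: $H(e_{new})$ is treated as a fraction clipped to [0, 1], and the number of users labeled important is obtained with Python's `round`. The function names and toy numbers are hypothetical.

```python
def importance_ratio(pos_new, pos_avg, I_t):
    """H(e_new) = pos(e_new) / pos_avg(M_t) * tr(I^T I) / (|M_t| * |E_t|)."""
    n_users, n_emails = len(I_t), len(I_t[0])
    # tr(I^T I) is the sum of squared entries, i.e. the number of views for a binary matrix.
    views = sum(x * x for row in I_t for x in row)
    return (pos_new / pos_avg) * views / (n_users * n_emails)


def label_top_fraction(scores, h):
    """scores: dict user -> predicted priority. Label the top h fraction as important."""
    h = min(max(h, 0.0), 1.0)       # assumption: clip H to a valid fraction
    k = round(h * len(scores))      # assumption: rounding rule for the cut-off
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:k])
    return {u: u in top for u in scores}


# Toy data (hypothetical): 4 users x 2 past emails in the target list, 4 views in total.
I_t = [[1, 0], [1, 1], [0, 0], [0, 1]]
H = importance_ratio(pos_new=3, pos_avg=3, I_t=I_t)
labels = label_top_fraction({"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.1}, H)
```

Here the historical view rate is 4/8 and the new email attracts an average number of early views, so H is 0.5 and the two highest-scoring users are labeled important.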
30 KYOTO UNIVERSITY
Overview:
■ Experiments
31 KYOTO UNIVERSITY
Experiments:
Dataset
■ Emails and their view logs from a large business mailing list within Samsung.
● 6506 broadcast emails
● 333,979 view records
● 490 mailing lists
• Training set: 5475 emails and their view records
• Testing set: 1031 emails and their view records
32 KYOTO UNIVERSITY
Experiment 1:
Evaluation Metrics
● In the experiment, we evaluate precision, recall, and F-score at two levels.
● Mail level
• Average over all the emails in the test set
● Mailing list level
• Average over all mailing lists in the test set
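The two averaging levels can be sketched as follows. The slide does not say how a mailing list's own score is formed, so this sketch assumes it is the mean of the per-email scores within that list. It also assumes a metric with a zero denominator is reported as 0. All names and the toy predictions are hypothetical.

```python
def prf(pred, truth):
    """Precision, recall, and F-score for one email over its recipients."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def mean(rows):
    """Column-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def evaluate(results):
    """results: list of (mailing_list_id, predicted_labels, true_labels), one per test email."""
    per_email = [(ml, prf(p, t)) for ml, p, t in results]
    mail_level = mean([m for _, m in per_email])
    by_list = {}
    for ml, m in per_email:
        by_list.setdefault(ml, []).append(m)
    # Assumption: a list's score is the mean over its emails; lists are then averaged.
    list_level = mean([mean(ms) for ms in by_list.values()])
    return mail_level, list_level


# Toy test set (hypothetical): two emails in list A, one email in list B.
results = [
    ("A", [1, 1, 0], [1, 0, 0]),
    ("A", [1, 0], [1, 0]),
    ("B", [0, 1], [1, 1]),
]
mail_level, list_level = evaluate(results)
```

The two levels differ whenever mailing lists have different numbers of test emails: the mail level weights every email equally, while the mailing list level weights every list equally.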
33 KYOTO UNIVERSITY
Experiment 1:
Baselines
■ Single Mailing List (SML)
• Considers only the information from the target mailing list.
■ All Mailing Lists (AML)
• Considers all the source mailing lists.
■ Overlapping Mailing Lists (OML)
• Selects the top-k source mailing lists with the largest overlap with the target domain.
■ Feedback Similar Mailing Lists (FSML)
• Selects the top-k source mailing lists with the highest feedback similarity with the target domain.
■ CBEP Without Weight (CBEP-SVD): uses SVD in prediction
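A top-k baseline such as OML can be sketched in a few lines. The sketch assumes that "overlap" is measured as the raw count of users shared with the target list; the function name and the toy lists are hypothetical.

```python
def top_k_by_overlap(target, sources, k):
    """Return the names of the k source lists sharing the most users with the target."""
    ranked = sorted(sources, key=lambda name: len(sources[name] & target), reverse=True)
    return ranked[:k]


# Toy data (hypothetical user IDs).
target = {1, 2, 3, 4}
sources = {"A": {1, 2, 3}, "B": {4, 9}, "C": {1, 2}}
chosen = top_k_by_overlap(target, sources, k=2)
```

Note that this baseline ranks each source list independently, so it can pick lists that cover the same target users (A and C here).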
34 KYOTO UNIVERSITY
Experiment 1: Results
■ The proposed methods (CBEP-A1 and CBEP-A2) outperform all the baselines on all the evaluation metrics.
35 KYOTO UNIVERSITY
Experiment 2:
■ We consider three factors to select the optimal source domains.
■ In this experiment, we remove these three factors one at a time.
■ In this way, we evaluate how much each factor affects the prediction precision.
■ Mailing list level results for CBEP-A1
36 KYOTO UNIVERSITY
Experiment 2: Result
The coverage-of-users criterion is the most important.
[Figure: precision results]
37 KYOTO UNIVERSITY
Overview:
■ Conclusion
38 KYOTO UNIVERSITY
Conclusion
● We introduce the problem of personalized broadcast email prioritization considering a large number of mailing lists.
● We propose a novel cross-domain recommendation framework, CBEP.
● We show that our method CBEP outperforms all the baselines.

More Related Content

Similar to The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation

Case Study_The Diophantine Equation.pdf
Case Study_The Diophantine Equation.pdfCase Study_The Diophantine Equation.pdf
Case Study_The Diophantine Equation.pdf
akram407615
 
On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1
VitAnhNguyn94
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
MLconf
 
Ecrice - ChemEd X Data - Barcelona - 2016 talk
Ecrice - ChemEd X Data - Barcelona - 2016 talkEcrice - ChemEd X Data - Barcelona - 2016 talk
Ecrice - ChemEd X Data - Barcelona - 2016 talk
University of Minnesota Rochester
 
Resume
ResumeResume
Resume
Ji Heng
 
0580 s09 qp_2
0580 s09 qp_20580 s09 qp_2
0580 s09 qp_2
King Ali
 
Jason_Helms_Defense
Jason_Helms_DefenseJason_Helms_Defense
Jason_Helms_Defense
Jason Helms
 
KNN Classifier
KNN ClassifierKNN Classifier
KNN Classifier
Mobashshirur Rahman 👲
 
Learning multifractal structure in large networks (Purdue ML Seminar)
Learning multifractal structure in large networks (Purdue ML Seminar)Learning multifractal structure in large networks (Purdue ML Seminar)
Learning multifractal structure in large networks (Purdue ML Seminar)
Austin Benson
 
Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017
SERC at Carleton College
 
Aplicación de la derivada en la carrea de telecomunicaciones
Aplicación de la derivada en la carrea de telecomunicacionesAplicación de la derivada en la carrea de telecomunicaciones
Aplicación de la derivada en la carrea de telecomunicaciones
ENelson3
 
Accurate Quantum Chemistry via Machine-Learning and ...
Accurate Quantum Chemistry via Machine-Learning and ...Accurate Quantum Chemistry via Machine-Learning and ...
Accurate Quantum Chemistry via Machine-Learning and ...
butest
 
PhD Defense Presentation
PhD Defense PresentationPhD Defense Presentation
PhD Defense Presentation
Keita Kalomba Mboyi
 
MrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label ClassificationMrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label Classification
YI-JHEN LIN
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
Neha Kulkarni
 
Mit2 72s09 lec09
Mit2 72s09 lec09Mit2 72s09 lec09
Mit2 72s09 lec09
Jasim Almuhandis
 
ISM2014
ISM2014ISM2014
ISM2014
nlab_utokyo
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
ChenYiHuang5
 
Multi-Objective Genetic Topological Optimization for Design of composite wall...
Multi-Objective Genetic Topological Optimization for Design of composite wall...Multi-Objective Genetic Topological Optimization for Design of composite wall...
Multi-Objective Genetic Topological Optimization for Design of composite wall...
Sardasht S. Weli
 
Project Allocation Linear Programming Optimisation
Project Allocation Linear Programming OptimisationProject Allocation Linear Programming Optimisation
Project Allocation Linear Programming Optimisation
Ristanti Ramadanti
 

Similar to The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation (20)

Case Study_The Diophantine Equation.pdf
Case Study_The Diophantine Equation.pdfCase Study_The Diophantine Equation.pdf
Case Study_The Diophantine Equation.pdf
 
On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Ecrice - ChemEd X Data - Barcelona - 2016 talk
Ecrice - ChemEd X Data - Barcelona - 2016 talkEcrice - ChemEd X Data - Barcelona - 2016 talk
Ecrice - ChemEd X Data - Barcelona - 2016 talk
 
Resume
ResumeResume
Resume
 
0580 s09 qp_2
0580 s09 qp_20580 s09 qp_2
0580 s09 qp_2
 
Jason_Helms_Defense
Jason_Helms_DefenseJason_Helms_Defense
Jason_Helms_Defense
 
KNN Classifier
KNN ClassifierKNN Classifier
KNN Classifier
 
Learning multifractal structure in large networks (Purdue ML Seminar)
Learning multifractal structure in large networks (Purdue ML Seminar)Learning multifractal structure in large networks (Purdue ML Seminar)
Learning multifractal structure in large networks (Purdue ML Seminar)
 
Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017
 
Aplicación de la derivada en la carrea de telecomunicaciones
Aplicación de la derivada en la carrea de telecomunicacionesAplicación de la derivada en la carrea de telecomunicaciones
Aplicación de la derivada en la carrea de telecomunicaciones
 
Accurate Quantum Chemistry via Machine-Learning and ...
Accurate Quantum Chemistry via Machine-Learning and ...Accurate Quantum Chemistry via Machine-Learning and ...
Accurate Quantum Chemistry via Machine-Learning and ...
 
PhD Defense Presentation
PhD Defense PresentationPhD Defense Presentation
PhD Defense Presentation
 
MrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label ClassificationMrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label Classification
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Mit2 72s09 lec09
Mit2 72s09 lec09Mit2 72s09 lec09
Mit2 72s09 lec09
 
ISM2014
ISM2014ISM2014
ISM2014
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Multi-Objective Genetic Topological Optimization for Design of composite wall...
Multi-Objective Genetic Topological Optimization for Design of composite wall...Multi-Objective Genetic Topological Optimization for Design of composite wall...
Multi-Objective Genetic Topological Optimization for Design of composite wall...
 
Project Allocation Linear Programming Optimisation
Project Allocation Linear Programming OptimisationProject Allocation Linear Programming Optimisation
Project Allocation Linear Programming Optimisation
 

More from Daiki Tanaka

[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
Daiki Tanaka
 
カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論
Daiki Tanaka
 
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
Daiki Tanaka
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
Daiki Tanaka
 
Selective inference
Selective inferenceSelective inference
Selective inference
Daiki Tanaka
 
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Daiki Tanaka
 
オンライン学習 : Online learning
オンライン学習 : Online learningオンライン学習 : Online learning
オンライン学習 : Online learning
Daiki Tanaka
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
Daiki Tanaka
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
Daiki Tanaka
 
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social TiesThe Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
Daiki Tanaka
 
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Daiki Tanaka
 
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series DataToeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Daiki Tanaka
 

More from Daiki Tanaka (12)

[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
 
カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論
 
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
 
Selective inference
Selective inferenceSelective inference
Selective inference
 
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
 
オンライン学習 : Online learning
オンライン学習 : Online learningオンライン学習 : Online learning
オンライン学習 : Online learning
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
 
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social TiesThe Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
 
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
 
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series DataToeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
 

Recently uploaded

06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
exukyp
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 

Recently uploaded (20)

06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data

The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation

  • 1. KYOTO UNIVERSITY. The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation. Daiki Tanaka, Kashima Lab., Kyoto University. Research Seminar, 2017/6/12 (Mon).
  • 2. Today's paper: Title: The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation. Venue: KDD 2016. Authors: Beidou Wang, Yikang Liao, Martin Ester, Yu Zhu, Deng Cai, Jiajun Bu, Ziyu Guan. Affiliations: Zhejiang University, Simon Fraser University, Northwest University of China.
  • 3. Overview: Background; Related work; Problem definition; Proposed method (CBEP); Experiment; Conclusion.
  • 5. Background: Email overload causes serious problems; a person wastes about 1 hour per day handling unimportant emails. There is a body of work on personalized email prioritization (e.g., at Google) that predicts importance labels for emails. However, broadcast email has been overlooked in the personalized email prioritization literature.
  • 6. Background: Challenges of Broadcast Email. (1) Same-sender problem: a receiver may get many different emails of varying importance from the same sender. (2) Limited types of user feedback: we usually don't reply to a broadcast email.
  • 7. Background: Key idea: a collaborative filtering problem. Each broadcast email is sent to all users of a mailing list, so other users' feedback (viewed or not) can be very helpful in predicting the priority for a target user. If other users with similar interests have viewed an email, the target user is likely to view it as well.
  • 8. Background: Key idea: cross-domain recommendation. Cross-domain recommendation transfers knowledge from source domains to the target domain. In this research, each mailing list is treated as a domain, and there are millions of domains in an email system. [Figure: knowledge transfer from source domain to target domain]
  • 10. Related work: Prioritization for emails: using a linear logistic regression model; using social networks to capture user groups; using SVM. Cross-domain recommendation: previous work focused on a relatively small set of domains (2 or 3), and the source domains were selected manually. For these reasons, existing methods cannot be applied to broadcast email.
  • 11. Overview: Problem definition
  • 12. Problem Definition: variables. User set: $U$. Email set: $E$. Email importance matrix: $I$, where $I_{u,e} = 1$ if user $u$ has viewed email $e$ and $I_{u,e} = 0$ if user $u$ hasn't viewed email $e$. Mailing list: $M_i \subset U$, with $\mathbf{M} = \{M_1, \dots, M_n\}$. Email set sent to $M_i$: $E_i \subset E$. A new email $e_{new}$ will be sent to a mailing list $M_t$ (the target mailing list).
  • 13. Problem Definition: Goal. We want to predict whether a broadcast email is important or not for a given user. Input: the user set and the email set. Output: a predicted label for the email (important or not).
  • 14. Problem Definition: dividing into 3 sub-problems. The broadcast email prioritization problem can be divided into the following three sub-problems. 1. Sample the feedback from a small portion of users, since each broadcast email waiting for prioritization is completely cold, with no user interaction. 2. Find the optimal set of source mailing lists whose extra information can help with priority prediction. 3. Predict the priority of the broadcast email with the help of the feedback from the sampled users and the extra information from the source mailing lists.
  • 15. Overview: Proposed method: the CBEP framework
  • 16. Proposed method: CBEP framework. CBEP is introduced to solve the three sub-problems of broadcast email prioritization: 1. user feedback sampling; 2. optimal source domain set selection (the major contribution of this paper); 3. priority prediction.
  • 17. CBEP framework (1/3): 1. User feedback sampling. A new email is first sent to all users without priority labels, and we wait for a short period of time. Sampled user set: $S \subset M_t$. We then collect feedback from these users. Positive feedback: the email is viewed. Negative feedback: the email isn't viewed.
  • 18. CBEP framework (2/3): 2. Optimal Source Domain Set Selection. Given the target mailing list $M_t$, we define a binary vector $\alpha = (\alpha_1, \dots, \alpha_n)^T$, where $\alpha_i = 1$ if the source mailing list $M_i$ is selected and $\alpha_i = 0$ otherwise. Our goal is to find the $\alpha$ that maximizes the objective function.
  • 19. CBEP framework (2/3): 2. Optimal Source Domain Set Selection. We consider three factors when selecting the optimal source domains: overlap of users; feedback pattern similarity; coverage of users.
  • 20. CBEP framework (2/3): 2. Optimal Source Domain Set Selection: overlap of users. The overlap of users is defined for a source mailing list $M_i$ and a target mailing list $M_t$.
  • 21. CBEP framework (2/3): 2. Optimal Source Domain Set Selection: similar feedback pattern. Next, we define the similarity of the feedback patterns between two mailing lists $M_t$ and $M_i$ as follows: $sim_i(t) = 1 - \frac{1}{2|C_{t,i}|^2} \sum_{u,w \in C_{t,i}} |\cos(v_{t,u}, v_{t,w}) - \cos(v_{i,u}, v_{i,w})|$. Here $C_{t,i}$ is the shared user set between the two mailing lists $M_t$ and $M_i$, and $v_{i,u}$ is a binary vector with each entry indicating whether user $u$ has read the corresponding email in $E_i$ (the emails sent to mailing list $M_i$).
  • 22. CBEP framework (2/3): 2. Optimal Source Domain Set Selection: coverage of users. We want the number of shared users between $M'$ (the selected set of source mailing lists) and $M_t$ (the target mailing list) to be as large as possible. That is, we want to choose a size-$k$ set of mailing lists $M'$ that attains $\max |\bigcup_{M_i \in M'} C_{i,t}|$. This is the maximum coverage problem, which is NP-hard. Instead, we define the overlap percentage between source mailing lists $M_i$, $M_j$ and the target mailing list $M_t$ as follows: $overlap_{i,j}(t) = \frac{|M_i \cap M_j \cap M_t|}{|M_t|}$.
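As a small illustration of the two quantities on slide 22, the sketch below computes the overlap percentage for a pair of source lists and the coverage of a candidate set of source lists, using plain Python sets. The function names and the toy user IDs are hypothetical.

```python
def overlap_percentage(m_i, m_j, m_t):
    """|M_i ∩ M_j ∩ M_t| / |M_t| for two source lists and the target list."""
    if not m_t:
        raise ValueError("target mailing list must not be empty")
    return len(m_i & m_j & m_t) / len(m_t)

def coverage(source_lists, m_t):
    """Number of target users shared with at least one selected source list."""
    covered = set()
    for m_i in source_lists:
        covered |= m_i & m_t  # C_{i,t}: users shared by M_i and M_t
    return len(covered)

# Toy example with hypothetical user IDs.
m_t = {1, 2, 3, 4}
m_a = {1, 2, 9}
m_b = {2, 3}
print(overlap_percentage(m_a, m_b, m_t))  # 0.25
print(coverage([m_a, m_b], m_t))          # 3
```

A high overlap percentage between two source lists means they cover largely the same target users, so selecting both adds little coverage.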
  • 23. CBEP framework (2/3): 2. Optimal Source Domain Set Selection: objective function. The objective function includes a normalizer that prevents it from selecting too many source mailing lists. Maximizing it is a difficult problem (it has both a quadratic term and a fraction), so we propose two approximate solutions.
  • 24. CBEP framework (2/3): 2. Optimal Source Domain Set Selection. Approximate solution 1 (used as CBEP-A1 in the experiments): relax the constraint (to make it a quadratic linear programming problem), then set a threshold $\gamma$; source domains with $\alpha_i \geq \gamma$ are selected.
  • 25. CBEP framework (2/3): 2. Optimal Source Domain Set Selection. Approximate solution 2 (used as CBEP-A2 in the experiments): we solve the problem $z_{max}$ times, once for each $z \in \{1, 2, \dots, z_{max}\}$, where $z_{max}$ is an upper bound on the number of source domains.
  • 26. CBEP framework (3/3): 3. Priority Prediction. Feedback set: $I' = \{I_{M_t,E_t}, I_{M',E_{M'}}, I_{S,e_{new}}\}$, where the matrix $I_{M,E}$ is the feedback from user set $M$ on email set $E$. We use a weighted low-rank approximation method (matrix factorization): $I \simeq P Q^T$. [Figure: users-by-items rating matrix]
  • 27. CBEP framework (3/3): 3. Priority Prediction: matrix factorization problem. Our objective is to minimize the following loss function: $\mathcal{L}(P, Q) = \sum_{i,j} W_{ij} (I'_{ij} - P_{i\cdot} Q_{j\cdot}^T)^2 + \lambda(\|P\|_F^2 + \|Q\|_F^2)$. $P$ and $Q$ stand for the latent vectors of the users $\{M', M_t\}$ and the items $\{E_{M'}, E_t, e_{new}\}$. Alternating Least Squares (ALS) is used to solve this.
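  • 28. CBEP framework (3/3): 3. Priority Prediction. $\min \mathcal{L}(P, Q) = \sum_{i,j} W_{ij} (I'_{ij} - P_{i\cdot} Q_{j\cdot}^T)^2 + \lambda(\|P\|_F^2 + \|Q\|_F^2)$. Fixing $Q$ and solving $\partial \mathcal{L}(P,Q) / \partial P_{i\cdot} = 0$ gives $P_{i\cdot} = I'_{i\cdot} \tilde{W}_{i\cdot} Q (Q^T \tilde{W}_{i\cdot} Q + \lambda (\sum_j W_{ij}) I_D)^{-1}$. Fixing $P$ and solving $\partial \mathcal{L}(P,Q) / \partial Q_{j\cdot} = 0$ gives $Q_{j\cdot} = I'^{T}_{\cdot j} \tilde{W}_{\cdot j} P (P^T \tilde{W}_{\cdot j} P + \lambda (\sum_i W_{ij}) I_D)^{-1}$. For each remaining user $u_i \in (M_t - S)$, the priority of $e_{new}$ is predicted as $I_{i,e_{new}} = P_{i\cdot} Q_{e_{new}}^T$.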
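The alternating updates on slide 28 can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name, the default hyperparameters, and the random initialization are assumptions. It also assumes the update rule as transcribed, in which the ridge term for each row is scaled by that row's total weight, and that every row and column of W has a positive weight sum.

```python
import numpy as np

def weighted_als(I, W, rank=2, lam=0.1, iters=20, seed=0):
    """Weighted low-rank approximation I ≈ P Q^T via alternating least squares.

    I: (n_users, n_items) feedback matrix; W: same-shape non-negative weights.
    """
    I = np.asarray(I, dtype=float)
    W = np.asarray(W, dtype=float)
    rng = np.random.default_rng(seed)
    n_users, n_items = I.shape
    P = rng.normal(scale=0.1, size=(n_users, rank))
    Q = rng.normal(scale=0.1, size=(n_items, rank))
    eye = np.eye(rank)  # I_D in the slide's notation
    for _ in range(iters):
        # Fix Q, update each user row P_i.
        for i in range(n_users):
            w = W[i]
            A = Q.T @ (w[:, None] * Q) + lam * w.sum() * eye
            b = (I[i] * w) @ Q
            P[i] = np.linalg.solve(A, b)  # A is symmetric, so this equals b A^{-1}
        # Fix P, update each item row Q_j.
        for j in range(n_items):
            w = W[:, j]
            A = P.T @ (w[:, None] * P) + lam * w.sum() * eye
            b = (I[:, j] * w) @ P
            Q[j] = np.linalg.solve(A, b)
    return P, Q
```

With the factors in hand, the predicted priority of user i for the item in row e of Q is `P[i] @ Q[e]`. Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical-stability choice made for this sketch.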
  • 29. CBEP framework (3/3): 3. Priority Prediction. We define the percentage of users considering email $e_{new}$ important as: $H(e_{new}) = \frac{pos(e_{new})}{pos_{avg}(M_t)} \cdot \frac{tr(I_{M_t}^T I_{M_t})}{|M_t| \cdot |E_t|}$, where $pos(e_{new})$ is the total number of viewed-email behaviors observed in the waiting time for $e_{new}$, and $pos_{avg}(M_t)$ is the average number of viewed-email behaviors observed in the waiting time over all the emails from $M_t$. "For the top $H(e_{new})$ percent of users according to $y_{i,e_{new}}$, we predict $e_{new}$ as important while for others as unimportant."
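The thresholding rule on slide 29 can be illustrated as below. For a binary matrix, tr(I^T I) equals the number of ones, so the second factor is simply the historical view rate of the target list. The function names are hypothetical, H is treated as a fraction in [0, 1], and both the clipping to that range and the rounding of the cutoff to a whole number of users are assumptions of this sketch.

```python
import numpy as np

def important_fraction(pos_new, pos_avg, I_t):
    """H(e_new): early-view ratio times the target list's historical view rate.

    I_t: binary (|M_t|, |E_t|) view matrix of the target mailing list.
    """
    I_t = np.asarray(I_t, dtype=float)
    view_rate = np.trace(I_t.T @ I_t) / I_t.size  # tr(I^T I) / (|M_t| * |E_t|)
    h = (pos_new / pos_avg) * view_rate
    return float(min(max(h, 0.0), 1.0))  # clipping to [0, 1] is an assumption

def label_top_fraction(scores, h):
    """Mark the top-h fraction of users (by predicted score) as 'important'."""
    scores = np.asarray(scores, dtype=float)
    k = int(round(h * len(scores)))  # rounding rule is an assumption
    order = np.argsort(-scores, kind="stable")
    labels = np.zeros(len(scores), dtype=bool)
    labels[order[:k]] = True
    return labels
```

In CBEP the scores would be the predicted priorities of the users in $M_t - S$, i.e. the users whose feedback was not sampled.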
  • 31. Experiments: dataset. Emails and their view logs from a large business mailing list system within Samsung: 6,506 broadcast emails; 333,979 view records; 490 mailing lists. Training set: 5,475 emails and their view records. Test set: 1,031 emails and their view records.
  • 32. Experiment 1: Evaluation Metrics. In the experiment, we evaluate precision, recall, and F-score at two levels. Mail level: the average over all the emails in the test set. Mailing list level: the average over all mailing lists in the test set.
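The two averaging levels on slide 32 can be sketched as follows. The slides do not spell out the exact aggregation, so this assumes per-email precision, recall, and F-score are computed first, then averaged either over all emails (mail level) or within each mailing list and then across lists (mailing list level). All names are hypothetical.

```python
from collections import defaultdict

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for one email's per-user binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def mean_triplet(triplets):
    """Component-wise mean of (precision, recall, f1) triplets."""
    n = len(triplets)
    return tuple(sum(t[k] for t in triplets) / n for k in range(3))

def evaluate(emails):
    """emails: list of (mailing_list_id, y_true, y_pred), one entry per test email."""
    per_email = [(ml, precision_recall_f1(t, p)) for ml, t, p in emails]
    mail_level = mean_triplet([m for _, m in per_email])
    by_list = defaultdict(list)
    for ml, m in per_email:
        by_list[ml].append(m)
    list_level = mean_triplet([mean_triplet(v) for v in by_list.values()])
    return mail_level, list_level
```

The mailing list level gives every list equal weight regardless of how many emails it sent, so a few very active lists cannot dominate the score.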
  • 33. Experiment 1: Baselines. Single Mailing List (SML): only considers the information from the target mailing list. All Mailing Lists (AML): considers all the source mailing lists. Overlapping Mailing Lists (OML): selects the top-k source mailing lists with the largest overlap with the target domain. Feedback Similar Mailing Lists (FSML): selects the top-k source mailing lists with the highest feedback similarity to the target domain. CBEP Without Weight (CBEP-SVD): uses SVD in the prediction step.
  • 34. Experiment 1: Results. The proposed methods (CBEP-A1 and CBEP-A2) outperform all the baselines on all the evaluation metrics.
  • 35. Experiment 2: We consider three factors to select the optimal source domains. In this experiment, we remove these three factors one at a time to evaluate how much each factor affects the prediction precision. Results are reported at the mailing list level for CBEP-A1.
  • 38. Conclusion: We introduce the problem of personalized broadcast email prioritization across a large number of mailing lists. We propose a novel cross-domain recommendation framework, CBEP. We show that CBEP outperforms all the baselines.