Survival Factorization on Diffusion Networks
Nicola Barbieri, Giuseppe Manco and Ettore Ritacco
Tumblr, 35 E 21st St, 10010, New York, USA - nicola@tumblr.com
ICAR - CNR, via Bucci 7/11C, 87036 Arcavacata di Rende (CS), ITALY - giuseppe.manco@icar.cnr.it, ettore.ritacco@icar.cnr.it
Context
• Users can	create	contents
• Contents can	be	shared	within	a	diffusion	network
• The	diffusion	takes	place	within	cascades
• Trees	of	timed	word-of-mouth	chains
Relevant	Questions
• What	makes	a	content	popular?
• Which	creators	are	able	to	trigger	a	cascade?
• Who	will	share	a	content?
• When	will	someone	share	a	content?
• Who	is	expert	in	a	topic	characterizing	a	set	of	contents?
• Who	is	interested	in	a	topic?
• Which	are	the	most	popular	topics?
• …
Focus	of	this	Talk
• Who will share a content?
• When will someone share a content?
• Who is expert in a topic characterizing a set of contents?
• Who is interested in a topic?
Idea
• Information	spreading	within	a	network	=	disease	contagion
• A	user	shares	a	content	=	an	individual	is	infected
• Active on	a	given	cascade
• A	user	does	not	share	it	=	the	individual	resists the	contagion
• Inactive on	a	given	cascade
• Information	Diffusion	in	terms	of	Survival	Analysis
• The	observations	in	a	time	horizon	where	a	user	can	either	resist	or	be	infected
• Warning:	the	network	is	implicit!
• We	can	only	observe	the	content	adoptions,	i.e.	the	contagion
Key	elements	(1)
M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML 2011.
[Figure: timeline running from old content to recent content, ending at the observation time]
• Contagion is time-dependent
• The probability of contagion depends on the time when the target comes into contact with the content
Key	elements	(2)
M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML 2011.
[Figure: timeline running from old carrier to recent carrier, ending at the observation time]
• Contagion is carrier-dependent
• Some carriers are more infectious than others
• Influence exerted in the diffusion process
Formally
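• $\mathbf{t}_c = \{t_1(c), \dots, t_N(c)\}$ is a cascade with content $c$
• $N$ is the number of users
• $t_u(c) \in [0, T_c] \cup \{\infty\}$ is the timestamp at which node $u$ becomes active on the cascade $\mathbf{t}_c$ (with $t_u(c) = \infty$ if $u$ never activates)
• $T_c$ is the time horizon
• The probability density of user $u$ being infected by user $v$ at time $t_u(c)$ is given by:
$$f(t_u(c) \mid t_v(c), \lambda_{u,v}) \propto e^{-\lambda_{u,v}\,(t_u(c) - t_v(c))}$$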
The	Infection	model
• $v$ is the influencer
• $\lambda_{u,v}$, the transmission rate, represents the influence exerted by $v$ on $u$
• $S(t_u(c) \mid t_v(c), \lambda_{u,v}) = e^{-\lambda_{u,v}\,(t_u(c) - t_v(c))}$ is the survival function
• The probability of resisting the contagion, $p(T \ge t_u(c) \mid t_v(c))$
• $t_u(c) - t_v(c)$ represents the exposure time
• The longer the delay, the lower the probability of infection
$$f(t_u(c) \mid t_v(c), \lambda_{u,v}) = \lambda_{u,v} \cdot e^{-\lambda_{u,v}\,(t_u(c) - t_v(c))}$$
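The exponential infection model above can be sketched in a few lines of Python. This is only an illustration of the two formulas; the function names `density`, `survival` and `hazard` are hypothetical and do not come from the talk.

```python
import math

def density(t_u, t_v, lam):
    """Exponential transmission density f(t_u | t_v, lam)."""
    delay = t_u - t_v  # exposure time
    if delay < 0:
        return 0.0  # v cannot infect u before v itself is active
    return lam * math.exp(-lam * delay)

def survival(t_u, t_v, lam):
    """Probability S(t_u | t_v, lam) that u resists v up to time t_u."""
    delay = t_u - t_v
    if delay < 0:
        return 1.0  # no exposure yet, so u has certainly resisted
    return math.exp(-lam * delay)

def hazard(t_u, t_v, lam):
    """Ratio f / S; constant and equal to lam for the exponential model."""
    return density(t_u, t_v, lam) / survival(t_u, t_v, lam)
```

Under this model the ratio f / S is the constant rate, which is why $\lambda_{u,v}$ can be read directly as the strength of the influence of $v$ on $u$.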
Building	the	Survival	Model	– Step	1
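• A cascade is composed of users that activate and users that resist the contagion
$$p(\mathbf{t}_c \mid \mathbf{Y}, \mathbf{\Lambda}) = \prod_{u,v:\; t_v(c) < t_u(c) \le T_c} f(t_u(c) \mid t_v(c), \lambda_{u,v})^{y^c_{u,v}} \, S(t_u(c) \mid t_v(c), \lambda_{u,v})^{1 - y^c_{u,v}} \;\cdot \prod_{v:\; t_v(c) \le T_c} \; \prod_{u:\; t_u(c) > T_c} S(T_c \mid t_v(c), \lambda_{u,v})$$
• First product: infected users; $y^c_{u,v}$ is the latent infection indicator, equal to 1 when $v$ is the user who infected $u$ in cascade $c$
• Second product: immune users, who resist every active user up to the time horizon $T_c$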
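To make the two products of the Step 1 likelihood concrete, here is a minimal sketch that evaluates its logarithm under the exponential model. It assumes the latent infectors are known; the names `cascade_log_likelihood`, `times`, `infector` and `lam` are hypothetical, and the toy cascade in the usage note is invented for illustration.

```python
import math

def cascade_log_likelihood(times, infector, lam, horizon):
    """Log-likelihood of one cascade under the exponential model.

    times:    dict user -> activation time (math.inf if never active)
    infector: dict infected user -> the user who infected her (latent Y)
    lam:      dict (u, v) -> transmission rate of v on u
    horizon:  observation time horizon T_c
    """
    active = [u for u, t in times.items() if t <= horizon]
    immune = [u for u, t in times.items() if t > horizon]
    ll = 0.0
    # Infected users: density for the actual infector,
    # survival for every other earlier-active user.
    for u in active:
        for v in active:
            if times[v] < times[u]:
                rate = lam[(u, v)]
                delay = times[u] - times[v]
                ll += -rate * delay  # log S, shared by both cases
                if infector.get(u) == v:
                    ll += math.log(rate)  # log f = log(rate) + log S
    # Immune users: survival against every active user up to the horizon.
    for v in active:
        for u in immune:
            ll += -lam[(u, v)] * (horizon - times[v])
    return ll
```

For example, with `times = {"a": 0.0, "b": 2.0, "c": math.inf}`, `infector = {"b": "a"}`, all rates equal to 0.5 and `horizon = 5.0`, the result is log(0.5) - 1 - 2.5 - 1.5.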
Building	the	Survival	Model	– Issues
• The nature of the transmission rate $\lambda$ determines how well the model can adapt to the personalization of the contagion
• A very fine-grained approach:
• A single value $\lambda_{u,v}$ for each pair of users within each cascade
• This approach is intractable in real scenarios
• The matrix $\mathbf{\Lambda}$, containing all the $\lambda_{u,v}$, has size $N^2$
Key	elements	(3)
[Figure: timeline running from old carrier to recent carrier, ending at the observation time, with two topics per transmission]
• Contagion is topic-dependent
• Susceptibility and influence are relative to the content
• Content is characterized by topics
The Infection model (2)
• $\lambda^k_{u,v}$ represents the influence exerted by $v$ on $u$ about topic $k$
• The topical transmission rate
$$f(t_u(c) \mid t_v(c), \lambda^k_{u,v}) \propto e^{-\lambda^k_{u,v}\,(t_u(c) - t_v(c))}$$
• Topic dependency: the rate factorizes as
$$\lambda^k_{u,v} = A_{v,k} \cdot S_{u,k}$$
• $A_{v,k}$: influence of $v$ on topic $k$
• $S_{u,k}$: susceptibility of $u$ on topic $k$
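A small sketch of the factorized rate and of what it saves in parameters, compared with the $N^2$ pairwise matrix discussed earlier. The helper names `topical_rate` and `parameter_counts` and the dict-of-lists layout for $A$ and $S$ are assumptions made for this example.

```python
def topical_rate(A, S, u, v, k):
    """Rate of v on u for topic k: influence of v times susceptibility of u."""
    return A[v][k] * S[u][k]

def parameter_counts(n_users, n_topics):
    """Sizes of the pairwise matrix and of the two factor matrices."""
    pairwise = n_users * n_users        # one rate per ordered pair
    factorized = 2 * n_users * n_topics  # A and S, each n_users x n_topics
    return pairwise, factorized
```

With 1000 users and 10 topics, `parameter_counts(1000, 10)` gives 1,000,000 pairwise rates against 20,000 factorized parameters.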
Building	the	Survival	Model	– Step	3
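• The likelihood of a cascade is topic-dependent
$$p(\mathbf{t}_c \mid \mathbf{Z}, \mathbf{Y}, \mathbf{\Lambda}) = \prod_{k} \Big[ \prod_{u,v:\; t_v(c) < t_u(c) \le T_c} f(t_u(c) \mid t_v(c), \lambda^k_{u,v})^{y^c_{u,v}} \, S(t_u(c) \mid t_v(c), \lambda^k_{u,v})^{1 - y^c_{u,v}} \Big]^{z_{c,k}} \cdot \prod_{k} \; \prod_{v:\; t_v(c) \le T_c} \; \prod_{u:\; t_u(c) > T_c} S(T_c \mid t_v(c), \lambda^k_{u,v})^{z_{c,k}}$$
• $k$ ranges over the topics
• $z_{c,k}$ is the latent topic indicator, equal to 1 when cascade $c$ follows topic $k$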
The	complete	model
• Content can be modeled jointly
• E.g., textual content modeled by a mixture of Poisson distributions expressing topic dependency
• For each cascade $c \in \{1, \dots, M\}$
o Sample the topical diffusion pattern, $z_c \sim \mathrm{Multinomial}(\Theta)$
o For each word $w$ in $c$
§ Sample the occurrences of $w$ in $c$, $n_{w,c} \sim \mathrm{Poisson}(\Phi)$
o For each user $u$ in $c$
§ Sample the user who generated the contagion, $y^c_u \sim \mathrm{Multinomial}(\Xi)$
§ Sample her activation time, $t_u(c) \sim \mathrm{Weibull}(z_c, y^c_u, A, S)$
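The generative process above can be simulated with a rough sketch like the following. Several choices here are assumptions, not details from the talk: the first user is treated as the seed at time 0, the infector is drawn uniformly among previously processed users (a stand-in for the multinomial over $\Xi$), the Weibull scale is set to $1 / (A_{v,k} S_{u,k})$, and all function and parameter names are hypothetical.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's method; adequate for the small rates of this sketch."""
    limit = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def sample_cascade(users, theta, phi, A, S, horizon, shape=1.0, seed=0):
    rng = random.Random(seed)
    # Topical diffusion pattern of the cascade.
    k = rng.choices(range(len(theta)), weights=theta)[0]
    # Word occurrences, with a topic-specific Poisson rate per word.
    words = {w: sample_poisson(rate, rng) for w, rate in phi[k].items()}
    # Assumption: the first user seeds the cascade at time 0.
    times = {users[0]: 0.0}
    infector = {}
    for i in range(1, len(users)):
        u = users[i]
        v = rng.choice(users[:i])  # assumed uniform choice of the infector
        rate = A[v][k] * S[u][k]   # factorized topical rate
        # weibullvariate(scale, shape); shape=1.0 gives the exponential case.
        delay = rng.weibullvariate(1.0 / rate, shape)
        t_u = times[v] + delay
        # Activations beyond the horizon are unobserved: mark as inactive.
        times[u] = t_u if t_u <= horizon else math.inf
        infector[u] = v
    return k, words, times, infector
```

A user whose sampled infector is itself inactive stays inactive, because adding a delay to an infinite time still exceeds the horizon.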
Model	Learning
• EM	approach
• E-step:
• Update	latent	variables
• M-step:
• Given	the	status	of	the	latent	variables	𝒁 and	𝒀,	update	parameters
• Linear	complexity!	
• The update equations in the EM algorithm can be optimized by exploiting the factorization of $\lambda^k_{u,v}$
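One way the factorization can lead to linear cost is sketched below; it is an illustration and not necessarily the authors' actual update equations. Because $\lambda^k_{u,v} = A_{v,k} S_{u,k}$, the sum over earlier-active users $\sum_v \lambda^k_{u,v}(t_u(c) - t_v(c))$ equals $S_{u,k}\,(t_u(c) \sum_v A_{v,k} - \sum_v A_{v,k}\, t_v(c))$, so two running sums replace the inner loop. The function names are hypothetical.

```python
def cumulative_exposure(events, A, S, k):
    """Per-user sum of rate * delay over earlier users, in one pass.

    events: list of (user, activation_time) for the active users
    """
    sum_a = 0.0   # running sum of A[v][k] over earlier users
    sum_at = 0.0  # running sum of A[v][k] * t_v over earlier users
    out = {}
    for u, t_u in sorted(events, key=lambda e: e[1]):
        out[u] = S[u][k] * (t_u * sum_a - sum_at)
        sum_a += A[u][k]
        sum_at += A[u][k] * t_u
    return out

def naive_exposure(events, A, S, k):
    """Same quantity with the explicit quadratic double loop."""
    return {
        u: sum(A[v][k] * S[u][k] * (t_u - t_v)
               for v, t_v in events if t_v < t_u)
        for u, t_u in events
    }
```

After sorting, the first version touches each active user once per topic, while the naive version visits every pair of users.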
Exploiting	the	model
• We	started	with	four	questions:
• Q:	Who will	share	a	content?
• A:	users	infected within	a	given	time	horizon
• Q:	When will	someone	share	a	content?
• A: A sample from $p(\mathbf{t}_c \mid \mathbf{Z}, \mathbf{Y}, \mathbf{\Lambda})$
• Q: Who is expert in a topic characterizing a set of contents?
• A: Influential users, see $A_{v,k}$
• Q: Who is interested in a topic?
• A: Susceptible users, see $S_{u,k}$
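Answering the last two questions amounts to ranking users by a column of $A$ or $S$. A minimal sketch, assuming the learned matrices are stored as dicts mapping each user to a list of per-topic scores (the helper `top_users` and this layout are hypothetical):

```python
def top_users(scores, k, n=3):
    """Users with the highest score on topic k, best first."""
    ranked = sorted(scores, key=lambda user: scores[user][k], reverse=True)
    return ranked[:n]
```

Calling `top_users(A, k)` would list candidate experts on topic `k`, and `top_users(S, k)` the users most interested in it.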
Comparison	with	the	Literature
Evaluation
• Activation	prediction:
• Two	samples	of	Twitter	(filtered/noisy	draws)
• Testing	protocol:
• Given an incomplete cascade (50%, 80%), fill in the missing activations
• Predict	activation	times
• Influencers	and	topics:
• MemeTracker dataset
• Testing	protocol:
• A	semantic	(handmade)	analysis	on	the	top	topics	and	most	influential	users
Twitter
• ROC curves for predicting users' retweet times on Twitter-Large (noisy sample, first row) and Twitter-Small (filtered sample, second row)
MemeTracker
Conclusions
• Robust, efficient and accurate modeling of information cascades
• Factorizing the infection rate uncovers highly relevant information concerning the underlying diffusion process
• Works with the general Weibull distribution, not just the exponential
• Future work
• Bayesian learning: the underlying probability distributions allow conjugate priors
• Exploit multiple mutually exciting processes (e.g. Hawkes processes) within the same model
• Deep architectures for combining heterogeneous content
• Content	dynamics	within	a	cascade
