Survival Factorization on Diffusion Networks

Survival Factorization
on Diffusion Networks
Nicola Barbieri, Giuseppe Manco and Ettore Ritacco
Tumblr, 35 E 21st St, 10010, New York, USA -
nicola@tumblr.com
ICAR - CNR, via Bucci 7/11C, 87036 Arcavacata di Rende
(CS), ITALY - giuseppe.manco@icar.cnr.it,
ettore.ritacco@icar.cnr.it

Context
• Users can create contents
• Contents can be shared within a diffusion network
• The diffusion takes place within cascades
• Trees of timed word-of-mouth chains

Relevant Questions
• What makes a content popular?
• Which creators are able to trigger a cascade?
• Who will share a content?
• When will someone share a content?
• Who is expert in a topic characterizing a set of contents?
• Who is interested in a topic?
• Which are the most popular topics?
• …

Focus of this Talk
• What makes a content popular?
• Which creators are able to trigger a cascade?
• Who will share a content?
• When will someone share a content?
• Who is expert in a topic characterizing a set of contents?
• Who is interested in a topic?
• Which are the most popular topics?
• …

Idea
• Information spreading within a network = disease contagion
• A user shares a content = an individual is infected
• Active on a given cascade
• A user does not share it = the individual resists the contagion
• Inactive on a given cascade
• Information Diffusion in terms of Survival Analysis
• The observations in a time horizon where a user can either resist or be infected
• Warning: the network is implicit!
• We can only observe the content adoptions, i.e. the contagion

Key elements (1)
M. Gomez-Rodriguez, D. Balduzzi, and B. Scholkopf. Uncovering the temporal dynamics of diffusion networks. In ICML 2011.
Time
Recent contentOld content Observation
• Contagion is time-dependent
• the probability of contagion depends on the time when the target gets in
touch with the content

Key elements (2)
M. Gomez-Rodriguez, D. Balduzzi, and B. Scholkopf. Uncovering the temporal dynamics of diffusion networks. In ICML 2011.
Time
Recent carrierOld carrier Observation
• Contagion is carrier-dependent
• Some carriers are more infectious than others
• Influence exerted in the diffusion process

Formally
• 𝒕"
= 𝑡% 𝑐 , … , 𝑡) 𝑐 a cascade with content 𝑐,
• 𝑁 is the number of users
• 𝑡+ 𝑐 ∈ 0, 𝑇" ∪ ∞ is the timestamp when the node 𝑢 becomes active on
the cascade 𝒕",
• 𝑇" is the time horizon
• The probability of user 𝒖 being infected by user 𝒗 at time 𝑡+ 𝑐 is
given by:
𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6 ∝ 𝑒:;<,= >< " :>= "

The Infection model
• 𝑣 is the influencer
• 𝜆+,6 represents the influence exerted by 𝑣 on 𝑢
• The transmission rate
• 𝑆 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6 = 𝑒:;<,= >< " :>= "
is the survival function,
• the probability of resisting the contagion 𝑝 𝑇 ≥ 𝑡+ 𝑐 |𝑡6 𝑐
• 𝑡+ 𝑐 − 𝑡6 𝑐 represents the exposure time
• The longer the delay, the lower the probability of infection
𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6 = 𝜆+,6 ⋅ 𝑒:;<,= >< " :>= "

Building the Survival Model – Step 1
• A cascade is composed by users that activate and users that resist the
contagion Latent infection indicator
Immune users
𝑝 𝒕"
|𝒀, 𝚲 = H 𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6
I<,=
J
𝑆 𝑡+ 𝑐 − 𝑡6 𝑐
%:I<,=
J
+,6 L">M6N
⋅ H H 𝑆 𝑇"
− 𝑡6 𝑐 |𝜆+,6
6 L">M6N+ MOL">M6N

Infected users

Building the Survival Model – Issues
• The nature of the transmission rate 𝝀 determines the adaptiveness of
the model to the personalization of the contagion
• A very fine-grain approach:
• a single value 𝜆+,6 for each pair of users within each cascade
• This approach is intractable in real scenarios
• The matrix 𝚲, containing all the 𝜆+,6, has size 𝑁T

Key elements (3)
TimeRecent carrierOld carrier Observation
• Contagion is topic-dependent
• Susceptibility and influence are relative the content
• Content is characterized by topics
Two topics per transmission

The Infection model (2)
• 𝜆+,6
U
represents the influence exerted by 𝑣 on 𝑢 about topic 𝑘
• The topical transmission rate
𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6
U
∝ 𝑒:;<,=
W >< " :>= "

• 𝜆+,6
U
𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6
U
∝ 𝑒:;<,=
W >< " :>= "
Topic dependency

• 𝜆+,6
U
𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6
U
∝ 𝑒:;<,=
W >< " :>= "
𝜆+,6,U = 𝐴6,U ⋅ 𝑆+,U
Susceptibility on topic 𝑘

• 𝜆+,6
U
𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6
U
∝ 𝑒:;<,=
W >< " :>= "
𝜆+,6,U = 𝐴6,U ⋅ 𝑆+,U
Influence on topic 𝑘

Building the Survival Model – Step 3
• The likelihood of a cascade is topic dependent
𝑝 𝒕"|𝒁, 𝒀, 𝚲 = H H 𝑓 𝑡+ 𝑐 |𝑡6 𝑐 , 𝜆+,6
U I<,=
J
𝑆 𝑡+ 𝑐 − 𝑡6 𝑐
%:I<,=
J ZJ,W
+,6 L">M6NU
⋅ H H H 𝑆 𝑇" − 𝑡6 𝑐 |𝜆+,6
U
6 L">M6N+ MOL">M6NU

Topics
Latent topic indicator

The complete model
• Content can be modeled jointly
• E.g., textual content model by a mixture of Poisson distributions expressing
topic dependency
• For each cascade 𝑐 ∈ 1, … , 𝑀
o Sample the topical diffusion pattern,
𝑧" ∼ 𝑀𝑢𝑙𝑡𝑖𝑛𝑜𝑚𝑖𝑎𝑙 𝛩
o For each word 𝑤 in 𝑐
§ Sample the occurrences of 𝑤 in 𝑐,
𝑛g," ∼ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛 𝛷
o For each user 𝑢 in 𝑐
§ Sample the user who generated the
contagion, 𝑦+
" ∼ 𝑀𝑢𝑙𝑡𝑖𝑛𝑜𝑚𝑖𝑎𝑙 Ξ
§ Sample her activation time,
𝑡+ 𝑐 ∼ 𝑊𝑒𝑖𝑏𝑢𝑙𝑙 𝑧", 𝑦+
", 𝐴, 𝑆
Ξ

Model Learning
• EM approach
• E-step:
• Update latent variables
• M-step:
• Given the status of the latent variables 𝒁 and 𝒀, update parameters
• Linear complexity!
• The update equations in the EM algorithm can be optimized by exploiting the
factorization of 𝜆+,6
U

Exploiting the model
• We started with four questions:
• Q: Who will share a content?
• A: users infected within a given time horizon
• Q: When will someone share a content?
• A: A sample from 𝑝 𝒕"
|𝒁, 𝒀, 𝚲
• Q: Who is expert in a topic characterizing a set of contents?
• A: Influential users, see 𝐴6,U
• Q: Who is interested in a topic?
• A: Susceptible users, see 𝑆+,U

Comparison with the Literature

Evaluation
• Activation prediction:
• Two samples of Twitter (filtered/noisy draws)
• Testing protocol:
• Given an incomplete cascade (50%, 80%), fill the missing activations
• Predict activation times
• Influencers and topics:
• MemeTracker dataset
• Testing protocol:
• A semantic (handmade) analysis on the top topics and most influential users

Twitter
• ROC curves on predicting
user’s retweet time on
Twitter- Large (noisy sample,
first row) and Twitter-Small
(filtered sample, second row)

Conclusions
• Robust, efficient and accurate modeling of information cascades
• Factorizing the infection rate uncovers highly relevant
information concerning the underlying diffusion process
• Works with general Weibull distribution, not just the exponential
• Future work
• Bayesian learning: The underlying probability distributions allow conjugate priors
• Exploit multiple mutual elicitation processes (e.g. Hawkes processes) in the same
modeling
• Deep architectures for combining heterogeneous content
• Content dynamics within a cascade

Survival Factorization on Diffusion Networks

More Related Content

Similar to Survival Factorization on Diffusion Networks

Recently uploaded

Survival Factorization on Diffusion Networks