Survival Factorization on Diffusion Networks

Nicola Barbieri, Giuseppe Manco and Ettore Ritacco
Tumblr, 35 E 21st St, 10010, New York, USA - nicola@tumblr.com
ICAR - CNR, via Bucci 7/11C, 87036 Arcavacata di Rende (CS), ITALY - giuseppe.manco@icar.cnr.it, ettore.ritacco@icar.cnr.it

Context
• Users can create content
• Content can be shared within a diffusion network
• Diffusion takes place within cascades
• Cascades are trees of timed word-of-mouth chains

Relevant Questions
• What makes a piece of content popular?
• Which creators are able to trigger a cascade?
• Who will share a piece of content?
• When will someone share a piece of content?
• Who is expert in a topic characterizing a set of contents?
• Who is interested in a topic?
• Which are the most popular topics?
• …

Focus of this Talk
• Who will share a piece of content?
• When will someone share a piece of content?
• Who is expert in a topic characterizing a set of contents?
• Who is interested in a topic?

Idea
• Information spreading within a network = disease contagion
• A user shares a piece of content = an individual is infected
• The user is active on the given cascade
• A user does not share it = the individual resists the contagion
• The user is inactive on the given cascade
• Information diffusion is modeled in terms of survival analysis
• Observations cover a time horizon within which each user either resists or is infected
• Warning: the network is implicit!
• We can only observe content adoptions, i.e. the contagion

Key elements (1)
M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML 2011.
[Figure: time axis with labels "Old content", "Recent content", "Observation"]
• Contagion is time-dependent
• The probability of contagion depends on the time at which the target comes into contact with the content

Key elements (2)
M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML 2011.
[Figure: time axis with labels "Old carrier", "Recent carrier", "Observation"]
• Contagion is carrier-dependent
• Some carriers are more infectious than others
• Influence exerted in the diffusion process

Formally
• $\mathbf{t}_c = (t_1(c), \ldots, t_N(c))$ is a cascade with content $c$
• $N$ is the number of users
• $t_u(c) \in [0, T_c] \cup \{\infty\}$ is the timestamp at which node $u$ becomes active on the cascade $\mathbf{t}_c$
• $T_c$ is the time horizon
• The probability of user $u$ being infected by user $v$ at time $t_u(c)$ is given by:
$f(t_u(c) \mid t_v(c), \lambda_{u,v}) \propto e^{-\lambda_{u,v}(t_u(c) - t_v(c))}$

The Infection model
• $v$ is the influencer
• $\lambda_{u,v}$ represents the influence exerted by $v$ on $u$
• The transmission rate
• $S(t_u(c) \mid t_v(c), \lambda_{u,v}) = e^{-\lambda_{u,v}(t_u(c) - t_v(c))}$ is the survival function,
• the probability of resisting the contagion, $p(T \geq t_u(c) \mid t_v(c))$
• $t_u(c) - t_v(c)$ represents the exposure time
• The longer the delay, the lower the probability of infection
$f(t_u(c) \mid t_v(c), \lambda_{u,v}) = \lambda_{u,v} \cdot e^{-\lambda_{u,v}(t_u(c) - t_v(c))}$
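The exponential transmission model above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the numeric values are assumptions chosen for the example.

```python
import math

def survival(t_u, t_v, rate):
    """S(t_u | t_v, rate): probability that u has resisted v's contagion up to t_u."""
    return math.exp(-rate * (t_u - t_v))

def density(t_u, t_v, rate):
    """f(t_u | t_v, rate) = rate * S(t_u | t_v, rate)."""
    return rate * survival(t_u, t_v, rate)

# Hypothetical numbers: v activates at time 0 with transmission rate 0.5,
# and u is observed at time 2 (exposure time 2).
s = survival(2.0, 0.0, 0.5)
f = density(2.0, 0.0, 0.5)
print(round(s, 4), round(f, 4))  # 0.3679 0.1839
```

The ratio f / S equals the rate itself, which is the constant hazard that characterizes the exponential model.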

Building the Survival Model – Step 1
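• A cascade is composed of users who activate and users who resist the contagion
$p(\mathbf{t}_c \mid \mathbf{Y}, \mathbf{\Lambda}) = \prod_{u,v:\, t_v(c) < t_u(c) \leq T_c} f(t_u(c) \mid t_v(c), \lambda_{u,v})^{y^c_{u,v}} \, S(t_u(c) \mid t_v(c), \lambda_{u,v})^{1 - y^c_{u,v}} \cdot \prod_{v:\, t_v(c) \leq T_c} \; \prod_{u:\, t_u(c) = \infty} S(T_c \mid t_v(c), \lambda_{u,v})$
• $y^c_{u,v}$ is the latent infection indicator
• The first product covers the infected users
• The second product covers the immune users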
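As a hedged sketch of the Step 1 likelihood, the function below computes its logarithm for one cascade under the exponential model. The dictionary layout, the function name and the toy values are assumptions made for illustration; in the actual model the infector assignment is latent rather than given.

```python
import math

def cascade_log_likelihood(times, horizon, rate, infector):
    """Log-likelihood of one cascade under the exponential model.

    times:    dict user -> activation time (math.inf if the user never activates)
    horizon:  observation horizon of the cascade
    rate:     dict (u, v) -> transmission rate from v to u
    infector: dict u -> v, assumed infector of u (latent in the real model)
    """
    active = [u for u, t in times.items() if t <= horizon]
    immune = [u for u, t in times.items() if t > horizon]
    ll = 0.0
    # Infected users: every earlier active v contributes log S,
    # and the chosen infector additionally contributes log(rate).
    for u in active:
        for v in active:
            if times[v] < times[u]:
                lam = rate[(u, v)]
                ll += -lam * (times[u] - times[v])
                if infector.get(u) == v:
                    ll += math.log(lam)
    # Immune users: they resist every active user until the horizon.
    for u in immune:
        for v in active:
            ll += -rate[(u, v)] * (horizon - times[v])
    return ll

# Toy cascade: "a" starts at 0, "b" follows at 1 (infected by "a"), "c" never activates.
users = ["a", "b", "c"]
rate = {(u, v): 0.5 for u in users for v in users if u != v}
times = {"a": 0.0, "b": 1.0, "c": math.inf}
ll = cascade_log_likelihood(times, 3.0, rate, {"b": "a"})
```

For this toy cascade the value is log(0.5) - 0.5 for the pair (b, a), plus -1.5 and -1.0 for the two exposures that "c" resists.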

Building the Survival Model – Issues
• The nature of the transmission rate $\lambda$ determines how well the model adapts to the personalization of the contagion
• A very fine-grained approach:
• a single value $\lambda_{u,v}$ for each pair of users within each cascade
• This approach is intractable in real scenarios
• The matrix $\mathbf{\Lambda}$, containing all the $\lambda_{u,v}$, has size $N^2$

Key elements (3)
[Figure: time axis with labels "Old carrier", "Recent carrier", "Observation"; two topics per transmission]
• Contagion is topic-dependent
• Susceptibility and influence are relative to the content
• Content is characterized by topics

The Infection model (2)
• $\lambda^k_{u,v}$ represents the influence exerted by $v$ on $u$ about topic $k$
• The topical transmission rate
$f(t_u(c) \mid t_v(c), \lambda^k_{u,v}) \propto e^{-\lambda^k_{u,v}(t_u(c) - t_v(c))}$
• The superscript $k$ expresses the topic dependency
$\lambda^k_{u,v} = A_{v,k} \cdot S_{u,k}$
• $A_{v,k}$: influence of $v$ on topic $k$
• $S_{u,k}$: susceptibility of $u$ on topic $k$ (not to be confused with the survival function $S(\cdot)$)
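A small sketch of the factorized rate, assuming toy matrices for influence and susceptibility. The values, the list-of-lists layout and the helper name are hypothetical and only serve to show why the factorization shrinks the parameter space.

```python
# Hypothetical toy parameters: 3 users, 2 topics.
A = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]  # A[v][k]: influence of v on topic k
S = [[0.5, 0.5], [0.1, 0.9], [0.7, 0.2]]  # S[u][k]: susceptibility of u on topic k

def topical_rate(u, v, k):
    """Transmission rate from v to u on topic k, as the product A[v][k] * S[u][k]."""
    return A[v][k] * S[u][k]

n_users, n_topics = len(A), len(A[0])
full_params = n_users * n_users * n_topics  # one free rate per (u, v, k)
factored_params = 2 * n_users * n_topics    # entries of A plus entries of S

lam = topical_rate(2, 0, 0)  # user 0 influencing user 2 on topic 0
```

With N users and K topics the unfactorized model needs N * N * K rates, while the factorized one needs 2 * N * K values, so the saving grows linearly with N.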

Building the Survival Model – Step 3
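• The likelihood of a cascade is topic dependent
$p(\mathbf{t}_c \mid \mathbf{Z}, \mathbf{Y}, \mathbf{\Lambda}) = \prod_{k} \Big[ \prod_{u,v:\, t_v(c) < t_u(c) \leq T_c} f(t_u(c) \mid t_v(c), \lambda^k_{u,v})^{y^c_{u,v}} \, S(t_u(c) \mid t_v(c), \lambda^k_{u,v})^{1 - y^c_{u,v}} \cdot \prod_{v:\, t_v(c) \leq T_c} \; \prod_{u:\, t_u(c) = \infty} S(T_c \mid t_v(c), \lambda^k_{u,v}) \Big]^{z_{c,k}}$
• The outer product ranges over the topics $k$
• $z_{c,k}$ is the latent topic indicator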

The complete model
• Content can be modeled jointly
• E.g., textual content modeled by a mixture of Poisson distributions expressing topic dependency
• For each cascade $c \in \{1, \ldots, M\}$
o Sample the topical diffusion pattern, $z_c \sim \mathrm{Multinomial}(\Theta)$
o For each word $w$ in $c$
§ Sample the occurrences of $w$ in $c$, $n_{w,c} \sim \mathrm{Poisson}(\Phi)$
o For each user $u$ in $c$
§ Sample the user who generated the contagion, $y^c_u \sim \mathrm{Multinomial}(\Xi)$
§ Sample her activation time, $t_u(c) \sim \mathrm{Weibull}(z_c, y^c_u, A, S)$
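The generative process can be imitated with a toy sampler. The slides do not spell out the parameterizations, so everything below is an assumption made for illustration: per-topic Poisson rates phi[k][w], a uniform choice among already active users standing in for Multinomial(Ξ), and a Weibull delay with scale 1 / (A[v][z] * S[u][z]) and a free shape parameter (shape 1 gives the exponential case).

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_cascade(theta, phi, A, S, seed_user, horizon, shape=1.0, rng=None):
    rng = rng or random.Random(0)
    # Topic of the cascade, drawn with weights theta.
    z = rng.choices(range(len(theta)), weights=theta)[0]
    # Word counts, one Poisson draw per vocabulary entry.
    words = [sample_poisson(r, rng) for r in phi[z]]
    times = {seed_user: 0.0}
    for u in range(len(S)):
        if u == seed_user:
            continue
        # Assumed stand-in for the infector draw: uniform over active users.
        candidates = [v for v, t in times.items() if t < math.inf]
        v = rng.choice(candidates)
        lam = A[v][z] * S[u][z]
        t = times[v] + rng.weibullvariate(1.0 / lam, shape)
        # Activations beyond the horizon are recorded as "never active".
        times[u] = t if t <= horizon else math.inf
    return z, words, times

# Hypothetical toy parameters: 3 users, 2 topics, 3 vocabulary words.
theta = [0.6, 0.4]
phi = [[2.0, 0.5, 0.1], [0.1, 0.5, 2.0]]
A = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]
S = [[0.5, 0.5], [0.1, 0.9], [0.7, 0.2]]
z, words, times = sample_cascade(theta, phi, A, S, seed_user=0, horizon=5.0,
                                 rng=random.Random(42))
```

Users are visited in index order here, which is a simplification; it keeps the sketch short while still showing how the topic, the infector and the factorized rate jointly determine each activation time.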

Model Learning
• EM approach
• E-step:
• Update latent variables
• M-step:
• Given the status of the latent variables 𝒁 and 𝒀, update parameters
• Linear complexity!
• The update equations in the EM algorithm can be optimized by exploiting the factorization of $\lambda^k_{u,v}$
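One way to see where the linear complexity can come from is the survival exponent for immune users on a fixed topic: with a factorized rate, the double sum over (active, immune) pairs splits into a product of two single sums. The sketch below checks this identity on toy data; it illustrates the algebraic trick only and does not reproduce the authors' update equations. All names and values are assumptions.

```python
import math

def exponent_naive(A_k, S_k, times, horizon, active, immune):
    """Quadratic: one term per (active v, immune u) pair."""
    return sum(A_k[v] * S_k[u] * (horizon - times[v])
               for v in active for u in immune)

def exponent_factored(A_k, S_k, times, horizon, active, immune):
    """Linear: the influence and susceptibility sums are computed separately."""
    influence_mass = sum(A_k[v] * (horizon - times[v]) for v in active)
    susceptibility_mass = sum(S_k[u] for u in immune)
    return influence_mass * susceptibility_mass

# Hypothetical data for one topic: users 0 and 1 are active, users 2 and 3 are immune.
A_k = [0.9, 0.2, 0.1, 0.4]
S_k = [0.5, 0.1, 0.7, 0.3]
times = [0.0, 1.0, math.inf, math.inf]
naive = exponent_naive(A_k, S_k, times, 3.0, [0, 1], [2, 3])
fast = exponent_factored(A_k, S_k, times, 3.0, [0, 1], [2, 3])
```

Both functions return the same value up to floating-point rounding, but the second touches each user once instead of each pair once.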

Exploiting the model
• We started with four questions:
• Q: Who will share a piece of content?
• A: Users infected within a given time horizon
• Q: When will someone share a piece of content?
• A: A sample from $p(\mathbf{t}_c \mid \mathbf{Z}, \mathbf{Y}, \mathbf{\Lambda})$
• Q: Who is expert in a topic characterizing a set of contents?
• A: Influential users, see $A_{v,k}$
• Q: Who is interested in a topic?
• A: Susceptible users, see $S_{u,k}$
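A hedged sketch of how a fitted model might be queried. The matrices are hypothetical toy values, the helper names are invented for the example, and the activation probability assumes a known topic and independent exposures, mirroring the survival term of the likelihood.

```python
import math

# Hypothetical fitted parameters: 3 users, 2 topics.
A = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]  # A[v][k]: influence
S = [[0.5, 0.5], [0.1, 0.9], [0.7, 0.2]]  # S[u][k]: susceptibility

def top_users(matrix, topic, n=2):
    """Users with the largest score on a topic (experts if A, interested users if S)."""
    scores = [(row[topic], user) for user, row in enumerate(matrix)]
    return [user for _, user in sorted(scores, reverse=True)[:n]]

def activation_probability(u, topic, times, horizon):
    """Probability that u activates by the horizon, given the already active users.

    times maps each active user to its activation time; u resists each of them
    independently with probability exp(-rate * exposure).
    """
    exponent = sum(A[v][topic] * S[u][topic] * (horizon - t_v)
                   for v, t_v in times.items() if v != u and t_v <= horizon)
    return 1.0 - math.exp(-exponent)

experts = top_users(A, 0)        # [0, 1]
interested = top_users(S, 0, 1)  # [2]
p = activation_probability(2, 0, {0: 0.0}, 2.0)
```

Ranking the columns of A and S answers the "who is expert" and "who is interested" questions directly, while thresholding the activation probability gives one possible answer to "who will share".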

Comparison with the Literature

Evaluation
• Activation prediction:
• Two samples of Twitter (filtered/noisy draws)
• Testing protocol:
• Given an incomplete cascade (50%, 80%), fill in the missing activations
• Predict activation times
• Influencers and topics:
• MemeTracker dataset
• Testing protocol:
• A semantic (handmade) analysis on the top topics and most influential users

Twitter
• ROC curves for predicting users' retweet times on Twitter-Large (noisy sample, first row) and Twitter-Small (filtered sample, second row)

Conclusions
• Robust, efficient and accurate modeling of information cascades
• Factorizing the infection rate uncovers highly relevant
information concerning the underlying diffusion process
• Works with the general Weibull distribution, not just the exponential
• Future work
• Bayesian learning: the underlying probability distributions allow conjugate priors
• Exploit multiple mutual elicitation processes (e.g. Hawkes processes) within the same model
• Deep architectures for combining heterogeneous content
• Content dynamics within a cascade
