From Competition to Complementarity: Comparative Influence Diffusion and Maximization

Comparative Influence Maximization:
From Competition to Complementarity
Wei Lu (LinkedIn)
Wei Chen (Microsoft Research)
Laks V.S. Lakshmanan (UBC)
NDA’16 Workshop, SIGMOD
To appear in VLDB’16, New Delhi, India

Social influence
• Ubiquitous in life
• Fueled by the widespread popularity of online social networks and social media
• Computational Social Influence (CSI)
– Viral Marketing
– Influence Maximization
– Applications and extensions of the above

Computational Social Influence
• Social networks with edge weights (influence probabilities or weights)
• Stochastic influence/information propagation models
– Single-item vs. multiple-item models
• Diffusion dynamics depend heavily on the relationship between the propagating entities
• Pure competition: each user adopts at most one item
– Competitive Independent Cascade Model (CIC)
– K-LT Model
– WPCLT Model …

Limitations of Pure Competition Models: Example

Item Relationships
• Propagating items can have any relationship:
– Compete (iPhone vs. Nexus)
– Complement (iPhone and Apple Watch, iPhone and iPhone cases)
• Natural and well studied in economics
– Substitute goods and complementary goods
• Item relationships may be asymmetric
• Item relationships may hold to an arbitrary degree (not “pure”)

Motivations and Challenges
“One model that works for all kinds of item relationships”: did not exist before this work
Challenges:
• A unified model with high expressive power
• A compact and manageable representation
• Room to develop tractable solutions for natural influence optimization problems
• Model validation and data

Main Contributions
• Comparative Independent Cascade (ComIC): captures both competition and complementarity, to an arbitrary degree
• Problem: Self Influence Maximization
• Problem: Complementary Influence Maximization
• Algorithm: Generalized Reverse Reachable Sets
• Algorithm: Sandwich Approximation

Model Overview
• Focus on two items
– Already challenging
– Future work: extension to an arbitrary number of items
• Edge-level influence/information propagation
– Similar to the classic IC model
• Node-level decision-making controlled by Node-Level Automata (NLA)
– Global Adoption Probabilities (GAP)

Global Adoption Probabilities
• Key parameters measuring the degree to which two items compete with or complement each other
• q(A|0): probability of adopting A when the user has not adopted B
• q(A|B): probability of adopting A when the user has already adopted B
• q(A|0) >= q(A|B): B competes with A
• q(A|0) <= q(A|B): B complements A
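As a small sketch (the `GAP` container and the `effect_of`/`relationship` helpers are hypothetical names, not from the paper), the two inequalities can be read off mechanically. Equality is labeled “indifferent”, matching the RR-SIM setting where q(B|0) = q(B|A). Because each item is checked separately, the result may be asymmetric:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAP:
    """Global Adoption Probabilities for two items (hypothetical container)."""
    q_a_0: float  # q(A|0): adopt A when B has not been adopted
    q_a_b: float  # q(A|B): adopt A when B has already been adopted
    q_b_0: float  # q(B|0)
    q_b_a: float  # q(B|A)

    def __post_init__(self):
        # Every GAP entry is a probability.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


def effect_of(q_alone: float, q_with_other: float) -> str:
    """Effect of the other item on this one: compare q(.|0) with q(.|other)."""
    if q_with_other < q_alone:
        return "competes"
    if q_with_other > q_alone:
        return "complements"
    return "indifferent"


def relationship(gap: GAP) -> tuple:
    """(effect of B on A, effect of A on B); the two need not agree."""
    return effect_of(gap.q_a_0, gap.q_a_b), effect_of(gap.q_b_0, gap.q_b_a)


# B complements A, while A has no effect on B (an asymmetric relationship).
print(relationship(GAP(q_a_0=0.3, q_a_b=0.8, q_b_0=0.5, q_b_a=0.5)))
# ('complements', 'indifferent')
```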

Transition diagram
For each item, each node is in one of the following states:
• Idle (inactive)
• Informed (influenced)
• Suspended / Adopted / Rejected

Diffusion dynamics
• Initially, every node is idle with respect to both items
• When a node adopts its first item, its outgoing edges are tested for information propagation to neighbors (“info channel”)
– Each edge (u,v) becomes open with probability p(u,v)
• If u is A-adopted and the info channel on edge (u,v) is open, then v adopts A:
– with probability q(A|0) if v has not adopted B
– with probability q(A|B) if v has adopted B
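A minimal sketch of one A-propagation attempt along edge (u,v), assuming edge probabilities are kept in a dict keyed by (u, v); `inform_and_decide` is a hypothetical helper. The edge outcome is cached so that each edge is tested only once and the same info channel is reused for both items. Tie-breaking and reconsideration are left out:

```python
import random


def inform_and_decide(u, v, p, edge_open, v_adopted_b, q_a_0, q_a_b, rng):
    """One attempt by A-adopted node u to get v to adopt A; True if v adopts.

    p: dict mapping edge (u, v) -> p(u, v)
    edge_open: cache of edge-test outcomes, so each edge is tested only once
        and the same "info channel" is reused when the other item travels it
    v_adopted_b: whether v has already adopted B
    """
    if (u, v) not in edge_open:
        edge_open[(u, v)] = rng.random() < p[(u, v)]  # open w.p. p(u, v)
    if not edge_open[(u, v)]:
        return False  # closed channel: v is not informed of A through (u, v)
    q = q_a_b if v_adopted_b else q_a_0  # the GAP entry that applies to v now
    return rng.random() < q


rng = random.Random(0)
edge_open = {}
adopted = inform_and_decide("u", "v", {("u", "v"): 0.7}, edge_open,
                            v_adopted_b=True, q_a_0=0.2, q_a_b=0.9, rng=rng)
print(edge_open, adopted)
```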

Node tie-breaking
• What if multiple in-neighbors of a node became active in the last time step t-1?
• Generate a random permutation of those in-neighbors, and test activation in that order
• If one such neighbor adopted both items at t-1, it informs in the same order in which it adopted them
• If a seed is targeted with both items, its adoption order is decided uniformly at random (probability 0.5 each)
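The tie-breaking rules can be sketched with a seeded `random.Random`. The helpers `informing_sequence` and `seed_item_order` are hypothetical, and the input is assumed to map each newly active in-neighbor to the items it adopted at t-1, in adoption order:

```python
import random


def informing_sequence(newly_active, rng):
    """Order in which v is informed by in-neighbors that became active at t-1.

    newly_active maps each such in-neighbor to the tuple of items it adopted
    at t-1, in adoption order. Neighbors are visited in a uniformly random
    permutation; a neighbor that adopted both items informs in its own order.
    """
    neighbors = list(newly_active)
    rng.shuffle(neighbors)
    return [(u, item) for u in neighbors for item in newly_active[u]]


def seed_item_order(rng):
    """A seed targeted with both items adopts them in random order (0.5 each)."""
    return ("A", "B") if rng.random() < 0.5 else ("B", "A")


rng = random.Random(1)
print(informing_sequence({"x": ("B", "A"), "y": ("A",)}, rng))
print(seed_item_order(rng))
```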

Node Reconsideration
• Suppose B complements A: q(A|0) <= q(A|B)
• User v was informed of A but did not adopt it (probability 1 - q(A|0))
• Once v adopts B, since B complements A, v may revisit that decision and adopt A with a reconsideration probability
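The slide's reconsideration formula did not survive extraction. One choice consistent with the possible-world model described later in these slides, where v adopts A exactly when α(v,A) <= the applicable adoption probability, is ρ_A = (q(A|B) - q(A|0)) / (1 - q(A|0)). Conditioned on v having declined A earlier, this raises v's overall chance of adopting A after adopting B to exactly q(A|B). The sketch below assumes that formula; the name `reconsideration_prob` is hypothetical:

```python
def reconsideration_prob(q_a_0: float, q_a_b: float) -> float:
    """Assumed reconsideration probability rho_A when B complements A.

    Chosen so that q(A|0) + (1 - q(A|0)) * rho_A == q(A|B): a node that first
    declines A and later adopts B ends up adopting A with overall chance q(A|B).
    In the possible-world view this is P(alpha <= q(A|B) | alpha > q(A|0)).
    """
    if q_a_0 >= 1.0:
        return 0.0  # A is always adopted on first contact; nothing to reconsider
    # No reconsideration if B does not complement A (q(A|B) <= q(A|0)).
    return max(q_a_b - q_a_0, 0.0) / (1.0 - q_a_0)


rho = reconsideration_prob(0.2, 0.6)
print(round(rho, 6), round(0.2 + (1 - 0.2) * rho, 6))  # 0.5 0.6
```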

General Properties of the ComIC Model
• Neither submodularity nor monotonicity holds in an arbitrary instance of the model
• Influence maximization may be intractable
• Overall strategy:
– Identify a parameter subspace in which submodularity is satisfied
– Develop an efficient approximation algorithm (Generalized RR-set) for submodular cases
– “Sandwich Approximation” for non-submodular cases

Submodularity: Complementary Case

Possible World Definition
• An equivalent representation of the model and its propagation dynamics
– Propagation in a possible world is deterministic and easy to reason about
• Equivalent possible-world model for ComIC
– For each edge (u,v), remove it with probability 1-p(u,v)
– For each node v, draw α(v,A) and α(v,B) uniformly at random from [0,1] for testing against the adoption probabilities
– Adoption happens when α <= the applicable adoption probability
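A sketch of sampling one possible world and then running A's diffusion deterministically inside it. For brevity it covers only the case where nobody adopts B, so only q(A|0) matters. Function names are hypothetical, and seeds are assumed to adopt A outright:

```python
import random
from collections import deque


def sample_possible_world(nodes, p, rng):
    """Draw one ComIC possible world.

    p maps each edge (u, v) to p(u, v). Each edge stays live with probability
    p(u, v) (i.e. is removed w.p. 1 - p(u, v)); each node v gets thresholds
    alpha[v]["A"] and alpha[v]["B"] drawn uniformly from [0, 1).
    """
    out = {v: [] for v in nodes}
    for (u, v), prob in p.items():
        if rng.random() < prob:
            out[u].append(v)
    alpha = {v: {"A": rng.random(), "B": rng.random()} for v in nodes}
    return out, alpha


def a_adopters_without_b(out, alpha, a_seeds, q_a_0):
    """A-adopted nodes in a fixed world when nobody adopts B (deterministic).

    Sketch assumption: seeds adopt A outright; any other node adopts A iff an
    A-adopter reaches it over a live edge and alpha[v]["A"] <= q(A|0).
    """
    adopted = set(a_seeds)
    queue = deque(a_seeds)
    while queue:
        u = queue.popleft()
        for v in out[u]:
            if v not in adopted and alpha[v]["A"] <= q_a_0:
                adopted.add(v)
                queue.append(v)
    return adopted


# Averaging over many sampled worlds estimates A's expected spread.
nodes = [1, 2, 3]
p = {(1, 2): 0.8, (2, 3): 0.5}  # hypothetical edge probabilities
rng = random.Random(3)
worlds = [sample_possible_world(nodes, p, rng) for _ in range(5000)]
est = sum(len(a_adopters_without_b(o, a, [1], 0.6)) for o, a in worlds) / len(worlds)
print(est)  # close to 1 + 0.8*0.6 + 0.8*0.6*0.5*0.6 = 1.624
```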

Influence Maximization Problems
• Self Influence Maximization (SIM): fix the B-seed set, and find the A-seed set of size k that maximizes A’s expected influence spread
• Complementary Influence Maximization (CIM): fix the A-seed set, and find the B-seed set of size k that maximizes the boost B gives to A’s expected influence spread
• Both are NP-hard under the ComIC model
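Writing σ_A(S_A, S_B) for A's expected influence spread given A-seed set S_A and B-seed set S_B over node set V (notation assumed here, not taken from the slides), the two problems can be stated as follows. Since σ_A(S_A, ∅) is a constant once S_A is fixed, CIM has the same maximizers as maximizing σ_A(S_A, S_B) over S_B.

```latex
\begin{aligned}
\textbf{SIM:}\quad & \max_{S_A \subseteq V,\ |S_A| = k} \ \sigma_A(S_A, S_B)
  && (S_B \text{ fixed})\\
\textbf{CIM:}\quad & \max_{S_B \subseteq V,\ |S_B| = k} \ \sigma_A(S_A, S_B) - \sigma_A(S_A, \emptyset)
  && (S_A \text{ fixed})
\end{aligned}
```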

Algorithm Design for SIM and CIM
• Generalized Reverse-Reachable Set (RR-set): RR-set-based algorithms are the state of the art for classical influence maximization with single-item propagation models (IC and LT)
• Sandwich Approximation to achieve approximation guarantees in non-submodular cases
• Both techniques are generic: generalized RR-sets apply to any monotone, submodular stochastic propagation model, and Sandwich Approximation to any non-submodular maximization problem

Recap: Reverse-Reachable Set
• If u can reach v (in a deterministic directed graph), then u is in the RR-set rooted at v [Borgs et al., SODA’14]
• Random RR-set: root v is chosen uniformly at random
• Two-phase Influence Maximization (TIM) [Tang et al., 2014]
– Estimate the minimum number of random RR-sets required for the probabilistic approximation guarantee
• 1-1/e-ε: a smaller ε requires generating more RR-sets
– Generate random RR-sets using backward BFS
– Seed selection (a deterministic max-cover problem)
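A compact sketch of the two core steps for the classic IC model: generating random RR-sets by backward BFS and selecting seeds by greedy max cover. TIM's first phase, which derives the number of RR-sets from ε, is omitted; here that number (`theta`) is simply a caller-chosen input, and all names are hypothetical:

```python
import random
from collections import deque


def random_rr_set(nodes, in_edges, rng):
    """One random RR-set under the classic IC model.

    in_edges maps v -> list of (u, p(u, v)). The root is drawn uniformly at
    random; a backward BFS keeps each in-edge with probability p(u, v), so
    every node collected can reach the root in the sampled graph.
    """
    root = rng.choice(nodes)
    rr = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                queue.append(u)
    return rr


def greedy_max_cover(rr_sets, k):
    """Pick up to k nodes, each time the one in the most uncovered RR-sets."""
    covered = [False] * len(rr_sets)
    seeds = []
    for _ in range(k):
        counts = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    counts[u] = counts.get(u, 0) + 1
        if not counts:
            break  # every RR-set is already covered
        best = max(counts, key=counts.get)
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds, sum(covered)


rng = random.Random(42)
nodes = [0, 1, 2, 3]
in_edges = {1: [(0, 1.0)], 2: [(1, 1.0)], 3: [(2, 0.5)]}  # hypothetical graph
theta = 2000  # number of RR-sets; TIM would derive this from epsilon
rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(theta)]
seeds, n_covered = greedy_max_cover(rr_sets, k=1)
# Estimated spread = n * (fraction of RR-sets covered by the seeds).
print(seeds, len(nodes) * n_covered / theta)  # [0], near the exact spread 3.5
```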

Recap: TIM Algorithm
• (1-1/e-ε)-approximation with high probability
– Same guarantee as Greedy, up to the probabilistic part
• Orders of magnitude faster than Greedy with Monte Carlo simulations
• Scalable to billion-edge graphs
• Applies to a large family of stochastic propagation models

Generalized RR-set and TIM Algorithms
• Works for any stochastic propagation model satisfying monotonicity and submodularity
– Gives a (1-1/e-ε)-approximation with high probability
• Generalized RR-set (in a deterministic possible world): u belongs to the RR-set rooted at v if the singleton seed set {u} can activate v
– Note the difference from “reaching”
– Random RR-set: root v is sampled uniformly at random from the graph
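To make the difference from “reaching” concrete, here is a brute-force sketch of the generalized definition in one fixed possible world, using a single item A with no B-adopters and assuming seeds adopt outright. All names are hypothetical. Node 1 reaches node 3 but does not activate it, because node 2 declines A:

```python
from collections import deque


def forward_adopters(out, alpha_a, q_a_0, seeds):
    """Deterministic A-diffusion in one fixed possible world, no B-adopters.

    Sketch assumption: seeds adopt outright; another node adopts iff it is
    reached over a live edge from an adopter and alpha_a[v] <= q(A|0).
    """
    adopted = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in out.get(u, []):
            if v not in adopted and alpha_a[v] <= q_a_0:
                adopted.add(v)
                queue.append(v)
    return adopted


def generalized_rr_set(root, nodes, activated_by):
    """u is in the RR-set rooted at `root` iff the singleton {u} activates root.

    activated_by(seed_set) returns the activated set in the fixed world. This
    brute force runs one forward diffusion per node: it states the definition
    rather than being an efficient generator.
    """
    return {u for u in nodes if root in activated_by({u})}


out = {1: [2], 2: [3], 3: []}        # live edges 1 -> 2 -> 3
alpha_a = {1: 0.1, 2: 0.9, 3: 0.1}   # node 2 would decline A when informed
rr = generalized_rr_set(3, [1, 2, 3], lambda s: forward_adopters(out, alpha_a, 0.5, s))
print(sorted(rr))  # [2, 3]; plain reachability would give [1, 2, 3]
```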

RR-set Generation for SIM (RR-SIM)
• Problem definition and submodular setting
– Fix the B-seed set, find the A-seed set (size k)
– A is complemented by B: q(A|0) <= q(A|B)
– B is indifferent to A: q(B|0) = q(B|A)
• Phase 1: Forward labeling: starting from the B-seed set, label each node’s status w.r.t. B
• Phase 2: Backward BFS (details next)

Phase 2: Backward BFS
• Choose a root v uniformly at random from the graph
• Enqueue v into a FIFO queue Q
• Repeatedly dequeue from Q until it is empty
• For each dequeued node u, enqueue u’s in-neighbors (subject to the edge test) if either condition holds:
– u is B-adopted and α(u,A) <= q(A|B)
– u is not B-adopted and α(u,A) <= q(A|0)
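A sketch of both phases inside one fixed possible world (live edges and α values already sampled), so that the B-labeling and the backward BFS see the same info channels. Helper names are hypothetical; α(u,A) and α(u,B) are stored as `alpha_a[u]` and `alpha_b[u]`; seeds are assumed to adopt outright; and picking the random root is left to the caller (the example fixes root 3). The example shows complementarity enlarging an RR-set: node 2 passes A upstream only because it is B-adopted.

```python
from collections import deque


def adjacency(live_edges):
    """Out- and in-adjacency of the live edges in one sampled possible world."""
    out_live, in_live = {}, {}
    for u, v in live_edges:
        out_live.setdefault(u, []).append(v)
        in_live.setdefault(v, []).append(u)
    return out_live, in_live


def label_b_adopters(out_live, alpha_b, q_b, b_seeds):
    """Phase 1 (forward labeling): the B-adopted nodes in this world.

    Relies on B being indifferent to A (q(B|0) = q(B|A) = q_b), so B spreads
    the same way regardless of A. Seeds adopt B outright (sketch assumption).
    """
    adopted = set(b_seeds)
    queue = deque(b_seeds)
    while queue:
        u = queue.popleft()
        for v in out_live.get(u, []):
            if v not in adopted and alpha_b[v] <= q_b:
                adopted.add(v)
                queue.append(v)
    return adopted


def rr_sim(root, in_live, b_adopted, alpha_a, q_a_0, q_a_b):
    """Phase 2 (backward BFS): the RR-set for SIM rooted at `root`.

    A dequeued node u passes A upstream only if it would adopt A when informed:
    alpha_a[u] <= q(A|B) if u is B-adopted, else alpha_a[u] <= q(A|0).
    Edge tests are already fixed by the live edges in in_live.
    """
    rr = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        q = q_a_b if u in b_adopted else q_a_0
        if alpha_a[u] > q:
            continue  # u would not adopt A, so no path through u activates root
        for w in in_live.get(u, []):
            if w not in rr:
                rr.add(w)
                queue.append(w)
    return rr


out_live, in_live = adjacency([(1, 2), (2, 3), (4, 2)])
alpha_b = {1: 0.9, 2: 0.1, 3: 0.9, 4: 0.9}
alpha_a = {1: 0.9, 2: 0.6, 3: 0.2, 4: 0.9}
b_adopted = label_b_adopters(out_live, alpha_b, q_b=0.5, b_seeds=[4])
print(sorted(b_adopted))                                         # [2, 4]
print(sorted(rr_sim(3, in_live, b_adopted, alpha_a, 0.3, 0.8)))  # [1, 2, 3, 4]
print(sorted(rr_sim(3, in_live, set(), alpha_a, 0.3, 0.8)))      # [2, 3]
```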

RR-set Generation for CIM (RR-CIM)
• Given the A-seed set, find the best complementing B-seed set
• Cross-submodularity holds when q(B|A) = 1
• Forward labeling: starting from the A-seed set, identify nodes that can potentially be A-adopted
• Backward BFS: two passes required

Sandwich Approximation
• Given any non-submodular set function, how can we leverage submodular maximization (e.g., greedy, local search) to achieve provable approximation guarantees?
• Answer:
– Derive upper- and lower-bound submodular functions (“sandwiched”)
– Use the best of the three solutions (from the lower bound, the upper bound, and the original function), which gives a data-dependent approximation ratio

Sandwich Approximation
[Figure: the non-submodular function we want to maximize, sandwiched between a submodular lower bound and a submodular upper bound]

Remarks
• Applicable to maximizing any non-submodular function
• If monotone, run Greedy on the upper bound, the lower bound, and the actual function
• If non-monotone, run Local Search
• The upper and lower bounds should be reasonably tight for the guarantee to be meaningful
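A sketch of the monotone case on a hypothetical toy instance: f is a coverage function plus a bonus paid only when a and b are both chosen (so f is not submodular), μ drops the bonus, and ν spreads it as 1.5 per element of {a, b}, so μ <= f <= ν. The returned ratio is the computable part of the data-dependent bound, f(S_ν) / ν(S_ν) · (1 - 1/e). It follows from greedy's (1 - 1/e) guarantee on the monotone submodular ν together with f <= ν. Local search for the non-monotone case is not shown:

```python
import math

# Hypothetical toy instance: element -> set of covered items.
COVER = {"a": {1}, "b": {2}, "c": {3, 4}, "d": {5}}


def cov(S):
    """Coverage: number of distinct items covered by S (monotone submodular)."""
    return len(set().union(*(COVER[x] for x in S)))


def f(S):
    """Objective to maximize: not submodular, since a and b pay off only together."""
    return cov(S) + 3 * ("a" in S and "b" in S)


def mu(S):
    """Submodular lower bound: drop the bonus."""
    return cov(S)


def nu(S):
    """Submodular upper bound: spread the bonus as 1.5 per element of {a, b}."""
    return cov(S) + 1.5 * ("a" in S) + 1.5 * ("b" in S)


def greedy(g, ground, k):
    """Standard greedy: repeatedly add the element that maximizes g(chosen + x)."""
    chosen = frozenset()
    for _ in range(k):
        best = max((x for x in ground if x not in chosen),
                   key=lambda x: g(chosen | {x}))
        chosen = chosen | {best}
    return chosen


def sandwich(g, lower, upper, ground, k):
    """Greedy on lower, upper, and g; return the best solution under g plus a ratio.

    ratio = g(S_upper) / upper(S_upper) * (1 - 1/e) lower-bounds
    g(returned solution) / g(optimal solution), provided g <= upper everywhere
    and upper is monotone submodular.
    """
    s_lower, s_upper, s_g = (greedy(h, ground, k) for h in (lower, upper, g))
    best = max((s_lower, s_upper, s_g), key=g)
    ratio = g(s_upper) / upper(s_upper) * (1 - 1 / math.e)
    return best, ratio


ground = ["a", "b", "c", "d"]
best, ratio = sandwich(f, mu, nu, ground, k=2)
print(sorted(best), f(best), round(ratio, 3))                 # ['a', 'b'] 5 0.632
print(sorted(greedy(f, ground, 2)), f(greedy(f, ground, 2)))  # ['a', 'c'] 3
```

Here greedy on f alone is myopic: it takes c first and ends at f = 3, while greedy on the upper bound ν reaches {a, b} with f = 5.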

Experiments: Datasets
We also use synthetic datasets with up to 1 million nodes

Learning Global Adoption Probabilities
Dataset: Flixster
• Signals for adoption: rated a movie
• Signals for being informed: “Want to See”, “Not Interested”

Effect of ε in the Generalized TIM algorithm: tradeoff between seed-set quality and running time

Thank you!
See you in VLDB’16!
