Least Cost Influence in Multiplex Social Networks
1. Least Cost Influence in Multiplex Social Networks
Model Representation and Analysis
Presented by: Ayushi Jain, Rahul Bobhate, Natasha Mandal, Ankur Sachdeva
Paper authors: Dung T. Nguyen, Huiyuan Zhang, Soham Das, My T. Thai, Thang N. Dinh
2. Structure
• Define a few terms
• Motivation
• Related work
• Challenges and proposed solution
• Math notations and problem definition
• Lossless coupling
• Lossy coupling
• Influence relay
• Experiments
• Conclusion
3. What are multiplex networks?
• Networks in which the same nodes can be linked by several kinds of edges, for example the same users connected on more than one social media platform
• Example: a set of users who interact on Facebook, Twitter and Foursquare
4. What is the least cost influence (LCI) problem?
• Find a minimum number of seed users who can eventually influence a large number of users
• Example: how to find the set of influencers with the least advertising cost who can influence a massive number of users
Or
how to find the minimum number of inducements required for product adoption to reach a certain proportion of the population
5. Motivation
• In the past decade, the popularity of online social networks (OSNs) has created a major communication medium for information sharing
• Similar to real social networks: word-of-mouth and peer-pressure effects
Do you know how much time an individual spends (on average) on social media?
1.72 hours per day, or 28% of online activity
9. Why is it important to study information diffusion in these networks?
• A considerable number of users overlap between networks
• Users can relay information from one network to another
• Example: Jack
10. If we only consider information propagation in one network, we'll fail to identify the most influential users
11. Related work: single network
• Kempe et al.
• Find a set of k users who maximize influence
• Stochastic process: Independent Cascade (IC) model
• The probability of influencing a friend is proportional to the strength of the friendship
• NP-hard; greedy algorithm with approximation ratio (1 − 1/e)
• Linear Threshold (LT) model
• A user adopts a new product when the total influence of their friends reaches a threshold
• Dinh et al.
• Suggested an algorithm for a special case of LT
• Influence between users is uniform, and a user is influenced if a certain fraction ρ of their friends are active
12. Multiplex networks
• Yagan et al.
• Studied the connection between online and offline networks
• Investigated the outbreak of information using the SIR model on random networks
• Liu et al.
• Analyzed networks formed by online interactions and offline events
Drawbacks:
• Studied the flow of information and network clustering, but not LCI
• Did not study the specific optimization problem of viral marketing
• Shen et al.
• Studied information propagation in multiplex OSNs
• Combined all networks into one network by representing an overlapping user as a super node
• Cannot preserve the individual networks' properties
13. Challenges
How to evaluate the influence of overlapping users in multiplex networks?
In which network is a user easier to influence?
Which network propagates the influence better?
14. Proposed solution
• In this paper, we study the LCI problem: find a set of users of minimum cardinality that influences a certain fraction of the users in multiplex networks
• Present a model with various coupling schemes that reduce the problem in multiplex networks to an equivalent problem on a single network. The coupling schemes can be applied to the most popular diffusion models, including the Linear Threshold model, the Stochastic Threshold model, and the Independent Cascade model
• Introduce a new metric called influence relay to analyze the influence diffusion process in both a single network and multiplex networks
15. Graph notations
• $G_i$: weighted directed graph $(V_i, E_i, \theta_i, W_i)$.
• $V_i$: set of vertices of $G_i$, representing the users in the network.
• $E_i$: set of edges of $G_i$, representing the connections between users.
• $W_i$: set of weights of the edges in $E_i$, representing the strength of influence (or of the connection).
• $N_u^{i-}$, $N_u^{i+}$: sets of incoming and outgoing neighbors of $u$ in $G_i$.
• $\theta_i(u)$: threshold indicating the persistence of the opinions of $u$.
16. Least Cost Influence (LCI) problem definition
• Given:
• A system of k networks $G_{1..k}$
• A set of users U
• A number of time hops d
• A fraction β with 0 < β < 1
• To find:
• A seed set $S \subseteq U$ of minimum cardinality such that
• at least a β fraction of the users in U are active
• after d hops
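The problem statement above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the networks are taken to be already merged into one weighted graph, a user becomes active once the total weight from active in-neighbors reaches the threshold, and the seed set is grown by a plain greedy heuristic (which gives a small seed set, not a guaranteed minimum). The names `active_after` and `greedy_seed_set` are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# in_edges[u] = [(v, w(v, u)), ...] for every edge v -> u
InEdges = Dict[str, List[Tuple[str, float]]]


def active_after(in_edges: InEdges, theta: Dict[str, float],
                 seeds: Set[str], d: int) -> Set[str]:
    """Users active after at most d hops of threshold diffusion from `seeds`."""
    active = set(seeds)
    for _ in range(d):
        new = {u for u in theta
               if u not in active
               and sum(w for v, w in in_edges.get(u, []) if v in active)
               >= theta[u] - 1e-12}  # assumption: reaching the threshold suffices
        if not new:
            break
        active |= new
    return active


def greedy_seed_set(in_edges: InEdges, theta: Dict[str, float],
                    beta: float, d: int) -> Set[str]:
    """Greedy heuristic: add the most helpful user until a beta fraction is active."""
    users = sorted(theta)
    seeds: Set[str] = set()
    while len(active_after(in_edges, theta, seeds, d)) < beta * len(users):
        best = max((u for u in users if u not in seeds),
                   key=lambda u: len(active_after(in_edges, theta, seeds | {u}, d)))
        seeds.add(best)
    return seeds


# Hypothetical chain a -> b -> c -> d with unit weights and thresholds 0.5.
theta = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
in_edges = {"b": [("a", 1.0)], "c": [("b", 1.0)], "d": [("c", 1.0)]}
print(greedy_seed_set(in_edges, theta, beta=0.75, d=2))
```

With β = 0.75 and d = 2, a single seed at the head of the chain is enough, because two hops from "a" reach three of the four users.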
19. Linear Threshold model
• Influence and information diffusion model for a single network
• Can be extended to handle multiple networks
• In the LT model:
• Every user is either active or inactive
• A user u is active if he/she has accepted the information
• An inactive user becomes active when the total influence of their active neighbors reaches their threshold.
• In each time hop, inactive users whose threshold is met become active, and they go on to activate new users.
• d is the number of hops over which information is propagated.
• The set of users active after d hops, caused by seed set S, is denoted $A^d(G_{1..k}, S)$
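The hop-by-hop process described above can be sketched as a short simulation. This is a minimal illustration, not the authors' code: it assumes that a user becomes active as soon as the threshold is reached in at least one network they belong to, and the function name `lt_active_set` and the example users are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One network: (in_edges, theta), where in_edges[u] lists (v, weight) for
# edges v -> u and theta[u] is the threshold of u in that network.
Network = Tuple[Dict[str, List[Tuple[str, float]]], Dict[str, float]]


def lt_active_set(networks: List[Network], seeds: Set[str], d: int) -> Set[str]:
    """Return the users active after d hops (the set written A^d above)."""
    active = set(seeds)
    for _ in range(d):
        newly_active = set()
        for in_edges, theta in networks:
            for u, threshold in theta.items():
                if u in active:
                    continue
                influence = sum(w for v, w in in_edges.get(u, []) if v in active)
                # Assumption: meeting the threshold in one network is enough.
                if influence >= threshold - 1e-12:
                    newly_active.add(u)
        if not newly_active:
            break  # diffusion has stopped early
        active |= newly_active
    return active


# Hypothetical example: "bob" is in both networks and relays the influence.
net1 = ({"bob": [("alice", 0.6)]}, {"alice": 0.5, "bob": 0.5})
net2 = ({"carol": [("bob", 0.7)]}, {"bob": 0.9, "carol": 0.6})
print(lt_active_set([net1, net2], {"alice"}, 1))
print(lt_active_set([net1, net2], {"alice"}, 2))
```

In this example "carol" is reached only because "bob" carries the information from the first network into the second, which is the relay effect the slides motivate.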
20. Coupling schemes
• Lossless coupling scheme:
• Combines multiple networks into a single network.
• No data is lost while combining the networks (hence the name).
• Advantages:
• Existing single-network algorithms can be used
• Same solution quality
21. Challenges
• Heterogeneity of user participation:
• Some users have joined only a single network
• Other users have joined multiple networks
• Recognizing the same user across networks is difficult
• Inter-network influence propagation:
• A user transmits information in multiple networks
• The transmission of influence between networks must be represented in a single network.
• Preserving properties of individual networks:
• The coupled network should preserve the diffusion properties of the individual networks.
• It should be possible to relate the solution for the coupled network to the solutions for the individual networks
23. Coupling scheme for the LT model
• Solution to the 1st challenge
• Introduce dummy nodes.
• A dummy node represents a user u in a network $G_i$ in which the user is not registered.
• Solution to the 2nd challenge
• Introduce gateway vertices.
• Introduce synchronization edges.
• Instead of a direct edge between two vertices, there is
• an edge from a user to a gateway vertex
• and an edge from the gateway vertex to a user
• Solution to the 3rd challenge
• Nothing else is needed.
27. Lemmas
• Lemma 1: Suppose that the propagation process in the coupled network G starts from a seed set that contains only gateway vertices, $S = \{s_1^0, \dots, s_p^0\}$. Then representative vertices are activated only at even propagation hops.
• Lemma 2: Suppose that the propagation processes on $G_{1..k}$ and G start from the same seed set S. Then the following conditions are equivalent:
• User $u$ is active after $d$ propagation hops in $G_{1..k}$.
• There exists $i$ such that $u^i$ is active after $2d - 1$ propagation hops in G.
• Vertex $u^0$ is active after $2d$ propagation hops in G.
28. Theorems
• Theorem 1: Given a system of k networks $G_{1..k}$ with user set U, the coupled network G produced by the lossless coupling scheme, and a seed set $S = \{s_1, s_2, \dots, s_p\}$: if $A^d(G_{1..k}, S) = \{a_1, a_2, \dots, a_q\}$ is the set of users activated by S after $d$ propagation hops in the multiplex networks, then $A^{2d}(G, S) = \{a_1^0, a_1^1, \dots, a_1^k, \dots, a_q^0, a_q^1, \dots, a_q^k\}$ is the set of vertices activated by S after $2d$ propagation hops in the coupled network.
• Theorem 2: When the lossless scheme is used, the set $S = \{s_1, s_2, \dots, s_p\}$ influences a β fraction of the users in $G_{1..k}$ after $d$ propagation hops if and only if the corresponding gateway set $\{s_1^0, s_2^0, \dots, s_p^0\}$ influences a β fraction of the vertices in the coupled network G after $2d$ propagation hops.
29. Extension to other diffusion models
• The lossless coupling scheme can be used for other diffusion models:
• Stochastic Threshold model
• Independent Cascade model
• Similarity between the LT model and the other approaches
• Same approach of using
• Gateway vertices
• Representative vertices
• Synchronization edges
30. Lossy Coupling
MOTIVATION
• The coupled network produced by lossless coupling contains a large number of extra vertices and edges.
• Ideally, the coupled network would be compact and contain only users as vertices.
• Such a compact coupled network inevitably loses some information.
31. Lossy Coupling
GOALS
• The goal is to design a scheme that minimizes this loss of information.
• The solution to the Least Cost Influence problem in the compact coupled network should be very close to the solution in the original multiplex network.
35. Lossy Coupling
OBSERVATION 2
• When $u$ participates in multiple networks, it may be easier to influence $u$ in some networks than in others.
• For example, if a node $u$ is in two networks:
Network 1: $\theta_1(u) = 0.1$; $u$ has 8 in-neighbors and each in-neighbor $v$ influences $u$ with $w_1(v, u) = 0.1$, so it takes 1 active neighbor to activate $u$.
Network 2: $\theta_2(u) = 0.7$; $u$ has 8 in-neighbors and each in-neighbor $v$ influences $u$ with $w_2(v, u) = 0.1$, so it takes 7 active neighbors to activate $u$.
36. Lossy Coupling
EASINESS
• Intuitively, $u$ is easier to influence in Network 1.
• Formally, $\mathit{easiness}_i(u) = \dfrac{\sum_{v \in N_u^{i-}} w_i(v, u)}{\theta_i(u)}$
• We can use $\mathit{easiness}_i(u)$ as $\alpha_i(u)$ in the equation stated in OBSERVATION 1.
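As a quick check, taking easiness to be the total incoming weight divided by the threshold, the two networks of OBSERVATION 2 (eight in-neighbors of weight 0.1 each) give:

```latex
\begin{align*}
\mathit{easiness}_1(u) &= \frac{8 \times 0.1}{0.1} = 8, \\
\mathit{easiness}_2(u) &= \frac{8 \times 0.1}{0.7} \approx 1.14.
\end{align*}
```

The larger value for Network 1 matches the intuition that $u$ is easier to influence there.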
37. Lossy Coupling
• The vertex set is the set of users $V = \{u_1, \dots, u_n\}$
• The threshold of vertex $u$ is $\theta(u) = \sum_{i=1}^{k} \alpha_i(u)\,\theta_i(u)$
• The weight of edge $(v, u)$ is $w(v, u) = \sum_{i=1}^{k} \alpha_i(u)\,w_i(v, u)$, where $w_i(v, u) = 0$ if there is no edge from $v$ to $u$ in network $i$
38. Lossy Coupling
For the blue node,
$\theta(u) = \sum_{i=1}^{k} \alpha_i(u)\,\theta_i(u) = \frac{0.2 + 0.1}{0.2} \times 0.2 + \frac{0.5}{0.5} \times 0.5 = 0.8$
For the edge between the red node and the blue node,
$w(v, u) = \sum_{i=1}^{k} \alpha_i(u)\,w_i(v, u) = \frac{0.2 + 0.1}{0.2} \times 0.2 + 0 = 0.3$
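The arithmetic of this example can be reproduced with a short sketch of the lossy coupling with easiness as the parameter. The figure itself is not recoverable, so the network layout is an assumption inferred from the numbers: in network 1 the blue node has threshold 0.2 and in-edges of weight 0.2 (from red) and 0.1 (from a hypothetical "green" node), and in network 2 it has threshold 0.5 and one in-edge of weight 0.5 (from a hypothetical "yellow" node). The function names are also hypothetical.

```python
from typing import Dict, List, Tuple

# A network is (edges, theta): edges[(v, u)] = w_i(v, u), theta[u] = theta_i(u).
Network = Tuple[Dict[Tuple[str, str], float], Dict[str, float]]


def easiness(net: Network, u: str) -> float:
    """Total incoming weight of u divided by u's threshold (assumed positive)."""
    edges, theta = net
    return sum(w for (_, tgt), w in edges.items() if tgt == u) / theta[u]


def lossy_couple(networks: List[Network]):
    """Coupled thresholds and weights, using alpha_i(u) = easiness_i(u)."""
    theta: Dict[str, float] = {}
    weight: Dict[Tuple[str, str], float] = {}
    for net in networks:
        edges, thresholds = net
        alpha = {u: easiness(net, u) for u in thresholds}
        for u, t in thresholds.items():
            theta[u] = theta.get(u, 0.0) + alpha[u] * t
        for (v, u), w in edges.items():
            # An edge missing from a network contributes 0 to the sum.
            weight[(v, u)] = weight.get((v, u), 0.0) + alpha[u] * w
    return theta, weight


# Assumed layout of the two networks in the example (see lead-in).
net1 = ({("red", "blue"): 0.2, ("green", "blue"): 0.1},
        {"blue": 0.2, "red": 0.5, "green": 0.5})
net2 = ({("yellow", "blue"): 0.5}, {"blue": 0.5, "yellow": 0.5})

theta, weight = lossy_couple([net1, net2])
print(round(theta["blue"], 6), round(weight[("red", "blue")], 6))
```

Swapping `easiness` for another per-network parameter changes only the `alpha` line, which is why the slides treat the parameter choice separately from the coupling itself.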
39. Lossy Coupling
INVOLVEMENT
• If a user is surrounded by a group of friends who have a high influence on each other, the user tends to be influenced.
• We estimate the involvement of a node $v$ in a network $G_i$ by measuring how strongly the 1-hop neighborhood of $v$ is connected and to what extent influence can propagate from one node to another within that neighborhood.
40. Lossy Coupling
• Formally, the involvement of a node $v$ in a network $G_i$ is defined as $\mathit{involvement}_i(v) = \sum_{x, y \in N_v^i \cup \{v\}} \dfrac{w_i(x, y)}{\theta_i(y)}$, where $N_v^i = N_v^{i+} \cup N_v^{i-}$
AVERAGE
• All parameters have the same value, i.e. $\alpha_i(u) = 1$
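The involvement measure can be computed directly from an edge list. The sketch below assumes that the sum runs over every edge whose two endpoints both lie in the closed 1-hop neighborhood of the node; the function name and the example graph are hypothetical.

```python
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], float]  # edges[(x, y)] = w_i(x, y)


def involvement(edges: Edges, theta: Dict[str, float], v: str) -> float:
    """Sum of w_i(x, y) / theta_i(y) over edges inside v's closed 1-hop neighborhood."""
    hood = {v}
    for x, y in edges:
        if x == v:
            hood.add(y)  # out-neighbor of v
        elif y == v:
            hood.add(x)  # in-neighbor of v
    return sum(w / theta[y] for (x, y), w in edges.items()
               if x in hood and y in hood)


# Hypothetical network: "c" is two hops from "v", so edge b -> c is ignored.
edges = {("a", "v"): 0.4, ("v", "b"): 0.2, ("a", "b"): 0.3, ("b", "c"): 0.5}
theta = {"a": 0.5, "b": 0.5, "c": 0.5, "v": 0.5}
print(involvement(edges, theta, "v"))
```

Here the neighborhood of "v" is {v, a, b}, so three edges count and the result is 0.4/0.5 + 0.2/0.5 + 0.3/0.5.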
41. Lossy Coupling
THEOREM 3
• When a lossy coupling scheme is used, if the set of users $S$ activates a β fraction of the users in $G$ (the lossy coupled network), then it activates at least a β fraction of the users in $G_{1..k}$ (the original system).
• The proof is based on the fact that the active state of a user in $G$ implies the active state of that user in $G_{1..k}$.
43. Influence Relay
MOTIVATION
• When information diffuses in multiplex networks, it may flow within a single network or travel across several networks.
• What is the contribution of each component network to the influence process?
• How much information flows within a network, and how much between networks?
• Quantifying these values helps us understand the diffusion process in multiplex networks.
44. Influence Relay
DEFINITION
• The authors propose influence relay as a metric to quantify the role of users in propagating information.
• The influence relay of a vertex is defined recursively, depending on the order of activation.
• $S$ = seed set, $G$ = coupled network, $d$ = number of hops after which the activation process stops, $h(u)$ = hop at which $u$ is activated.
• All inactive vertices, i.e. those outside $A^d(G, S)$, have an influence relay of 0.
47. Influence Relay
COMPUTING INFLUENCE RELAY
• We compute the influence relay of vertices in the reverse order of the diffusion process.
• We construct the influence graph $IG_S = (V_S, E_S)$ from the seed set $S$ to represent the diffusion process and to calculate the influence relay of all nodes in $V_S$.
• The vertex set $V_S$, with $n_S$ nodes, is $A^d(G, S)$.
• There is an edge from $u$ to $v$ in $E_S$ if $u$ has passed information to $v$, i.e. $u, v \in A^d(G, S)$ and $h(v) > h(u)$.
• $IG_S$ is a directed acyclic graph, so a reverse topological ordering of $IG_S$ can be computed in linear time. The main loop visits each edge of $IG_S$ once, so the influence relay of all vertices can be computed in linear time.
48. Input: A network G, a seed set S and the number of hops d.
Output: The influence relay IR of all vertices.
IG_S ← the influence graph caused by S on G
for each u ∈ A^d(G, S) do
    IR(u) ← 0
end for
Compute the topological ordering u_1, u_2, …, u_{n_S} of the vertices in V_S
for i = n_S down to 1 do
    IR(u_i) ← IR(u_i) + 1
    total ← 0
    for each v ∈ N^-(u_i) do
        total ← total + w(v, u_i)
    end for
    for each v ∈ N^-(u_i) do
        IR(v) ← IR(v) + w(v, u_i) · IR(u_i) / total
    end for
end for
Return IR
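The pseudocode above translates fairly directly into Python. The sketch below is one possible reading, with assumptions stated up front: the activation hop of every active vertex is supplied as input rather than recomputed, an influence-graph edge is taken to be an edge of G that goes from a vertex activated earlier to one activated later, and the name `influence_relay` is hypothetical. For brevity the vertices are sorted by hop, which is slightly slower than the linear-time ordering mentioned in the slides.

```python
from typing import Dict, List, Tuple


def influence_relay(in_edges: Dict[str, List[Tuple[str, float]]],
                    hop: Dict[str, int]) -> Dict[str, float]:
    """Influence relay of every active vertex.

    in_edges[u] lists (v, w(v, u)) for the edges v -> u of G.
    hop[u] is the hop h(u) at which u became active (0 for seeds);
    inactive vertices are absent from `hop`.
    """
    relay = {u: 0.0 for u in hop}
    # Influence-graph edges go from a lower hop to a higher hop, so visiting
    # vertices by decreasing hop is a reverse topological order.
    for u in sorted(hop, key=hop.get, reverse=True):
        relay[u] += 1.0  # u itself counts as one activated vertex
        # In-neighbors of u inside the influence graph.
        parents = [(v, w) for v, w in in_edges.get(u, [])
                   if v in hop and hop[v] < hop[u]]
        total = sum(w for _, w in parents)
        if total == 0:
            continue  # seeds have no parents and pass nothing upward
        for v, w in parents:
            relay[v] += w * relay[u] / total
    return relay


# Hypothetical diffusion: seeds a, b activate c at hop 1, and c activates e at hop 2.
in_edges = {"c": [("a", 0.6), ("b", 0.4)], "e": [("c", 1.0)]}
hop = {"a": 0, "b": 0, "c": 1, "e": 2}
relay = influence_relay(in_edges, hop)
print(relay)
```

In this example the two seeds end up with relay values that add up to 4, the number of active vertices.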
49. Influence Relay
THEOREM 4
• An important property of influence relay is that it preserves the number of activated vertices.
• The total influence relay of the seed vertices equals the total number of activated vertices:
$\sum_{u \in S} IR(u) = |A^d(G, S)|$
50. Influence Relay
INFLUENCE CONTRIBUTION
• To obtain the contribution of a network to the diffusion process, we sum the influence relay of all seed vertices in that network.
INTERNAL AND EXTERNAL INFLUENCE
• Influence relay can also be used to quantify the amount of information flowing within and between networks.
51. Influence Relay
• When information is propagated within a component network, called the "target" network, there are two kinds of influence paths:
• Internal paths include edges only in the target network.
• External paths include some edges of other networks. They are formed when some of the vertices are activated outside the target network.
• We adapt influence relay to measure the internal influence (which passes through internal paths) and the external influence (which passes through external paths) of the seed set in the target network as follows:
52. Influence Relay
• Each vertex $u$ has an internal influence $IR_{in}(u)$ and an external influence $IR_{ex}(u)$.
• Both values are calculated backwards from the activated vertices under $u$'s influence.
• Only an activated vertex $u$ in the target network receives 1 more influence unit in $IR_{in}(u)$, since we only consider influence propagation in the target network.
• If a vertex is activated outside the target network, all of its internal influence is converted to external influence.
56. Real Networks
• Experiments were performed on 2 datasets:
• Foursquare (FSQ) and Twitter networks
• Co-author networks in the areas of Condensed Matter (CM), High-Energy Theory (Het), and Network Science (NetS)
• The number of overlapping users in the first dataset, FSQ-Twitter, is 4100.
• For the second dataset, the numbers of overlapping users of the network pairs CM-Het, CM-NetS, and Het-NetS are 2860, 517, and 90, respectively.
57. Real Networks
Edge weights are randomly assigned from 0 to 1.
The edge weights are then normalized so that the total weight of the incoming edges of each node is 1.
The threshold of each node is a random value from 0 to 1.
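The parameter setup described above can be sketched in a few lines. This is an illustration only: the uniform distribution, the fixed random seed and the function name `assign_weights_and_thresholds` are assumptions, since the slides do not say how the random values were drawn.

```python
import random
from typing import Dict, List, Tuple


def assign_weights_and_thresholds(edges: List[Tuple[str, str]], seed: int = 0):
    """Random edge weights normalized per target node, plus random thresholds."""
    rng = random.Random(seed)
    # Step 1: a raw random weight for every directed edge (v, u).
    raw = {e: rng.random() for e in edges}
    # Step 2: total raw weight entering each node.
    incoming: Dict[str, float] = {}
    for (_, u), w in raw.items():
        incoming[u] = incoming.get(u, 0.0) + w
    # Step 3: normalize so the weights entering each node sum to 1.
    weights = {(v, u): w / incoming[u] for (v, u), w in raw.items()}
    # Step 4: an independent random threshold for every node.
    nodes = sorted({n for e in edges for n in e})
    thresholds = {n: rng.random() for n in nodes}
    return weights, thresholds


# Hypothetical three-node graph.
weights, thresholds = assign_weights_and_thresholds(
    [("a", "c"), ("b", "c"), ("a", "b")])
print(weights[("a", "c")] + weights[("b", "c")])
```

A node with a single incoming edge always ends up with weight 1 on that edge, so any threshold below 1 is met as soon as that one neighbor is active.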
58. Synthesized Networks
• Synthesized networks generated by the Erdős–Rényi random network model are used to test networks with controlled parameters.
• Two networks with 10000 nodes each are formed by randomly connecting each pair of nodes with probabilities 0.0008 and 0.006.
• The average degrees of the two networks are 8 and 60.
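As a consistency check, assuming the standard Erdős–Rényi $G(n, p)$ model, the expected degree of a node is $(n - 1)p$. With $n = 10000$ this matches the stated average degrees:

```latex
\begin{align*}
(n - 1)\,p_1 &= 9999 \times 0.0008 \approx 8, \\
(n - 1)\,p_2 &= 9999 \times 0.006 \approx 60.
\end{align*}
```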
59. Comparison of coupling schemes
Solution Quality
• In both networks, the seed size is smallest when the lossless coupling scheme is used.
• The seed sizes are only slightly larger with the lossy coupling schemes.
60. Comparison of coupling schemes
• A smaller seed size is obtained in two ways:
• Increasing the fraction of overlapping users.
• Increasing the number of propagation hops.
61. Comparison of coupling schemes
Running Time
• The greedy algorithm runs much faster on the lossy coupled networks than on the lossless coupled networks.
• Compared with the lossless coupled networks, the lossy coupled networks reduce the running time by a factor of 2 in FSQ-Twitter and a factor of 4 in the co-author networks.
• The major disadvantages of the lossless coupling scheme are the doubled number of hops and the extra nodes and edges.
62. Advantages of using coupled networks
Influencing a fraction β of the nodes in all networks:
• The lossless coupling method outperforms running the greedy algorithm on each network separately and taking the union of the resulting seed sets.
• With the per-network approach, the seed set is 30% larger in the co-author networks and 47% larger in FSQ-Twitter than the seed set found with the lossless coupling method.
63. Influencing a fraction β of the nodes in a particular network:
• The seed size decreases by up to 9%, 25%, 17%, and 26% in CM, Het, FSQ, and Twitter, respectively, when we consider these networks in connection with other networks.
• The external influence is substantial and accounts for a large portion in many cases. For instance, when the influenced fraction is β = 0.2, the external influence accounts for 27.3%, 52.7%, and 30.0% of the total influence in CM, Het, and NetS, respectively.
64. Analysis of seed sets
• A significant fraction of the seed set consists of overlapping nodes, although only 5%-7% of the users of any network are overlapping users.
• For β = 0.4, the fraction of overlapping seed vertices is around 24.9% and 25% in the co-author and FSQ-Twitter networks, respectively.
• When β is small, the influence contribution of overlapping users is high (approx. 50% when β = 0.2). However, when β is large, the overlapping users have already been selected, so they are no longer favored.
65. Mutual impact of networks
• When k increases from 2 to 5, the seed size decreases severalfold. This implies that the introduction of a new OSN increases the diffusion of information significantly.
• The number of influenced vertices rises by 46% with the support of 3 new networks when k changes from 2 to 5.
• The fraction of external influence also increases dramatically, from 39% when k = 2 to 67% when k = 5.
• All these results suggest that existing networks may benefit from a newly introduced competitor.
66. Conclusion and future work
• To tackle the LCI problem, novel coupling schemes are introduced that reduce the problem to a version on a single network.
• A new metric is designed to quantify the flow of influence inside and between networks, based on the coupled network.
• Extensive experiments provide new insights into information diffusion in multiplex networks.
• In the future, the LCI problem can be investigated in multiplex networks with heterogeneous diffusion models, in which each network may have its own diffusion model.