Presenter: Yunyong Ko (Ph.D. student, Hanyang University)
Date: February 2018
Influence maximization (IM) is the problem of finding a seed set of k nodes that maximizes influence spread over a social network. Kempe et al. showed the problem is NP-hard and proposed a greedy algorithm (referred to as SimpleGreedy) that guarantees an influence spread within a factor of (1 - 1/e) ≈ 63% of the optimal solution. However, SimpleGreedy has two performance issues: at the micro level, it estimates the influence spread of a single node by running Monte-Carlo (MC) simulations, which are fairly expensive; at the macro level, after selecting a seed at each step, it re-evaluates the influence spread of every node in the network, incurring significant computational overhead. In this paper, we propose Hybrid-IM, which addresses these issues at both the micro and macro levels by combining PB-IM (Path-Based Influence Maximization) and CB-IM (Community-Based Influence Maximization). Furthermore, we identify two technical issues that limit Hybrid-IM's performance and propose two strategies to address them. Through extensive experiments on four real-world datasets, we show that Hybrid-IM achieves a great improvement in performance (up to 43 times) over state-of-the-art methods while finding a seed set whose influence spread is very close to theirs.
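As an illustration of the micro- and macro-level costs described above, here is a minimal sketch of SimpleGreedy under the Independent Cascade model. The toy adjacency-list graph, propagation probability p, and run count are illustrative assumptions, not the paper's experimental setup.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One Monte-Carlo run of the Independent Cascade (IC) model:
    each newly active node u tries once to activate each neighbor v
    with probability p. Returns the final number of active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        new = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    new.append(v)
        frontier = new
    return len(active)

def estimate_spread(graph, seeds, p=0.1, runs=300, seed=42):
    """Micro-level cost: the spread of one candidate set is estimated
    by averaging many expensive MC simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs

def simple_greedy(graph, k, p=0.1, runs=300):
    """Macro-level cost: after each seed is chosen, the marginal gain
    of every remaining node is re-evaluated from scratch."""
    seeds = set()
    for _ in range(k):
        base = estimate_spread(graph, seeds, p, runs)
        gains = {v: estimate_spread(graph, seeds | {v}, p, runs) - base
                 for v in graph if v not in seeds}
        seeds.add(max(gains, key=gains.get))
    return seeds
```

The nested loop over candidates times MC runs is exactly the overhead that path-based and community-based pruning aim to avoid.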
High-performance graph analysis is unlocking knowledge in computer security, bioinformatics, social networks, and many other data integration areas. Graphs provide a convenient abstraction for many data problems beyond linear algebra. Some problems map directly to linear algebra. Others, like community detection, look eerily similar to sparse linear algebra techniques. And then there are algorithms that strongly resist attempts at making them look like linear algebra. This talk will cover recent results with an emphasis on streaming graph problems, where the graph changes and results need to be updated with minimal latency. We'll also touch on issues of sensitivity and reliability, where graph analysis needs to learn from numerical analysis and linear algebra.
Algorithms in Social Network Graphs and Social Network Analysis (oliviaclark2905)
This document discusses algorithms for analyzing social networks and detecting communities within them. It describes several algorithms for community detection, including minimum-cut methods, hierarchical clustering, Girvan-Newman algorithms, modularity maximization techniques like the Louvain method, statistical inference methods, and clique-based methods. It also covers algorithms for community search to efficiently find the community containing a given query node.
Network-Wide Heavy-Hitter Detection with Commodity Switches (AJAY KHARAT)
Network operators often need to identify outliers in network traffic to detect attacks or diagnose performance problems.
To detect such problems, operators perform heavy-hitter detection on flows.
Traditionally, heavy-hitter detection was done by analyzing individual packets or examining packet flows.
Prior work focused on detecting heavy hitters on a single switch, but we often need to track network-wide heavy hitters.
When detecting heavy hitters network-wide, the goal is to reduce communication overhead while maintaining accuracy.
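The per-switch side of heavy-hitter detection can be sketched with a classic counter-based summary. The sketch below uses the Misra-Gries algorithm as one plausible building block; the actual system's data structures and its network-wide coordination between switches are not shown.

```python
def misra_gries(stream, k):
    """Misra-Gries summary: tracks at most k-1 candidate heavy flows.
    Any flow occurring more than len(stream)/k times is guaranteed to
    survive in the counter table (stored counts are underestimates)."""
    counters = {}
    for flow in stream:
        if flow in counters:
            counters[flow] += 1
        elif len(counters) < k - 1:
            counters[flow] = 1
        else:
            # Table is full: decrement every counter and evict zeros.
            for f in list(counters):
                counters[f] -= 1
                if counters[f] == 0:
                    del counters[f]
    return counters
```

A second pass (or a coordinator, in the network-wide setting) would then verify which surviving candidates truly exceed the heavy-hitter threshold.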
Presenter: Hwanjun Song (Ph.D. student, KAIST)
Date: August 2018
(Parallel Clustering Algorithm Optimization for Large-Scale Data Analytics)
Clustering, one of the most widely used methods in data analysis, is the task of partitioning given data into groups based on similarity. However, because of its high computational complexity, clustering has seen limited use in large-scale data analysis. To address this, many recent studies apply distributed computing frameworks such as Hadoop and Spark, but optimizing existing clustering algorithms for distributed environments is not easy. In particular, sacrificing accuracy for efficiency and load imbalance across workers are the two representative problems that arise when parallelizing such algorithms. This seminar focuses on the challenges of parallelizing DBSCAN, a representative clustering algorithm, and presents a new solution. Compared with state-of-the-art methods, the proposed approach improves performance by up to 180 times without any loss of accuracy.
This seminar covers the following paper, presented at SIGMOD 2018:
Song, H. and Lee, J., "RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm Based on Random Partitioning," In Proc. 2018 ACM Int'l Conf. on Management of Data (SIGMOD), Houston, Texas, pp. 1173-1187, June 2018.
1. Background
- Concept of Clustering
- Concept of Distributed Processing (MapReduce)
- Clustering Algorithms (Focus on DBSCAN)
2. Challenges of Parallel Clustering
- Parallelization of Clustering Algorithm (Focus on DBSCAN)
- Existing Work
- Challenges
3. Our Approach
- Key Idea and Key Contribution
- Overview of Random Partitioning-DBSCAN
4. Experimental Results
5. Conclusions
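For reference, a minimal single-machine DBSCAN, the algorithm RP-DBSCAN parallelizes, might look like the following. The quadratic neighbor search is kept for clarity only; eps and min_pts are the usual DBSCAN parameters, and the example points are illustrative.

```python
def dbscan(points, eps, min_pts):
    """Minimal sequential DBSCAN over a list of coordinate tuples.
    Returns one cluster id per point; -1 marks noise."""
    def neighbors(i):
        # O(n) scan per query; real implementations use spatial indexes.
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise (may become border)
            continue
        labels[i] = cid             # i is a core point: start a cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid     # border point: absorb, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:  # j is also core: expand the cluster
                queue.extend(jn)
        cid += 1
    return labels
```

The region-query step is what dominates the cost, which is why partitioning strategies (random partitioning in RP-DBSCAN) matter so much for the distributed version.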
Using generative augmentation to improve "Learning from Crowds": the CrowdInG model is based on Generative Adversarial Networks and improves crowdsourced annotations to aid supervised learning.
Reinforcement Learning for Difficult Settings (Olivier Teytaud)
1) Subgoal learning, macro-actions, partial observation, and clustering of features were explored for difficult reinforcement learning settings.
2) MCTS was adapted for these settings by incorporating simulations, action categorization, macro-action building, feature clustering, and a tree of subgoals with nodes containing goals.
3) This approach combines techniques from the state of the art and was tested on problems from other projects, showing successful application to complex real-world problems.
PR-433: Test-time Training with Masked Autoencoders (Sunghoon Joo)
This document summarizes a research paper that proposes a new method called TTT-MAE (Test-Time Training with Masked Autoencoders) to address the problem of domain shift in visual recognition tasks. TTT-MAE uses masked autoencoders as the self-supervised pretext task in test-time training, instead of the rotation prediction used in previous work. Experimental results on datasets like ImageNet-C and ImageNet-R show that TTT-MAE achieves higher performance gains than prior methods under different types of distribution shifts. However, TTT-MAE is slower at test time than directly applying a fixed model. Future work could focus on improving efficiency and generalizing the approach to other tasks.
A GRASS-based procedure to compare OSM and IGN Paris road network datasets (Marco Minghini)
These slides were presented during the WG2 meeting of COST Action IC1203 ENERGIC (http://vgibox.eu) held in Paris on December 3-4, 2015, which was focused on the evaluation of OSM quality through the comparison with official IGN data. The presentation describes an application of an open source GRASS-based procedure - including a Web Processing Service - to compare OSM and authoritative road network datasets (https://github.com/MoniaMolinari/OSM-roads-comparison) in the Paris case study.
This document discusses software module clustering using genetic algorithms and hill climbing techniques. It introduces genetic algorithms and hill climbing algorithms and how they can be applied to software module clustering. Specifically, it proposes using multiple hill climbs first to gather information about the search landscape, which is then used to define "building blocks" to improve subsequent searches done by genetic algorithms. The results of empirical studies using this novel approach show it to be effective at software module clustering.
Graph Transformer with Graph Pooling for Node Classification, IJCAI 2023 (ssuser2624f71)
Gapformer is a model that combines graph transformers with graph pooling for efficient node classification in large graphs. It addresses two issues with existing graph transformers: quadratic complexity with number of nodes and noise from distant neighbors. Gapformer uses graph pooling to reduce the number of attended nodes, computing attention over pooled nodes only. Experiments on 13 datasets show Gapformer outperforms other graph neural networks and graph transformers, with reduced computation and memory costs.
Can We Trust AI? The Dilemma of Model Adjustment (Terence Huang)
This document provides a summary of an AI expert's background and experience, and then discusses some challenges in ensuring the trustworthiness of AI models. It notes that while models may perform well during training, their performance can decline when deployed in the real world due to new data, noise, and errors. Interpretable modeling techniques like LIME and Grad-CAM are introduced to help evaluate whether models' predictions are appropriate and diagnose issues. The discussion emphasizes that identification of errors is not enough, and ways to correct models must also be explored, such as improving data quality.
This document provides an introduction to genetic algorithms and their applications in VLSI design and automation. It discusses the fundamentals of genetic algorithms including genetic representation, selection, crossover and mutation operators. Examples are provided for simple function optimization and the traveling salesman problem. The document also discusses applications of genetic algorithms for VLSI design problems such as partitioning, placement, routing, technology mapping and automatic test pattern generation. It provides details on genetic algorithm parameters and compares genetic algorithms to traditional optimization methods.
This document summarizes the GoogLeNet deep learning architecture. It describes how GoogLeNet uses inception modules containing 1x1 convolutional layers to reduce computational load. The inception modules perform 1x1, 3x3, and 5x5 convolutions in parallel, with the 1x1 layers reducing dimensionality first. This allows GoogLeNet to use significantly fewer parameters than VGGNet while achieving higher accuracy with fewer computational resources. The document also explains how auxiliary classifiers are added to intermediate layers to address vanishing gradients in the deep model.
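The dimensionality-reduction effect of those 1x1 layers can be checked with simple arithmetic. The channel sizes and feature-map dimensions below are illustrative, not GoogLeNet's exact configuration:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate operations for a k x k convolution over an
    h x w x c_in input producing c_out channels (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k

H = W = 28
# 5x5 convolution applied directly to 192 input channels:
direct = conv_macs(H, W, 192, 128, 5)
# 1x1 bottleneck down to 32 channels, then the 5x5 convolution:
reduced = conv_macs(H, W, 192, 32, 1) + conv_macs(H, W, 32, 128, 5)
print(direct, reduced, direct / reduced)
```

With these numbers the bottlenecked path needs roughly a fifth of the multiply-accumulates, which is the efficiency argument behind the inception module.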
This document presents a graph-based recommendation system approach that uses multi-armed bandit modeling. It estimates an unweighted, undirected graph to represent relationships between users based on their preferences. It then applies community detection to cluster the users graph and estimates preferences for each cluster. Products are then recommended to users using an upper confidence bound method, and the user feedback is used to update the preference estimates in a reinforcement learning approach. The method is evaluated on both synthetic and real-world recommendation datasets.
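The upper-confidence-bound step mentioned above can be sketched as follows. This is a generic UCB1-style score over per-product statistics; the paper's exact bonus term and its interaction with the user clusters are not reproduced here.

```python
import math

def ucb_score(mean_reward, n_pulls, total_pulls, c=2.0):
    """UCB: optimistic estimate = empirical mean + exploration bonus.
    Unpulled arms score +inf so every product is tried at least once."""
    if n_pulls == 0:
        return float("inf")
    return mean_reward + math.sqrt(c * math.log(total_pulls) / n_pulls)

def recommend(stats, total_pulls):
    """Pick the product (arm) with the highest UCB score.
    `stats` maps product -> (mean_reward, n_pulls)."""
    return max(stats, key=lambda a: ucb_score(*stats[a], total_pulls))
```

After each recommendation, the observed click or rating would update the chosen product's mean and pull count, closing the reinforcement-learning loop described in the summary.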
This document presents an overview of Newman's fast algorithm for detecting community structure in networks. It begins by recapping previous algorithms like Newman-Girvan and Clique Percolation Method. It then outlines Newman's algorithm which works to redefine modularity in order to optimize partitioning networks into communities. The algorithm runs in near-linear time, representing a significant improvement over previous algorithms. Examples are provided to demonstrate the accuracy of Newman's approach.
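Newman's modularity, the quantity the fast algorithm greedily optimizes, can be computed directly for a candidate partition (adjacency given as a dict of neighbor lists for an undirected graph):

```python
def modularity(adj, communities):
    """Newman's modularity: Q = sum over communities of
    e_c/m - (d_c/2m)^2, where e_c counts intra-community edges,
    d_c is the total degree inside the community, m the edge count."""
    m = sum(len(vs) for vs in adj.values()) / 2
    q = 0.0
    for comm in communities:
        cset = set(comm)
        e_c = sum(1 for u in cset for v in adj[u] if v in cset) / 2
        d_c = sum(len(adj[u]) for u in cset)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q
```

On two triangles joined by a single edge, splitting at the bridge scores higher than lumping everything together, which is the behavior the greedy merging exploits.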
Using CINET presentation as part of the CINET Workshop on July 10th, 2015 in Blacksburg, VA. CINET applications include Granite, GDS Calculator, and EDISON.
The document discusses problem solving agents and how to formulate problems for agents to solve. It explains that problem solving involves defining a goal, formulating the initial state, possible actions, and transition model between states. A search algorithm can then find a solution path through the state space from the initial to goal states. The performance of search algorithms depends on factors like completeness, optimality, and time and space complexity which are determined by properties of the state space like branching factor and solution depth. Examples of problems discussed include the vacuuming agent, 8-puzzle, and traveling salesman problems.
A GRASS-based automated procedure to compare OpenStreetMap and authoritative ... (Marco Minghini)
These slides were presented during the XVII meeting of the Italian users of GRASS and FOSS4G, held in Parma on February 11-12, 2015. The presentation describes an automated GRASS-based procedure to compare OSM and authoritative road network datasets (https://github.com/MoniaMolinari/OSM-roads-comparison). An application is presented focused on Paris city, where the OSM road network is compared with the official road network provided by IGN. A Web Processing Service (WPS) is also under development.
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs (Jason Riedy)
Graph-structured data in network security, social networks, finance, and other applications are not only massive but also under continual evolution. The changes are often scattered across the graph, permitting novel parallel and incremental analysis algorithms. We discuss analysis algorithms for streaming graph data that maintain both local and global metrics with low latency and high efficiency.
The document discusses big data and data analytics. It provides examples of the large amounts of data being generated daily by companies like Google, Facebook, eBay, and CERN. It also describes the Earthscope project which generates 67 terabytes of data by monitoring seismic activity across North America. The types of data discussed include relational, text, semi-structured, graph, and streaming data. The document outlines common techniques for analyzing big data, including aggregation, indexing/searching, knowledge discovery via data mining and statistical modeling. It provides overviews of statistics, OLAP, data warehousing, and several data mining techniques like classification, clustering, association rule mining and collaborative filtering.
Propagating Data Policies - A User Study (Enrico Daga)
The document summarizes a user study conducted to evaluate a system that propagates data policies in data flows. 10 participant teams were given 5 data journeys involving real datasets and processes to determine what policies should propagate from input to output. The teams used a tool to understand the journeys and compare their decisions to the system. An accuracy analysis was conducted on the results and teams provided feedback through a questionnaire.
Christian Jensen: Advanced Routing in Spatial Networks Using Big Data (jins0618)
Advanced Routing in Spatial Networks Using Big Data discusses using big data and advanced routing techniques for transportation networks. It covers modeling transportation networks using big data from sensors to assign time-varying weights representing factors like travel time and emissions. It then discusses routing algorithms that find optimal routes considering these weights, including algorithms for stochastic and uncertain weights. The document provides an overview of using big data to improve transportation network modeling and routing.
This document discusses how knowledge graphs and graph analytics can be used for anomaly detection in financial services. It describes building time-sequenced graph data models from a base knowledge graph to model customer behavior over time. Champion models are applied to each time window to learn a statistical distribution, and outliers in that distribution that are hard to reproduce can indicate anomalous financial behavior worthy of investigation, such as money laundering. Scaling the graph snapshots by collections of nodes and edges allows analyzing behavior at different levels from micro to macro.
Slides for a paper reading in the Vietnam AI Community in Japan.
An explanation of MobileNetV2 (Inverted Residuals and Linear Bottlenecks), a paper at CVPR 2018.
This document discusses the digital circuit layout problem and approaches to solving it using graph partitioning techniques. It begins by introducing the digital circuit layout problem and how it has become more complex with increasing circuit sizes. It then discusses how the problem can be decomposed into subproblems using graph partitioning to assign geometric coordinates to circuit components. The document reviews several traditional approaches to solve the problem, such as the Kernighan-Lin algorithm, and discusses their limitations for larger circuit sizes. It also discusses more recent approaches using evolutionary algorithms and concludes by analyzing the contributions of various approaches.
ZUIX is a design system created by Zigbang's CTO team to standardize design across all of Zigbang's services. It uses React Native for responsive, multi-platform components and includes tools like Storybook for development and a design review infrastructure for validation. The deployment process involves code reviews, CI/CD pipelines, and publishing to a npm registry. Training and documentation is provided through tools like Google Classroom and Notion. The team aims to further develop ZUIX by improving the design review tools, adding end-to-end testing, and analyzing component usage. The goal is to solve Zigbang's unique challenges through an agile, collaborative approach between designers and developers.
More Related Content
Similar to Efficient and Effective Influence Maximization in Social Networks: Hybrid Approach
This document discusses Kakao's search platform front-end project. It describes the architecture of an integrated search service using microservices and the need for a design system due to fragmented UIs. It introduces the KST (Kakao Search Template) project for creating a design system including 200+ UI blocks and templates. The KST Builder, Logger, and Dashboard are discussed for managing templates, logging usage, and monitoring coverage. Maintaining a consistent design system is important for operating diverse search services and platforms.
This document discusses Banksalad Product Language (BPL), which is a method used at Banksalad to standardize UI text, elements, and components. It allows designers and developers to use consistent terms, while abstracting UI elements to different levels suitable for their roles. Examples of standardized elements are provided, as well as external resources that discuss concepts like tree shaking that are relevant to BPL. While BPL has benefits, the document considers whether there may be better approaches than BPL.
This document summarizes a presentation about using Stitches, a React styling library, and Storybook for component design.
The presentation introduces Stitches as the styling library used for its support of React, easy usage, and themes. Key features of Stitches discussed include creating styled components, variants, and comparisons to other libraries.
Storybook is presented as a way to improve communication between designers and developers by allowing visualization of components alongside their stories. Clean communication through a shared Storybook is emphasized.
Reflections on initially creating a design system note the benefits of consistency and speed but also identify areas for improvement like documentation, process alignment, and understanding each other's roles. Establishing trust and understanding between
비행기 설계를 왜 통일 해야 할까?
디자인 시스템을 하는 이유
비행기들이 다 용도가 다르다...어떻게 설계하지?
맥락이 다른 페이지와 패턴
경유지까지 아직 멀었다... 언제 수리하지?
디자인 시스템을 적용하는 시점
엔지니어랑 얘기해서 정비해야하는데...어떻게 수리하지?
디자인 시스템을 적용하는 프로세스
비행기 설계가 바뀐걸 어떻게 알리지?
디자인 시스템의 전파
The document discusses Kotlin coroutines and how they can be used to write asynchronous code in a synchronous, sequential way. It explains what coroutines are, how they work internally using continuation-passing style (CPS) transformation and state machines, and compares them to callbacks. It also outlines some of the benefits of using coroutines, such as structured concurrency, light weight execution, built-in cancellation, and simplifying asynchronous code. Finally, it provides examples of how to use common coroutine builders like launch, async, and coroutineScope in a basic Android application with ViewModels.
This document contains the transcript from a presentation given by Wonsuk Lim from Naver on tips for debugging and analyzing Android applications. Some key tips discussed include fully utilizing the Android emulator's capabilities like 2-finger touch control, clipboard sharing between the emulator and host PC, and mocking locations. Advanced settings for the emulator like foldable and camera emulation are also covered. The presenter recommends ways to configure developer options and use tools like LeakCanary, the Android profiler, and Stetho for testing app stability. Methods for understanding the Android framework by reviewing system services and managers via AIDL files and logcat dumps are presented. Finally, reverse engineering tools like APK Extractor and decompilers are introduced.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Efficient and Effective Influence Maximization in Social Networks: Hybrid Approach
1. February 21, 2018 Page 1/22
Efficient and Effective Influence Maximization
in Social Networks: A Hybrid Approach
2018. 02. 21
Yun-Yong Ko
BigData science lab
Department of Computer and Software
Hanyang University
Table of Contents
• Problem definition
• Preliminary
– Diffusion model
• Related works
• Hybrid-IM
– Path-based community detection
– G-CELF algorithm
• Experimental results
Problem definition
• Influence Maximization (IM)
– To find a k-seed set that maximizes influence spread in a given network
S* = argmax_{S ⊆ V, |S| = k} σ(S)
– Network
✓ Node: user
✓ Edge: the relationship between users
– Type of a node
✓ Active user: user who buys the product
✓ Inactive user: user who doesn’t buy the product
– The selected nodes (seed set)
✓ Active nodes in the initial stage of influence propagation
▪ User group that receives samples from a company
Preliminary
• Diffusion model
– Describes how influence spreads over the network
✓ Linear threshold (LT) model
✓ Independent cascade (IC) model
• Common assumptions
– (1) Nodes can have either of two states, active or inactive.
– (2) As time goes by, inactive nodes can be activated, but active nodes
cannot become inactive.
– (3) The diffusion process terminates when no new node becomes active.
Linear threshold (LT) model

[Figure: LT diffusion example on a small graph with edge weights (0.1–0.6). Legend: inactive node, active node, threshold, active neighbors. A node becomes active once the total weight of its active neighbors reaches its threshold; the process stops when no new node activates.]
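The LT rule above can be sketched as a short simulation. The adjacency-list format and the per-node threshold dictionary are illustrative assumptions, not the paper's implementation:

```python
def lt_spread(graph, thresholds, seeds):
    """Simulate one Linear Threshold diffusion.

    graph: dict mapping node -> list of (neighbor, weight) out-edges.
    thresholds: dict mapping node -> activation threshold in [0, 1].
    Returns the set of nodes active when the diffusion stops.
    """
    active = set(seeds)
    pressure = {}                 # accumulated weight from active neighbors
    frontier = set(seeds)
    while frontier:               # stop when no node newly activates
        next_frontier = set()
        for u in frontier:
            for v, w in graph.get(u, []):
                if v in active:
                    continue
                pressure[v] = pressure.get(v, 0.0) + w
                if pressure[v] >= thresholds[v]:
                    active.add(v)
                    next_frontier.add(v)
        frontier = next_frontier
    return active
```

Because activation is deterministic given the thresholds, a single pass suffices; no Monte-Carlo averaging is needed for the LT example itself.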
Independent cascade (IC) model

[Figure: IC diffusion example on the same graph. Legend: inactive node, active node, new active node, successful attempt, unsuccessful attempt. Each newly active node makes one activation attempt per inactive neighbor, succeeding with the edge probability; the process stops when no attempt succeeds.]
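The IC rule can be sketched the same way; unlike LT, each run is random, so the influence spread is estimated by averaging many runs. The graph format is again an illustrative assumption:

```python
import random

def ic_spread(graph, seeds, rng=None):
    """One Monte-Carlo run of the Independent Cascade model.

    graph: dict mapping node -> list of (neighbor, probability) out-edges.
    Each newly active node gets exactly one activation attempt per
    inactive neighbor, succeeding with the edge probability.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:  # one attempt only
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```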
Related works - Greedy approach
• Optimal solution to IM problem
– Finding the optimal solution to IM is NP-Hard
✓ There are C(n, k) possible k-seed sets
• Greedy approach (SimpleGreedy)
– To select a node having the maximum marginal gain at each step
✓ The marginal gain of node v: σ(S ∪ {v}) − σ(S)
▪ The additional influence spread obtained by adding v to the seed set S
– SimpleGreedy is guaranteed to find an approximate solution that provides
(1 − 1/e ≈ 63%) of the influence spread of the optimal solution
✓ Considered as the ground truth in the IM field
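SimpleGreedy can be sketched in a few lines, assuming a generic `simulate(graph, seeds)` function for one diffusion run (e.g., one IC run). Both performance issues are visible here: every non-seed node is re-evaluated each step, and each evaluation averages many simulations:

```python
def estimate_spread(graph, seeds, simulate, runs=100):
    """Average influence spread of `seeds` over `runs` MC simulations."""
    return sum(len(simulate(graph, seeds)) for _ in range(runs)) / runs

def simple_greedy(graph, k, simulate, runs=100):
    """SimpleGreedy: repeatedly add the node with the maximum marginal
    gain sigma(S + v) - sigma(S), estimated by Monte-Carlo simulation."""
    seeds = set()
    nodes = set(graph) | {v for es in graph.values() for v, _ in es}
    for _ in range(k):
        base = estimate_spread(graph, seeds, simulate, runs)
        # Macro bottleneck: every non-seed node is re-evaluated each step;
        # micro bottleneck: each evaluation runs `runs` simulations.
        best = max(nodes - seeds,
                   key=lambda v: estimate_spread(graph, seeds | {v},
                                                 simulate, runs) - base)
        seeds.add(best)
    return seeds
```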
Performance issues of SimpleGreedy
• Macro level
– When a new seed is selected at a step,
– it has to re-evaluate the marginal gain of every non-seed node
✓ Their marginal gains are likely to have been changed by the new seed
▪ σ(S_t ∪ {v}) − σ(S_t) ≠ σ(S_{t+1} ∪ {v}) − σ(S_{t+1})
• Micro level
– It evaluates the marginal gain of a node by running MC-simulations
✓ σ(S ∪ {v}) − σ(S)
✓ Running MC-simulations is very time-consuming
Related works - Community-based IM (CB-IM)
• Purpose
– To resolve the macro issue by exploiting the property of communities
• The property of communities in a social network
– Users belonging to the same community
✓ Exchange information frequently
– Users belonging to different communities
✓ Exchange information rarely
Gain of CB-IM
• Exploiting the property of communities
– The difference between the influence spread of a node within a
community and that on the whole network is insignificant
σ_intra({v}) ≈ σ_whole({v})
• After a new seed is selected from a community,
– only the nodes in the same community need to be re-evaluated
Related works - Path-based IM (PB-IM)
• Purpose
– To resolve the micro issue by replacing MC-simulations
• Method
– To evaluate the influence spread of a node
✓ Aggregating the weights of all paths from the node
▪ Rather than running MC-simulations
σ(v) = 1 + Σ_{u ∈ O(v)} σ_u(v)
▪ σ_u(v): the influence from node v to node u
▪ W(p) = Π_{i=1}^{m−1} w(v_i, v_{i+1}): the weight of a path p = (v_1, …, v_m)
Path pruning
• Purpose
– To estimate the influence spread more efficiently in PB-IM
✓ Finding all possible paths is a #P-hard problem
• Method
– Only consider paths whose weights are larger than the threshold
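The two formulas combine into a depth-first path enumeration with pruning. This is a sketch under the assumption that σ(v) aggregates the weights W(p) of all simple paths p starting from v whose accumulated weight stays above the threshold:

```python
def path_spread(graph, v, theta=0.01):
    """Path-based estimate of sigma(v): 1 (for v itself) plus the sum of
    W(p) over all simple paths p from v, pruning any path whose
    accumulated weight falls below theta."""
    total = 1.0                       # v influences itself

    def dfs(node, weight, visited):
        nonlocal total
        for u, w in graph.get(node, []):
            pw = weight * w           # W(p) = product of edge weights
            if u in visited or pw < theta:
                continue              # prune cyclic / negligible paths
            total += pw               # the path v ~> u contributes W(p)
            dfs(u, pw, visited | {u})

    dfs(v, 1.0, {v})
    return total
```

Raising `theta` trades accuracy for speed: fewer paths survive the pruning, so the estimate is cheaper but smaller.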
Hybrid-IM
• Purpose
– To resolve both the micro and the macro issues of SimpleGreedy
• Proposed method
– To combine PB-IM and CB-IM for addressing both of the two issues
✓ Community detection stage
▪ reduces the number of nodes to be re-evaluated by applying CB-IM
✓ Seed selection stage
▪ evaluates the marginal gain of a node quickly by applying PB-IM
– To address two additional technical issues
✓ in the existing community detection method
✓ in the existing CELF algorithm
The existing community detection method
• Background
– Existing community detection methods do not consider influence
propagation between nodes
– To improve performance, CB-IM exploits the property of the
community structure
✓ If the community detection stage itself takes considerable time, the
overall performance gain of CB-IM disappears
• Intuition
– Detect communities considering the influence propagation between
communities, using only live edges
✓ Live edge: an edge whose weight is greater than a pre-defined threshold
The existing community detection method
• The problems of the existing method
– A large number of actual edges are ignored
✓ More than 90% of the edges were removed
– The weights of live edges are all ignored
✓ All live edges are treated identically although they could be quite different
[Figure: histogram of edge weights; x-axis: edge weight (0–1), y-axis: the number of edges (×1000).]
Strategy 1: PB-CD
• Path based community detection (PB-CD)
– Applies path-based influence estimation to estimate the overflowed
influence between communities
✓ relies on many more edges of the original graph, together with their weights,
✓ rather than on live edges only
• Two sub-steps
– Unit-community detection (UCD)
✓ To assign a community label to each node,
▪ Based on its affinity to each neighboring community
– Community merge (CM)
✓ To merge communities by considering the overflowed influence between the
communities after UCD step
The existing CELF algorithm
• Cost effective lazy forward (CELF)
– To reduce the number of nodes to be re-evaluated (macro issue)
✓ Exploiting the submodularity of influence function
✓ σ(S ∪ {v}) − σ(S) ≥ σ(T ∪ {v}) − σ(T), for S ⊆ T
• Example
– After node a is selected at step t,
✓ the marginal gain of node b is re-evaluated (e.g., to 19)
– The nodes below b in the queue cannot be the next seed
✓ because, by submodularity, their cached gains are upper bounds on
their current marginal gains

[Figure: CELF queue after re-evaluating node b.]
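The lazy-forward idea can be sketched with a max-heap of cached gains. `spread` is any set-function influence estimator; stamping each entry with the round in which its gain was computed is one common way (an assumption here, not necessarily the paper's bookkeeping) to detect stale entries:

```python
import heapq

def celf(nodes, k, spread):
    """CELF lazy-forward greedy. spread(S) -> influence estimate.

    By submodularity, a cached gain from an earlier round is an upper
    bound on the current gain, so an entry whose re-evaluated gain still
    tops the heap can be selected without touching the other nodes.
    """
    seeds, sigma_s = set(), 0.0
    # Max-heap entries: (-cached_gain, node, round_when_gain_was_computed)
    heap = [(-spread({v}), v, 0) for v in nodes]
    heapq.heapify(heap)
    for step in range(1, k + 1):
        while True:
            neg_gain, v, last = heapq.heappop(heap)
            if last == step:              # gain is fresh for this round
                seeds.add(v)
                sigma_s += -neg_gain
                break
            # Stale entry: re-evaluate lazily and push it back.
            gain = spread(seeds | {v}) - sigma_s
            heapq.heappush(heap, (-gain, v, step))
    return seeds
```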
The existing CELF algorithm
• The existing CELF in CB-IM
– assigns a local CELF queue to each community
– applies CELF to each local queue independently
Strategy 2: G-CELF
• Global CELF (G-CELF)
– Maintains a single global CELF queue
– Reduces the number of nodes to be re-evaluated even further
• Additional information
– Node
✓ Community label
✓ Flag for the re-evaluation process
– Community
✓ Flag for the re-evaluation process

[Slide 20: Hybrid-IM overview figure.]
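A sketch of how a single global queue with per-community staleness flags might work, under the assumed semantics that a new seed invalidates cached gains only inside its own community (the CB-IM property); the paper's actual flag bookkeeping may differ:

```python
import heapq

def g_celf(nodes, community, k, spread):
    """G-CELF sketch: one global CELF queue plus per-community flags.

    community: dict node -> community label. After a seed is picked,
    only gains cached for its own community are marked stale; gains in
    other communities are reused, since (by the CB-IM assumption) the
    new seed barely changes influence outside its community.
    """
    seeds, sigma_s = set(), 0.0
    # Per-community flag: entries computed before this round are stale.
    stale_since = {c: 0 for c in community.values()}
    heap = [(-spread({v}), v, 0) for v in nodes]
    heapq.heapify(heap)
    for step in range(1, k + 1):
        while True:
            neg_gain, v, last = heapq.heappop(heap)
            if last >= stale_since[community[v]]:   # still valid
                seeds.add(v)
                sigma_s += -neg_gain
                # Invalidate only the seed's own community.
                stale_since[community[v]] = step + 1
                break
            gain = spread(seeds | {v}) - sigma_s
            heapq.heappush(heap, (-gain, v, step))
    return seeds
```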
• Diffusion model
– Independent cascade (IC) model
✓ The weight of edge (u, v) = 1 / indegree(v)
✓ indegree(v): the number of incoming edges of node v
Experimental setup
• Dataset
Dataset NetHEPT NetPHY Stanford DBLP
# of Nodes 15K 37K 281K 655K
# of Edges 58K 231K 2.31M 3.98M
Avg. Degree 7.7 12.4 8.2 6.1
Max Degree 341 286 38606 588
Direction Undirected Undirected Directed Undirected
Experiments for PB-CD
• Methods
– To show the effectiveness of each part (UCD, CM) of PB-CD,
✓ we build four community detection methods by employing all possible
combinations
UCD \ CM     Live-based   Path-based
Live-based   LL_CD        LP_CD
Path-based   PL_CD        PP_CD
Experiments for Hybrid-IM
• Methods
– Random
✓ Baseline
– SDD (single degree discount)
✓ selects the node having the highest degree
✓ after a seed is selected, the degree of each of its neighbors is decreased by 1
– CB-IM, PB-IM
✓ Explained in previous slides
– Hybrid-IM
✓ Our proposed method
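The SDD baseline described above is simple enough to sketch directly. Treating "degree" as the out-degree of the adjacency list is an assumption of this sketch:

```python
def single_degree_discount(graph, k):
    """SDD baseline: repeatedly pick the highest-degree node, then
    discount each of its neighbors' degrees by 1."""
    degree = {}
    for u, edges in graph.items():
        degree[u] = degree.get(u, 0) + len(edges)
        for v, _ in edges:
            degree.setdefault(v, 0)       # ensure every endpoint appears
    seeds = []
    for _ in range(k):
        v = max((u for u in degree if u not in seeds),
                key=lambda u: (degree[u], u))   # deterministic tie-break
        seeds.append(v)
        for u, _ in graph.get(v, []):     # discount the neighbors
            if u not in seeds:
                degree[u] -= 1
    return seeds
```

The discount avoids picking two seeds whose neighborhoods overlap heavily, at a fraction of the cost of simulation-based methods.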
The running time of each method

[Figures: running time (sec, log scale) vs. the number of seeds (1–1000) on the four datasets; each plot compares Hybrid-IM, PB-IM, CB-IM (where available), SDD, and Random.]
The influence spread of each method

[Figures: influence spread (×1000 or ×10000) vs. the number of seeds (1–1000) on the four datasets; each plot compares Hybrid-IM, PB-IM, CB-IM (where available), SDD, and Random.]
Conclusions
• We propose Hybrid-IM that combines PB-IM and CB-IM
– in order to resolve the micro- and macro-level issues of influence
maximization together
• To refine it further, we identify two additional issues and propose
two strategies that address them
– PB-CD strategy
✓ To consider influence propagation more accurately in community detection
– G-CELF strategy
✓ Further optimizes seed selection without sacrificing accuracy