Part I → Part II → Part III → Part IV → Part V
           Influence Maximization




1
Word-of-mouth (WoM) effect in social
    networks

        [Figure: a social network in which the message "xphone is
         good" spreads from person to person]
       Word-of-mouth (viral) marketing is believed to be a promising
        marketing strategy.
       Increasing popularity of online social networks may enable large
        scale viral marketing
2
Outline
       Diffusion/propagation models and the
        Influence Maximization (IM) problem

       The theory: greedy methods to optimize
        submodular functions

       Scalable influence maximization



3
Diffusion/Propagation Models
    and the Influence Maximization
              (IM) Problem




4
The first definition of IM problem: A
    Markov random fields formulation
       Each node 𝑖 has a random variable 𝑋 𝑖
        indicating whether 𝑖 bought the product;
         𝑿 = (𝑋1 , … , 𝑋 𝑛 )

       Markov random field formulation: 𝑋 𝑖 depends
        on its neighbors’ actions 𝑵 𝑖 ⊆ 𝑿

       Marketing actions 𝑴 = {𝑀1 , … , 𝑀 𝑛 }

       Problem: find a choice of 𝑴 that maximizes
        the revenue obtained from the resulting 𝑿
5                               [Domingos & Richardson KDD 2001, 2002]
Discrete diffusion models
       Static social network: 𝐺 = (𝑉, 𝐸)
        ◦ 𝑉: set of nodes, representing individuals
        ◦ 𝐸: set of edges, representing directed influence
          relationships

       Influence diffusion process
        ◦ Seed set 𝑺: initial set of nodes selected to start the
          diffusion
        ◦ Node activations: Nodes are activated starting from
          the seed nodes, in discrete steps and following
          certain stochastic models
        ◦ Influence spread 𝝈(𝑺): expected number of
          activated nodes when the diffusion process
          started from the seed set 𝑆 ends
6
Major stochastic diffusion models
       Independent cascade (IC) model

       Linear threshold (LT) model

       General threshold model

       Others
        ◦ Voter model
        ◦ Heat diffusion model


7
Independent cascade model
    •   Each edge (𝑢, 𝑣) has
        a propagation
        probability 𝑝(𝑢, 𝑣)
    •   Initially some seed
        nodes are activated
    •   At each step 𝑡, each node 𝑢 activated at
        step 𝑡 − 1 activates its neighbor 𝑣 with
        probability 𝑝(𝑢, 𝑣)
    •   Once activated, a node stays activated

        [Figure: example graph with edge probabilities 0.3 and 0.1]
8                              [Kempe, Kleinberg and Tardos, KDD 2003]
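The IC process described above can be sketched in a few lines of Python (a minimal Monte Carlo sketch; the graph encoding and the toy example at the bottom are illustrative, not from the slides):

```python
import random

def simulate_ic(graph, seeds, rng=random.Random(0)):
    """One Monte Carlo run of the independent cascade (IC) model.
    graph: {u: {v: p(u, v)}} -- directed edges with propagation probabilities.
    Returns the final set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)          # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # each newly activated u gets one chance to activate v
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# toy example (probabilities are illustrative)
g = {"a": {"b": 0.3, "c": 0.1}, "b": {"c": 0.9}}
print(simulate_ic(g, {"a"}))
```

Averaging `len(simulate_ic(...))` over many runs gives a Monte Carlo estimate of the influence spread σ(S).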
Linear threshold model
        Each edge (𝑢, 𝑣) has weight 𝑤(𝑢, 𝑣):
         ◦ when (𝑢, 𝑣) ∉ 𝐸, 𝑤(𝑢, 𝑣) = 0
         ◦ Σ 𝑢 𝑤(𝑢, 𝑣) ≤ 1
        Each node 𝑣 selects a threshold 𝜃 𝑣 ∈ [0,1]
         uniformly at random
        Initially some seed nodes are activated
        At each step, node 𝑣 checks if the weighted
         sum of its active neighbors reaches its
         threshold 𝜃 𝑣 ; if so, 𝑣 is activated
        Once activated, a node stays activated

        [Figure: example graph with edge weights such as 0.6, 0.5,
         0.4, 0.3]

9                                                  [Kempe, Kleinberg and Tardos, KDD 2003]
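The LT process can be sketched analogously (a minimal sketch; the incoming-edge encoding is an implementation choice, and the test graph is illustrative):

```python
import random

def simulate_lt(graph, seeds, rng=random.Random(0)):
    """One Monte Carlo run of the linear threshold (LT) model.
    graph: {v: {u: w(u, v)}} -- incoming edge weights, sum_u w(u, v) <= 1.
    Each node draws a threshold uniformly from [0, 1]; it activates once
    the total weight of its active in-neighbors reaches that threshold."""
    theta = {v: rng.random() for v in graph}
    active = set(seeds)
    changed = True
    while changed:                  # iterate until no node changes state
        changed = False
        for v, in_edges in graph.items():
            if v in active:
                continue            # once activated, stay activated
            weight = sum(w for u, w in in_edges.items() if u in active)
            if weight >= theta[v]:
                active.add(v)
                changed = True
    return active
```

As with the IC model, averaging the size of the returned set over many runs estimates σ(S).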
Influence maximization
        Given a social network, a diffusion model
         with given parameters, and a number 𝑘,
         find a seed set 𝑆 of at most 𝑘 nodes such
         that the influence spread of 𝑆 is maximized.

        May be further generalized:
         ◦ Instead of k, given a budget constraint and
           each node has a cost of being selected as a
           seed
         ◦ Instead of maximizing influence spread,
           maximizing a (submodular) function of the set of
           activated nodes
10
Key takeaways for diffusion models
     and IM definition
        stochastic diffusion models
         ◦ IC model reflects simple contagion
         ◦ LT model reflects complex contagion (activation
           needs social affirmation from multiple sources
           [Centola and Macy, AJS 2007])

        maximization objective focuses on expected
         influence spread
         ◦ others to be considered later



11
Time for
       Theory:
      Optimizing
     Submodular
      Functions
                   Credit: Rico3244 @ DeviantArt




12
Hardness of influence maximization
        Influence maximization under both the IC
         and LT models is NP-hard
         ◦ IC model: reduction from the k-max cover
           problem
         ◦ LT model: reduction from the vertex cover
           problem

        Need approximation algorithms


13
Optimizing submodular functions
        Submodularity of set
         functions 𝑓: 2 𝑉 → 𝑅
         ◦ for all 𝑆 ⊆ 𝑇 ⊆ 𝑉 and all 𝑣 ∈ 𝑉 ∖ 𝑇,
           𝑓(𝑆 ∪ {𝑣}) − 𝑓(𝑆) ≥ 𝑓(𝑇 ∪ {𝑣}) − 𝑓(𝑇)
         ◦ diminishing marginal return
         ◦ an equivalent form: for all 𝑆, 𝑇 ⊆ 𝑉,
           𝑓(𝑆 ∪ 𝑇) + 𝑓(𝑆 ∩ 𝑇) ≤ 𝑓(𝑆) + 𝑓(𝑇)

        Monotonicity of set
         functions 𝑓: for all 𝑆 ⊆ 𝑇 ⊆ 𝑉,
                    𝑓(𝑆) ≤ 𝑓(𝑇)

        [Figure: 𝑓(𝑆) grows concavely with |𝑆|]
14
Example of a submodular function
     and its maximization problem
        set coverage
         ◦ each entry 𝑢 is a subset of
           some base elements
         ◦ coverage 𝑓(𝑆) = |⋃ 𝑢∈𝑆 𝑢|
         ◦ 𝑓(𝑆 ∪ {𝑣}) − 𝑓(𝑆): additional
           coverage of 𝑣 on top of 𝑆

        𝑘-max cover problem
         ◦ find 𝑘 subsets that maximize
           their total coverage
         ◦ NP-hard
         ◦ special case of the IM problem in
           the IC model

        [Figure: sets 𝑆, 𝑇 and 𝑣 covering base elements]
15
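The diminishing-return property of coverage can be checked directly (a small illustrative instance; the subsets are made up for the example):

```python
def coverage(subsets, S):
    """f(S) = number of distinct base elements covered by the sets in S."""
    return len(set().union(*(subsets[u] for u in S))) if S else 0

# hypothetical subsets over base elements 1..6
subsets = {"u1": {1, 2, 3}, "u2": {3, 4}, "u3": {4, 5, 6}}

# the marginal gain of adding u2 shrinks as the base set grows
gain_small = coverage(subsets, {"u1", "u2"}) - coverage(subsets, {"u1"})
gain_large = coverage(subsets, {"u1", "u3", "u2"}) - coverage(subsets, {"u1", "u3"})
assert gain_small >= gain_large   # diminishing marginal return
```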
Greedy algorithm for submodular
     function maximization
      1: initialize 𝑆 = ∅
      2: for 𝑖 = 1 to 𝑘 do
      3:     select 𝑢 = argmax 𝑤∈𝑉∖𝑆 [𝑓(𝑆 ∪ {𝑤}) − 𝑓(𝑆)]
      4:     𝑆 = 𝑆 ∪ {𝑢}
      5: end for
      6: output 𝑆



16
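The pseudocode above translates directly into Python (a sketch; the k-max-cover instance at the bottom is illustrative):

```python
def greedy_max(f, V, k):
    """Greedy hill-climbing for a monotone submodular set function f:
    repeatedly add the element with the largest marginal gain
    f(S ∪ {w}) − f(S), for k rounds."""
    S = set()
    for _ in range(k):
        u = max((w for w in V if w not in S),
                key=lambda w: f(S | {w}) - f(S))
        S.add(u)
    return S

# illustrative k-max-cover instance
subsets = {"u1": {1, 2, 3}, "u2": {3, 4}, "u3": {4, 5, 6}}
f = lambda S: len(set().union(*(subsets[u] for u in S))) if S else 0
print(greedy_max(f, list(subsets), 2))  # picks u1 and u3, covering all 6 elements
```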
Property of the greedy algorithm
        Theorem: If the set function 𝑓 is
         monotone and submodular with 𝑓 ∅ =
         0, then the greedy algorithm achieves
         (1 − 1/𝑒) approximation ratio, that is, the
         solution 𝑆 found by the greedy algorithm
         satisfies:
         ◦ 𝑓(𝑆) ≥ (1 − 1/𝑒) · max 𝑆′⊆𝑉, |𝑆′|=𝑘 𝑓(𝑆′)




17               [Nemhauser, Wolsey and Fisher, Mathematical Programming, 1978]
Submodularity of influence
     diffusion models
        Based on equivalent live-edge graphs

          diffusion dynamic                 random live-edge graph:
                                            edges are randomly removed

          Pr(set A is activated given       Pr(set A is reachable from S in a
          seed set S)                       random live-edge graph)

          [Figure: example graph with edge probabilities 0.3 and 0.1]
18
(Recall) active node set via IC
     diffusion process
     •   yellow node set is the active node set
         after the diffusion process in the
         independent cascade model

         [Figure: example IC diffusion with edge probabilities 0.3 and 0.1]
19
Random live-edge graph for the IC
     model and its reachable node set
        random live-edge graph in the IC model
         ◦ each edge is independently selected as
           live with its propagation probability

        yellow node set is the active node set
         reachable from the seed set in a random
         live-edge graph

        Equivalence is straightforward

        [Figure: example live-edge graph with edge probabilities 0.3 and 0.1]

20
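The IC live-edge construction suggests a simple sampling sketch: keep each edge independently with probability 𝑝(𝑢, 𝑣), then take the set reachable from the seeds (graph encoding and test data are illustrative assumptions):

```python
import random

def reachable_in_live_edge_graph(graph, seeds, rng=random.Random(0)):
    """Sample one live-edge graph for the IC model (keep each edge
    independently with its propagation probability) and return the
    set of nodes reachable from the seed set."""
    live = {u: [v for v, p in nbrs.items() if rng.random() < p]
            for u, nbrs in graph.items()}
    reached, stack = set(seeds), list(seeds)
    while stack:                        # depth-first reachability
        u = stack.pop()
        for v in live.get(u, []):
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return reached
```

By the equivalence on this slide, averaging `len(reachable_in_live_edge_graph(...))` over many samples estimates σ(S), just like simulating the cascade directly.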
(Recall) active node set via LT
     diffusion process
        yellow node set is the active node set
         after the diffusion process in the
         linear threshold model

         [Figure: example LT diffusion with edge weights such as 0.6,
          0.5, 0.4, 0.3]
21
Random live-edge graph for the LT
     model and its reachable node set
        random live-edge graph in the LT model
         ◦ each node selects at most one incoming
           edge, with probability proportional to its
           weight

        yellow node set is the active node set
         reachable from the seed set in a random
         live-edge graph

        equivalence is based on uniform threshold
         selection from [0,1] and linear weight
         addition

        [Figure: example live-edge graph with edge weights 0.3 and 0.1]


22
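The LT live-edge sampling can be sketched the same way; the only change is the edge-selection rule (encoding and test data are illustrative assumptions):

```python
import random

def lt_live_edge_reachable(in_weights, seeds, rng=random.Random(0)):
    """LT live-edge graph: each node keeps at most one incoming edge,
    edge (u, v) with probability w(u, v), and no edge at all with
    probability 1 − Σ_u w(u, v).  Returns nodes reachable from seeds.
    in_weights: {v: {u: w(u, v)}}."""
    out = {}                                  # sampled live edges, u -> [v, ...]
    for v, ws in in_weights.items():
        r, acc = rng.random(), 0.0
        for u, w in ws.items():
            acc += w
            if r < acc:                       # edge (u, v) selected as live
                out.setdefault(u, []).append(v)
                break
    reached, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in out.get(u, []):
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return reached
```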
Submodularity of influence
     diffusion models (cont’d)
     •   Influence spread of seed set 𝑆, 𝜎(𝑆):
                  𝜎(𝑆) = Σ 𝐺 𝐿 Pr(𝐺 𝐿 ) · |𝑅(𝑆, 𝐺 𝐿 )|
         •   𝐺 𝐿 : a random live-edge graph
         •   Pr(𝐺 𝐿 ): probability of 𝐺 𝐿 being generated
         •   𝑅(𝑆, 𝐺 𝐿 ): set of nodes reachable from 𝑆 in 𝐺 𝐿

     •   To prove that 𝜎(𝑆) is submodular, we only need to
         show that |𝑅(⋅, 𝐺 𝐿 )| is submodular for any 𝐺 𝐿
         •   submodularity is maintained through linear
             combinations with non-negative coefficients

23
Submodularity of influence
     diffusion models (cont’d)
     •   Submodularity of |𝑅(⋅, 𝐺 𝐿 )|
         •   for any 𝑆 ⊆ 𝑇 ⊆ 𝑉 and 𝑣 ∈ 𝑉 ∖ 𝑇:
         •   if 𝑢 is reachable from 𝑣 but not from 𝑇,
         •   then 𝑢 is reachable from 𝑣 but not from 𝑆 (since 𝑆 ⊆ 𝑇)
         •   hence the marginal contribution of 𝑣 w.r.t. 𝑇 is contained
             in its marginal contribution w.r.t. 𝑆, so |𝑅(⋅, 𝐺 𝐿 )| is
             submodular

     •   Therefore, influence spread 𝜎(𝑆) is
         submodular in both the IC and LT models

         [Figure: sets 𝑆 ⊆ 𝑇, node 𝑣, and the marginal
          contribution of 𝑣 w.r.t. 𝑇]
24
General threshold model
     •   Each node 𝑣 has a threshold function
                          𝑓𝑣 : 2 𝑉 → [0,1]
     •   Each node 𝑣 selects a threshold 𝜃 𝑣 ∈ [0,1]
         uniformly at random
     •   If the set of active nodes at the end of step
          𝑡 − 1 is 𝑆, and 𝑓𝑣 𝑆 ≥ 𝜃 𝑣 , 𝑣 is activated at step 𝑡
     •   reward function 𝑟(𝐴(𝑆)): if 𝐴(𝑆) is the final set of
         active nodes given seed set 𝑆, 𝑟(𝐴(𝑆)) is the
         reward from this set
     •   generalized influence spread:
                   𝜎(𝑆) = 𝐸[𝑟(𝐴(𝑆))]          [Kempe, Kleinberg and Tardos, KDD 2003]
25
IC and LT as special cases of
     general threshold model
     •   LT model
         •   𝑓𝑣 (𝑆) = Σ 𝑢∈𝑆 𝑤(𝑢, 𝑣)
         •   𝑟(𝑆) = |𝑆|

     •   IC model
         •   𝑓𝑣 (𝑆) = 1 − ∏ 𝑢∈𝑆 (1 − 𝑝(𝑢, 𝑣))
         •   𝑟(𝑆) = |𝑆|




26
Submodularity in the general
     threshold model
        Theorem [Mossel & Roch STOC 2007]:
         ◦ In the general threshold model,
         ◦ if for every 𝑣 ∈ 𝑉, 𝑓𝑣 (⋅) is monotone and
           submodular with 𝑓𝑣 ∅ = 0,
         ◦ and the reward function 𝑟(⋅) is monotone
           and submodular,
         ◦ then the general influence spread function
            𝜎 ⋅ is monotone and submodular.

        Local submodularity implies global
         submodularity
27
The greedy approximation
     algorithm for the IM problem
     •   (1 − 1/𝑒 − 𝜀)-approximation greedy algorithm
         •   Use the general greedy algorithm framework
         •   However, it needs evaluations of the influence spread
             𝜎(𝑆), which is shown to be #P-hard
         •   Use Monte Carlo simulations to estimate 𝜎(𝑆) to
             arbitrary accuracy with high probability
         •   For any 𝜀 > 0, there exists 𝛾 > 0, s.t. a (1 − 1/𝑒 − 𝜀)-
             approximation can be achieved with (1 − 𝛾)-approximate
             values of 𝜎(𝑆)

28
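The Monte Carlo estimation step can be sketched for the IC model as follows (the run count, graph, and probabilities are illustrative; in the actual greedy algorithm this estimator is called once per candidate seed per round):

```python
import random

def estimate_spread(graph, seeds, runs=10000, rng=random.Random(0)):
    """Monte Carlo estimate of σ(S) under the IC model: average the
    number of activated nodes over many simulated cascades.
    graph: {u: {v: p(u, v)}}."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph.get(u, {}).items():
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

# single edge with p = 0.5: true spread of {"a"} is 1 + 0.5 = 1.5
print(estimate_spread({"a": {"b": 0.5}}, {"a"}))  # ≈ 1.5
```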
Performance of greedy algorithm
          [Charts: influence spread vs. seed set size under the
           weighted cascade model and the linear threshold model]
 • coauthorship graph from arxiv.org, high energy physics section,
   n = 10748, m ≈ 53000
 • allow parallel edges, 𝑐 𝑢,𝑣 = number of edges between 𝑢 and 𝑣
 • weighted cascade: 𝑝(𝑢, 𝑣) = 1 − (1 − 1/𝑑 𝑣 )^ 𝑐 𝑢,𝑣
 • linear threshold: 𝑤(𝑢, 𝑣) = 𝑐 𝑢,𝑣 /𝑑 𝑣
 29                                         [Kempe, Kleinberg and Tardos, KDD 2003]
Key takeaways for theory support
        submodularity of diffusion models
         ◦ diminishing return
         ◦ shared by many models, but some model
           extensions may not be submodular (see Part
           IV)

        submodular function maximization
         ◦ greedy hill-climbing algorithm
         ◦ approximation ratio (1 − 1/𝑒)

30
Scalable Influence Maximization




31
Theory versus Practice
                      ...the trouble about arguments is,
                      they ain't nothing but theories,
                      after all, and theories don't prove
                      nothing, they only give you a place
                      to rest on, a spell, when you are
                      tuckered out butting around and
                      around trying to find out something
                      there ain't no way to find out...
                      There's another trouble about
                      theories: there's always a hole in
                      them somewheres, sure, if you
                      look close enough.
                      - “Tom Sawyer Abroad”, Mark Twain




32
Inefficiency of greedy algorithm
        It takes days to find 50 seeds for a 30K-node
         graph

        Too many evaluations of influence spread
         ◦ Given a graph of 𝑛 nodes, each round needs
           evaluation of 𝑛 influence spreads, for a total of
           𝑂(𝑛𝑘) evaluations

        Each evaluation of influence spread is hard
         ◦ Exact computation is #P-hard, for both IC and LT
           models [Chen et al. KDD/ICDM 2010]
         ◦ Monte-Carlo simulation is very slow
33
Scalable Influence Maximization
        Reduction on the number of influence
         spread evaluations.

        Batch computation of influence spread

        Scalable heuristics for influence spread
         computation



34
Lazy forward optimization
        Exploiting submodularity significantly reduces the # of influence
         spread evaluations
         ◦ 𝑆 𝑡−1 : seed set selected after round 𝑡 − 1
         ◦ 𝑣 𝑡 : node selected as seed in round 𝑡; 𝑆 𝑡 = 𝑆 𝑡−1 ∪ {𝑣 𝑡 }
         ◦ for a non-seed node 𝑢, 𝑢’s marginal gain is 𝑀𝐺(𝑢|𝑆 𝑡 ) = 𝜎(𝑆 𝑡 ∪ {𝑢}) −
            𝜎(𝑆 𝑡 )
         ◦ by submodularity, 𝑀𝐺(𝑢|𝑆 𝑡 ) ≤ 𝑀𝐺(𝑢|𝑆 𝑡−1 )
         ◦ this implies: if 𝑀𝐺(𝑢|𝑆 𝑡−1 ) ≤ 𝑀𝐺(𝑣|𝑆 𝑡 ) for some node 𝑣, then there is
           no need to evaluate 𝑀𝐺(𝑢|𝑆 𝑡 ) in round 𝑡 + 1
         ◦ can be implemented efficiently using a max-heap
            take the top of the heap; if its 𝑀𝐺 is for the current round, it is the
             new seed;
            else compute its 𝑀𝐺, and re-heapify
        Often, the top element after round 𝑘 − 1 remains the top in round 𝑘
        Up to 700× speedup
35                                                              [Leskovec et al., KDD 2007]
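The max-heap implementation of lazy forward can be sketched as follows (a conceptual sketch, not the paper's code; `spread` stands for any monotone submodular oracle, and the coverage instance at the bottom is illustrative):

```python
import heapq

def celf(spread, V, k):
    """Lazy-forward (CELF) greedy: keep possibly-stale marginal gains
    in a max-heap; a popped entry is accepted only if its gain was
    computed in the current round, otherwise it is recomputed and
    pushed back (valid because gains only shrink, by submodularity)."""
    S, sigma_S, rnd = set(), 0.0, 0
    heap = [(-spread({v}), v, 0) for v in V]   # (-marginal gain, node, round)
    heapq.heapify(heap)
    while len(S) < k and heap:
        neg_mg, v, computed_at = heapq.heappop(heap)
        if computed_at == rnd:        # gain is fresh: v is this round's best pick
            S.add(v)
            sigma_S += -neg_mg
            rnd += 1
        else:                         # stale: recompute marginal gain, re-heapify
            heapq.heappush(heap, (-(spread(S | {v}) - sigma_S), v, rnd))
    return S

# illustrative submodular oracle: set coverage
subsets = {"u1": {1, 2, 3}, "u2": {3, 4}, "u3": {4, 5, 6}}
spread = lambda S: len(set().union(*(subsets[u] for u in S))) if S else 0
print(celf(spread, list(subsets), 2))   # same picks as plain greedy
```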
CELF++
        With each heap node 𝑢, further maintain
         ◦ 𝑢. 𝑚𝑔1 = 𝑀𝐺(𝑢|𝑆), 𝑆 = current seed set.
         ◦ 𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = node with max. MG in the current
           iteration, among nodes seen before 𝑢;
         ◦ 𝑢. 𝑚𝑔2 = 𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡});
         ◦ 𝑢. 𝑓𝑙𝑎𝑔 = iteration where 𝑢. 𝑚𝑔1 was last updated;

         𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 chosen in current iteration  no
         need to compute 𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) in next
         iteration.
         𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) can be computed
         efficiently along with 𝑀𝐺(𝑢|𝑆) in one run of MC.
36                                           [Goyal, Lu and L., WWW 2011]
CELF++ (contd.)
     •   Pick the top node 𝑢 in the heap
     •   𝑢. 𝑓𝑙𝑎𝑔 = |𝑆|  𝑢. 𝑚𝑔1 is the correct 𝑀𝐺, thus 𝑢 is the best pick
         of the current iteration; add 𝑢 to 𝑆; remove 𝑢 from 𝑄
     •   else, if 𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = 𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑 and 𝑢. 𝑓𝑙𝑎𝑔 = |𝑆| − 1:
         𝑢. 𝑚𝑔1  𝑢. 𝑚𝑔2 (no need to compute 𝑀𝐺(𝑢|𝑆 ∪ {𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑}))




37
Batch computation of
     influence spread
        Based on equivalence to reachable set in
         live-edge graphs
         ◦ Generate a random live-edge graph
         ◦ batch computation of size of reachable set of
           all nodes [Cohen, JCSS 1997]
         ◦ repeat above to increase accuracy

        In conflict with lazy-forward optimization
         ◦ MixedGreedy: In the first round uses reachability-
           based batch computation; in remaining rounds,
           use lazy forward optimization
38                                   [Chen, Wang and Yang, KDD 2009]
Scalable heuristics for
     influence spread computation
        MIA: for IC model [Chen, Wang and
         Wang, KDD 2010]
        LDAG: for LT model [Chen, Yuan and
         Zhang, ICDM 2010]
        Features:
         ◦ Focus on a local region
         ◦ Find a graph spanner that allows efficient
           computation
         ◦ Batch update of marginal influence spread
39
Maximum Influence Arborescence
     (MIA) Heuristic
        Local influence region of node 𝑣
         ◦ for every node 𝑢, find the maximum
           influence path (MIP) from 𝑢 to 𝑣;
           ignore it if Pr(𝑝𝑎𝑡ℎ) < 𝜆 (𝜆 is a
           small threshold value)
         ◦ all MIPs to 𝑣 form its maximum
           influence in-arborescence (MIIA)
         ◦ the MIIA can be computed efficiently
         ◦ influence to 𝑣 is computed over the
           MIIA

        [Figure: MIPs from nodes 𝑢 to 𝑣, with edge
         probabilities 0.3 and 0.1]

40
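Finding MIPs is a shortest-path computation: maximizing the product of edge probabilities along a path is equivalent to minimizing Σ −log 𝑝, so a Dijkstra-style search from the target works. A sketch (the backward search and reverse-graph construction are implementation choices, not the paper's exact algorithm; pruning with 𝜆 can be applied to the result):

```python
import heapq

def max_influence_paths(graph, v):
    """Best path probability from every node u to target v.  Maximizing
    the product of probabilities is a Dijkstra search (probabilities are
    <= 1, so products only decrease), run backwards from v.
    graph: {u: {w: p(u, w)}}."""
    rev = {}                              # reverse adjacency for backward search
    for u, nbrs in graph.items():
        for w, p in nbrs.items():
            rev.setdefault(w, {})[u] = p
    best = {v: 1.0}                       # best known path probability to v
    heap = [(-1.0, v)]
    while heap:
        neg_p, x = heapq.heappop(heap)
        if -neg_p < best.get(x, 0.0):
            continue                      # stale heap entry
        for u, p in rev.get(x, {}).items():
            q = -neg_p * p                # extend the path by edge (u, x)
            if q > best.get(u, 0.0):
                best[u] = q
                heapq.heappush(heap, (-q, u))
    return best

# a -> c directly (0.1) vs. a -> b -> c (0.5 * 0.4 = 0.2): MIP goes via b
print(max_influence_paths({"a": {"c": 0.1, "b": 0.5}, "b": {"c": 0.4}}, "c"))
```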
MIA Heuristic III: Computing
     Influence through the MIA
     structure
        Recursive computation of the activation probability 𝑎𝑝(𝑢)
         of a node 𝑢 in its in-arborescence, given a seed set 𝑆
         (𝑁 𝑖𝑛 (𝑢) is the set of in-neighbors of 𝑢 in its in-arborescence):

         ◦ 𝑎𝑝(𝑢) = 1, if 𝑢 ∈ 𝑆
         ◦ 𝑎𝑝(𝑢) = 0, if 𝑢 ∉ 𝑆 and 𝑁 𝑖𝑛 (𝑢) = ∅
         ◦ 𝑎𝑝(𝑢) = 1 − ∏ 𝑤∈𝑁 𝑖𝑛 (𝑢) (1 − 𝑎𝑝(𝑤) · 𝑝(𝑤, 𝑢)), otherwise
41
MIA Heuristic IV: Efficient updates
     on incremental activation
     probabilities
        𝑢 is the new seed in 𝑀𝐼𝐼𝐴(𝑣)
        Naive update: for each candidate 𝑤,
         redo the computation on the previous
         slide to compute 𝑤’s marginal
         influence to 𝑣
         ◦ 𝑂(|𝑀𝐼𝐼𝐴(𝑣)|²)
        Fast update: based on the linear
         relationship of activation probabilities
         between any node 𝑤 and the root 𝑣,
         update the marginal influence of all
         𝑤’s to 𝑣 in two passes
         ◦ 𝑂(|𝑀𝐼𝐼𝐴(𝑣)|)

        [Figure: 𝑀𝐼𝐼𝐴(𝑣) with new seed 𝑢, candidate 𝑤, and root 𝑣]


42
Experiment Results on MIA heuristic
          [Charts: influence spread vs. seed set size (MIA is close to
           Greedy, 49% better than Degree, 15% better than
           DegreeDiscount) and running time (about 10³× speedup)]
      Experiment setup:
      •   35k nodes from coauthorship graph in physics archive
      •   influence probability to a node 𝑣 = 1 / (# of neighbors of 𝑣)
      •   running time is for selecting 50 seeds
43
Time for some simplicity




44
The SimPath Algorithm
         Vertex Cover                          Look-ahead
         Optimization                          optimization
      Improves the efficiency                Improves the efficiency
        in the first iteration             in the subsequent iterations

            In lazy-forward manner, in each
           iteration, add to the seed set the
              node providing the maximum
                 marginal gain in spread.

                                      Simpath-Spread:
                                   compute the marginal gain by
                                   enumerating simple paths
      [Goyal, Lu, & L. ICDM 2011]
 45
Estimating Spread in SimPath (2)
        Thus, the spread of a node can be
         computed by enumerating simple paths
         starting from the node.
         [Figure: three-node graph on x, y, z with edge
          probabilities 0.4, 0.3, 0.2, 0.5, 0.1; the influence of x on z,
          on y, and on x itself sums to the total influence of
          node x, 1.96]

             Influence spread of node x
             = 1 + (0.3 + 0.4 × 0.5) + (0.4 + 0.3 × 0.2) = 1.96
46
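The 1.96 figure can be reproduced by enumerating simple paths. The edge set below is reconstructed from the arithmetic in the example (x→y 0.3, x→z 0.4, y→z 0.2, z→y 0.5); the figure's 0.1 edge points into x, so it does not affect paths starting from x and is omitted here — an assumption:

```python
def simpath_spread(graph, u):
    """Influence of node u under the LT path-based view: enumerate all
    simple paths starting at u and sum each path's probability product
    (the node itself contributes 1, via the empty path)."""
    total = 0.0
    def dfs(node, prob, visited):
        nonlocal total
        total += prob                      # this path's contribution
        for nxt, p in graph.get(node, {}).items():
            if nxt not in visited:         # simple paths only
                dfs(nxt, prob * p, visited | {nxt})
    dfs(u, 1.0, {u})
    return total

g = {"x": {"y": 0.3, "z": 0.4}, "y": {"z": 0.2}, "z": {"y": 0.5}}
print(simpath_spread(g, "x"))  # ≈ 1.96, matching the slide
```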
Estimating Spread in SimPath (3)


        Let the seed set 𝑆 = {x, y}; the total influence of the
         seed set {x, y} is 2.6:
         ◦ influence of node x in a subgraph that does not contain y
         ◦ influence of node y in a subgraph that does not contain x

         [Figure: the same three-node graph as on the previous slide]

        𝜎(𝑆) = 𝜎 𝑉∖{y} (x) + 𝜎 𝑉∖{x} (y) = (1 + 0.4) + (1 + 0.2) = 2.6
47
Estimating Spread in SimPath (4)

         Enumerating all simple paths is #P-hard.

        Thus, influence can be estimated by
         enumerating simple paths starting from the
         seed set, on slightly different subgraphs.

        The majority of influence flows in a
         small neighborhood.
 48
Look Ahead Optimization (1/2)
        As the seed set grows, the time spent in
         estimating spread increases.
         ◦ More paths to enumerate.

        A lot of paths are repeated though.

        The optimization avoids this repetition
         intelligently.

        A look ahead parameter ‘l’.
49
Look Ahead Optimization (2/2)
      (example with look-ahead 𝑙 = 2)

        Let 𝑆 𝑖 be the seed set after iteration 𝑖, and let y and x be
         prospective seeds from the CELF queue.

        A lot of paths are enumerated repeatedly when we
         1. compute the spread achieved by 𝑆 + y
         2. compute the spread achieved by 𝑆 + x

        [Figure: paths from y and x through the seed set 𝑆 𝑖 ]
50
SimPath vs LDAG




51
Other means of speedup
        Community-based approach
              [Wang et al. KDD 2010]




        Network sparsification
                [Mathioudakis et al, KDD 2011]




        Simulated Annealing
                [Jiang et al. AAAI 2011]


52
Community-based influence
     maximization
        Use influence parameters to partition graph
         into communities
         ◦ different from pure structure-based partition

        Influence maximization within each
         community

        Dynamic programming for selecting top
         seeds

        Orthogonal to scalable influence
         computation algorithms
53                                             [Wang et al. KDD 2010]
Sparsification of influence networks
        Remove less important edges for
         influence propagation

        Use action traces to guide graph
         trimming

        Orthogonal to scalable influence
         computation algorithms


54                                [Mathioudakis et al, KDD 2011]
Simulated annealing
        Following simulated annealing
         framework
         ◦ not going through all nodes to find a next
           seed as in greedy
         ◦ find a random replacement (guided by
           some heuristics)

        Orthogonal to scalable influence
         computation algorithms

55
Key takeaways for scalable
     influence maximization
        tackle from multiple angles
         ◦ reduce #spread evaluations: lazy-forward
         ◦ scale up spread computation:
           use local influence region (LDAG, MIA, SimPath)
           use efficient graph structure (trees, DAGs)
           model specific optimizations (simple path
            enumerations)
         ◦ graph sparsifications, community partition, etc.

        upshot: can scale up to graphs with millions
         of nodes and edges, on a single machine
56

Social Media News Mining and Automatic Content Analysis of NewsCarlos Castillo (ChaTo)
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Carlos Castillo (ChaTo)
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural DisastersCarlos Castillo (ChaTo)
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement BiasMounia Lalmas-Roelleke
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationCarlos Castillo (ChaTo)
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...IIIT Hyderabad
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaMuhammad Imran
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsDiarta
 

Viewers also liked (20)

Kdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiKdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-ii
 
Kdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iKdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-i
 
Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-iv
 
Time Critical Influence Maximization
Time Critical Influence MaximizationTime Critical Influence Maximization
Time Critical Influence Maximization
 
Measuring academic influence: Not all citations are equal
Measuring academic influence: Not all citations are equalMeasuring academic influence: Not all citations are equal
Measuring academic influence: Not all citations are equal
 
Usage-Based vs. Citation-Based Recommenders in a Digital Library
Usage-Based vs. Citation-Based Recommenders in a Digital LibraryUsage-Based vs. Citation-Based Recommenders in a Digital Library
Usage-Based vs. Citation-Based Recommenders in a Digital Library
 
Digital Brand Monitoring and Management
Digital Brand Monitoring and ManagementDigital Brand Monitoring and Management
Digital Brand Monitoring and Management
 
Pubcon Google+ Google Plus and Author Markup Panel by Rob Garner
Pubcon Google+ Google Plus and Author Markup Panel by Rob GarnerPubcon Google+ Google Plus and Author Markup Panel by Rob Garner
Pubcon Google+ Google Plus and Author Markup Panel by Rob Garner
 
Least Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social NetworksLeast Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social Networks
 
Big-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsBig-O(Q) Social Network Analytics
Big-O(Q) Social Network Analytics
 
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionThe Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
 
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphDr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
 
Social Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsSocial Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of News
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural Disasters
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open Invitation
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social Media
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing Communications
 

Similar to Maximizing Influence in Social Networks

Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningVahid Mirjalili
 
08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
Multilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPMultilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPAbdullah al Mamun
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningTapas Majumdar
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraJason Riedy
 
Auto encoders in Deep Learning
Auto encoders in Deep LearningAuto encoders in Deep Learning
Auto encoders in Deep LearningShajun Nisha
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2San Kim
 
Introduction to Neural Network
Introduction to Neural NetworkIntroduction to Neural Network
Introduction to Neural NetworkOmer Korech
 
Neural Networks. Overview
Neural Networks. OverviewNeural Networks. Overview
Neural Networks. OverviewOleksandr Baiev
 
Deep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptxDeep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptxvipul6601
 
Profit Maximization over Social Networks
Profit Maximization over Social NetworksProfit Maximization over Social Networks
Profit Maximization over Social NetworksWei Lu
 
(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_KimSundong Kim
 
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...Taiji Suzuki
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningSungchul Kim
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with PerformersJoonhyung Lee
 

Similar to Maximizing Influence in Social Networks (20)

Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
Multilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPMultilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLP
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Auto encoders in Deep Learning
Auto encoders in Deep LearningAuto encoders in Deep Learning
Auto encoders in Deep Learning
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2
 
Introduction to Neural Network
Introduction to Neural NetworkIntroduction to Neural Network
Introduction to Neural Network
 
Neural Networks. Overview
Neural Networks. OverviewNeural Networks. Overview
Neural Networks. Overview
 
Deep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptxDeep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptx
 
UNIT 5-ANN.ppt
UNIT 5-ANN.pptUNIT 5-ANN.ppt
UNIT 5-ANN.ppt
 
Pro max icdm2012-slides
Pro max icdm2012-slidesPro max icdm2012-slides
Pro max icdm2012-slides
 
Profit Maximization over Social Networks
Profit Maximization over Social NetworksProfit Maximization over Social Networks
Profit Maximization over Social Networks
 
(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim
 
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Deep learning
Deep learningDeep learning
Deep learning
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with Performers
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 

Maximizing Influence in Social Networks

  • 1. Part I → Part II → Part III → Part IV → Part V Influence Maximization 1
  • 2. Word-of-mouth (WoM) effect in social networks (figure: “xphone is good” passed from user to user through the network) • Word-of-mouth (viral) marketing is believed to be a promising marketing strategy. • Increasing popularity of online social networks may enable large-scale viral marketing 2
  • 3. Outline  Diffusion/propagation models and the Influence Maximization (IM) problem  The theory: greedy methods to optimize submodular functions  Scalable influence maximization 3
  • 4. Diffusion/Propagation Models and the Influence Maximization (IM) Problem 4
  • 5. The first definition of the IM problem: a Markov random fields formulation • Each node 𝑖 has a random variable 𝑋 𝑖 indicating whether 𝑖 bought the product or not, 𝑿 = (𝑋1 , … , 𝑋 𝑛) • Markov random field formulation: 𝑋 𝑖 depends on its neighbors’ actions 𝑵 𝑖 ⊆ 𝑿 • Marketing action 𝑴 = {𝑀1 , … , 𝑀 𝑛 } • Problem: find a choice of 𝑴 that maximizes the revenue obtained from the result of 𝑿 5 [Domingos & Richardson KDD 2001, 2002]
  • 6. Discrete diffusion models  Static social network: 𝐺 = (𝑉, 𝐸) ◦ 𝑉: set of nodes, representing individuals ◦ 𝐸: set of edges, representing directed influence relationships  Influence diffusion process ◦ Seed set 𝑺: initial set of nodes selected to start the diffusion ◦ Node activations: Nodes are activated starting from the seed nodes, in discrete steps and following certain stochastic models ◦ Influence spread 𝝈(𝑺): expected number of activated nodes when the diffusion process starting from the seed set 𝑆 ends 6
  • 7. Major stochastic diffusion models  Independent cascade (IC) model  Linear threshold (LT) model  General threshold model  Others ◦ Voter model ◦ Heat diffusion model 7
  • 8. Independent cascade model • Each edge (𝑢, 𝑣) has a propagation probability 𝑝(𝑢, 𝑣) (e.g. 0.3, 0.1 in the figure) • Initially some seed nodes are activated • At each step 𝑡, each node 𝑢 activated at step 𝑡 − 1 activates its neighbor 𝑣 with probability 𝑝(𝑢, 𝑣) • Once activated, a node stays activated 8 [Kempe, Kleinberg and Tardos, KDD 2003]
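The IC process above can be sketched in a few lines of Python. The adjacency structure (`dict` of out-edge lists) and the function name are illustrative choices, not from the tutorial:

```python
import random

def simulate_ic(graph, seeds, rng=random.Random(0)):
    """One run of the independent cascade (IC) process.

    graph: dict mapping node u -> list of (v, p_uv) out-edges.
    seeds: iterable of initially activated nodes.
    """
    active = set(seeds)      # once activated, a node stays activated
    frontier = list(seeds)   # nodes activated at the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # u gets exactly one chance to activate each inactive neighbor v
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

Each edge is tried at most once, since a node enters the frontier only in the step it is activated.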
  • 9. Linear threshold model • Each edge (𝑢, 𝑣) has weight 𝑤(𝑢, 𝑣): ◦ when (𝑢, 𝑣) ∉ 𝐸, 𝑤(𝑢, 𝑣) = 0 ◦ Σ 𝑢 𝑤(𝑢, 𝑣) ≤ 1 • Each node 𝑣 selects a threshold 𝜃 𝑣 ∈ [0,1] uniformly at random • Initially some seed nodes are activated • At each step, node 𝑣 checks if the weighted sum of its active neighbors is greater than its threshold 𝜃 𝑣 ; if so, 𝑣 is activated • Once activated, a node stays activated (figure: example edge weights such as 0.6, 0.5, 0.3) 9 [Kempe, Kleinberg and Tardos, KDD 2003]
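The LT process can be sketched similarly; here incoming weights are stored per node (an illustrative structure, assuming Σ 𝑢 𝑤(𝑢, 𝑣) ≤ 1):

```python
import random

def simulate_lt(weights, seeds, rng=random.Random(0)):
    """One run of the linear threshold (LT) process.

    weights: dict mapping node v -> dict {u: w(u, v)} of incoming
    edge weights with sum_u w(u, v) <= 1.
    """
    nodes = set(weights)
    for v in list(weights):
        nodes.update(weights[v])
    theta = {v: rng.random() for v in nodes}   # thresholds ~ Uniform[0, 1]
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes - active:
            # weighted sum of active in-neighbors of v
            influence = sum(w for u, w in weights.get(v, {}).items() if u in active)
            if influence >= theta[v]:
                active.add(v)
                changed = True
    return active
```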
  • 10. Influence maximization  Given a social network, a diffusion model with given parameters, and a number 𝑘, find a seed set 𝑆 of at most 𝑘 nodes such that the influence spread of 𝑆 is maximized.  May be further generalized: ◦ Instead of k, given a budget constraint and each node has a cost of being selected as a seed ◦ Instead of maximizing influence spread, maximizing a (submodular) function of the set of activated nodes 10
  • 11. Key takeaways for diffusion models and IM definition  stochastic diffusion models ◦ IC model reflects simple contagion ◦ LT model reflects complex contagion (activation needs social affirmation from multiple sources [Centola and Macy, AJS 2007])  maximization objective focuses on expected influence spread ◦ others to be considered later 11
  • 12. Time for Theory: Optimizing Submodular Functions Credit: Rico3244 @ DeviantArt 12
  • 13. Hardness of influence maximization • Influence maximization under both the IC and LT models is NP-hard ◦ IC model: reduction from the k-max cover problem ◦ LT model: reduction from the vertex cover problem • Need approximation algorithms 13
  • 14. Optimizing submodular functions • Submodularity of set functions 𝑓: 2^𝑉 → 𝑅 ◦ for all 𝑆 ⊆ 𝑇 ⊆ 𝑉 and all 𝑣 ∈ 𝑉 ∖ 𝑇, 𝑓(𝑆 ∪ {𝑣}) − 𝑓(𝑆) ≥ 𝑓(𝑇 ∪ {𝑣}) − 𝑓(𝑇) ◦ diminishing marginal return ◦ an equivalent form: for all 𝑆, 𝑇 ⊆ 𝑉, 𝑓(𝑆 ∪ 𝑇) + 𝑓(𝑆 ∩ 𝑇) ≤ 𝑓(𝑆) + 𝑓(𝑇) • Monotonicity of set functions 𝑓: for all 𝑆 ⊆ 𝑇 ⊆ 𝑉, 𝑓(𝑆) ≤ 𝑓(𝑇) 14
  • 15. Example of a submodular function and its maximization problem • set coverage ◦ each element 𝑢 of the ground set is a subset of some base elements ◦ coverage 𝑓(𝑆) = |⋃ 𝑢∈𝑆 𝑢| ◦ 𝑓(𝑆 ∪ {𝑣}) − 𝑓(𝑆): additional coverage of 𝑣 on top of 𝑆 • 𝑘-max cover problem ◦ find 𝑘 subsets that maximize their total coverage ◦ NP-hard ◦ special case of the IM problem in the IC model 15
  • 16. Greedy algorithm for submodular function maximization 1: initialize 𝑆 = ∅; 2: for 𝑖 = 1 to 𝑘 do 3: select 𝑢 = argmax 𝑤∈𝑉∖𝑆 [𝑓(𝑆 ∪ {𝑤}) − 𝑓(𝑆)]; 4: 𝑆 = 𝑆 ∪ {𝑢}; 5: end for 6: output 𝑆 16
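The greedy hill-climbing loop above can be sketched directly, here applied to the set-coverage example of the previous slide (the instance data is made up for illustration):

```python
def greedy_max(f, universe, k):
    """Greedy hill-climbing for a monotone submodular set function f.

    f maps a frozenset to a number; universe is the ground set.
    """
    S = frozenset()
    for _ in range(k):
        # marginal gain of each remaining candidate w
        gains = {w: f(S | {w}) - f(S) for w in universe - S}
        u = max(gains, key=gains.get)   # argmax of marginal gain
        S = S | {u}
    return S

# Max-coverage instance: pick k subsets maximizing the size of their union.
subsets = {
    'A': {1, 2, 3},
    'B': {3, 4},
    'C': {4, 5, 6, 7},
}
coverage = lambda S: len(set().union(*[subsets[u] for u in S]))
```

On this instance with 𝑘 = 2, greedy picks 'C' (gain 4) then 'A' (gain 3), covering all 7 base elements.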
  • 17. Property of the greedy algorithm • Theorem: If the set function 𝑓 is monotone and submodular with 𝑓(∅) = 0, then the greedy algorithm achieves a (1 − 1/𝑒) approximation ratio, that is, the solution 𝑆 found by the greedy algorithm satisfies: ◦ 𝑓(𝑆) ≥ (1 − 1/𝑒) ⋅ max 𝑆′⊆𝑉, |𝑆′|=𝑘 𝑓(𝑆′) 17 [Nemhauser, Wolsey and Fisher, Mathematical Programming, 1978]
  • 18. Submodularity of influence diffusion models • Based on equivalent live-edge graphs: the diffusion dynamics correspond to a random live-edge graph in which edges are randomly removed • Pr(set A is activated given seed set S) = Pr(set A is reachable from S in the random live-edge graph) 18
  • 19. (Recall) active node set via IC diffusion process • yellow node set is the active node set after the diffusion process in the independent cascade model (figure: example propagation probabilities 0.3, 0.1) 19
  • 20. Random live-edge graph for the IC model and its reachable node set • random live-edge graph in the IC model ◦ each edge is independently selected as live with its propagation probability • yellow node set is the active node set reachable from the seed set in a random live-edge graph • Equivalence is straightforward 20
  • 21. (Recall) active node set via LT diffusion process • yellow node set is the active node set after the diffusion process in the linear threshold model (figure: example edge weights and node thresholds) 21
  • 22. Random live-edge graph for the LT model and its reachable node set • random live-edge graph in the LT model ◦ each node selects at most one incoming edge, with probability proportional to its weight • yellow node set is the active node set reachable from the seed set in a random live-edge graph • equivalence is based on uniform threshold selection from [0,1] and linear weight addition 22
  • 23. Submodularity of influence diffusion models (cont’d) • Influence spread of seed set 𝑆: 𝜎(𝑆) = Σ 𝐺 𝐿 Pr(𝐺 𝐿) ⋅ |𝑅(𝑆, 𝐺 𝐿)| • 𝐺 𝐿 : a random live-edge graph • Pr(𝐺 𝐿): probability of 𝐺 𝐿 being generated • 𝑅(𝑆, 𝐺 𝐿): set of nodes reachable from 𝑆 in 𝐺 𝐿 • To prove that 𝜎(𝑆) is submodular, we only need to show that 𝑅(⋅, 𝐺 𝐿) is submodular for any 𝐺 𝐿 • submodularity is maintained through linear combinations with non-negative coefficients 23
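The live-edge view also gives a direct Monte Carlo estimator of 𝜎(𝑆) under the IC model: sample a live-edge graph (each edge kept with probability 𝑝(𝑢, 𝑣)) and count the nodes reachable from 𝑆. A minimal sketch, with an illustrative graph structure; edges are sampled lazily during the reachability search, which is equivalent since each edge is examined at most once per run:

```python
import random

def estimate_spread(graph, seeds, runs=10000, rng=random.Random(0)):
    """Estimate sigma(S) under the IC model by sampling live-edge graphs.

    graph: dict mapping node u -> list of (v, p_uv) out-edges.
    Returns the average of |R(S, G_L)| over `runs` sampled graphs.
    """
    total = 0
    for _ in range(runs):
        active, stack = set(seeds), list(seeds)
        while stack:
            u = stack.pop()
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:  # edge (u, v) is live
                    active.add(v)
                    stack.append(v)
        total += len(active)          # |R(S, G_L)| for this sample
    return total / runs
```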
  • 24. Submodularity of influence diffusion models (cont’d) • Submodularity of 𝑅(⋅, 𝐺 𝐿): for any 𝑆 ⊆ 𝑇 ⊆ 𝑉 and 𝑣 ∈ 𝑉 ∖ 𝑇, if 𝑢 is reachable from 𝑇 ∪ {𝑣} but not from 𝑇, then 𝑢 is reachable from 𝑣 but not from 𝑆, so 𝑢 also counts toward the marginal contribution of 𝑣 w.r.t. 𝑆 • Hence, 𝑅(⋅, 𝐺 𝐿) is submodular • Therefore, influence spread 𝜎(𝑆) is submodular in both the IC and LT models 24
  • 25. General threshold model • Each node 𝑣 has a threshold function 𝑓𝑣 : 2^𝑉 → [0,1] • Each node 𝑣 selects a threshold 𝜃 𝑣 ∈ [0,1] uniformly at random • If the set of active nodes at the end of step 𝑡 − 1 is 𝑆, and 𝑓𝑣(𝑆) ≥ 𝜃 𝑣 , 𝑣 is activated at step 𝑡 • reward function 𝑟(𝐴(𝑆)): if 𝐴(𝑆) is the final set of active nodes given seed set 𝑆, 𝑟(𝐴(𝑆)) is the reward from this set • generalized influence spread: 𝜎(𝑆) = 𝐸[𝑟(𝐴(𝑆))] [Kempe, Kleinberg and Tardos, KDD 2003] 25
  • 26. IC and LT as special cases of the general threshold model • LT model • 𝑓𝑣(𝑆) = Σ 𝑢∈𝑆 𝑤(𝑢, 𝑣) • 𝑟(𝑆) = |𝑆| • IC model • 𝑓𝑣(𝑆) = 1 − Π 𝑢∈𝑆 (1 − 𝑝(𝑢, 𝑣)) • 𝑟(𝑆) = |𝑆| 26
  • 27. Submodularity in the general threshold model  Theorem [Mossel & Roch STOC 2007]: ◦ In the general threshold model, ◦ if for every 𝑣 ∈ 𝑉, 𝑓𝑣 (⋅) is monotone and submodular with 𝑓𝑣 ∅ = 0, ◦ and the reward function 𝑟(⋅) is monotone and submodular, ◦ then the general influence spread function 𝜎 ⋅ is monotone and submodular.  Local submodularity implies global submodularity 27
• 28. The greedy approximation algorithm for the IM problem • (1 − 1/𝑒 − 𝜀)-approximation greedy algorithm • Use the general greedy algorithm framework • However, it needs evaluations of the influence spread 𝜎(𝑆), which is shown to be #P-hard • Use Monte Carlo simulations to estimate 𝜎(𝑆) to arbitrary accuracy with high probability • For any 𝜀 > 0, there exists 𝛾 > 0 s.t. a (1 − 1/𝑒 − 𝜀)-approximation can be achieved with (1 − 𝛾)-approximate values of 𝜎(𝑆) 28
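A bare-bones sketch of the greedy hill-climbing loop (illustrative only; `spread` stands for any estimator of 𝜎, e.g. the Monte Carlo simulation mentioned above, and the function name is an assumption):

```python
def greedy_im(nodes, spread, k):
    """Greedy hill-climbing for influence maximization: in each of k
    rounds, add the node with the largest estimated marginal gain.
    With a monotone submodular spread function this achieves a
    (1 - 1/e)-approximation; with (1 - gamma)-approximate spread
    values the guarantee degrades gracefully to (1 - 1/e - eps)."""
    seeds = set()
    for _ in range(k):
        base = spread(seeds)
        best, best_gain = None, float('-inf')
        for v in nodes:
            if v in seeds:
                continue
            gain = spread(seeds | {v}) - base  # marginal gain of v
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.add(best)
    return seeds
```

Each round evaluates the spread of roughly 𝑛 candidate sets, which is exactly the 𝑂(𝑛𝑘)-evaluation cost that the scalability work later in this part attacks.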
• 29. Performance of greedy algorithm (figures: influence spread under the weighted cascade model and the linear threshold model) • coauthorship graph from arxiv.org, high energy physics section, n = 10748, m ≈ 53000 • allow parallel edges, 𝑐(𝑢,𝑣) = number of edges between 𝑢 and 𝑣 • weighted cascade: 𝑝(𝑢, 𝑣) = 1 − (1 − 1/𝑑𝑣)^𝑐(𝑢,𝑣) • linear threshold: 𝑤(𝑢, 𝑣) = 𝑐(𝑢,𝑣)/𝑑𝑣 29 [Kempe, Kleinberg and Tardos, KDD 2003]
  • 30. Key takeaways for theory support  submodularity of diffusion models ◦ diminishing return ◦ shared by many models, but some model extensions may not be submodular (see Part IV)  submodular function maximization ◦ greedy hill-climbing algorithm ◦ approximation ratio (1 − 1/𝑒) 30
  • 32. Theory versus Practice ...the trouble about arguments is, they ain't nothing but theories, after all, and theories don't prove nothing, they only give you a place to rest on, a spell, when you are tuckered out butting around and around trying to find out something there ain't no way to find out... There's another trouble about theories: there's always a hole in them somewheres, sure, if you look close enough. - “Tom Sawyer Abroad”, Mark Twain 32
• 33. Inefficiency of greedy algorithm  It takes days to find 50 seeds for a 30K-node graph  Too many evaluations of influence spread ◦ Given a graph of 𝑛 nodes, each round needs evaluation of 𝑛 influence spreads, 𝑂(𝑛𝑘) evaluations in total  Each evaluation of influence spread is hard ◦ Exact computation is #P-hard, for both IC and LT models [Chen et al. KDD/ICDM 2010] ◦ Monte Carlo simulation is very slow 33
• 34. Scalable Influence Maximization  Reduction of the number of influence spread evaluations  Batch computation of influence spread  Scalable heuristics for influence spread computation 34
• 35. Lazy forward optimization  Exploiting submodularity significantly reduces the number of influence spread evaluations ◦ 𝑆𝑡−1 − seed set selected after round 𝑡 − 1 ◦ 𝑣𝑡 − node selected as seed in round 𝑡: 𝑆𝑡 = 𝑆𝑡−1 ∪ {𝑣𝑡} ◦ for a node 𝑢 that is not a seed yet, 𝑢’s marginal gain is 𝑀𝐺(𝑢|𝑆𝑡) = 𝜎(𝑆𝑡 ∪ {𝑢}) − 𝜎(𝑆𝑡) ◦ by submodularity, 𝑀𝐺(𝑢|𝑆𝑡) ≤ 𝑀𝐺(𝑢|𝑆𝑡−1) ◦ this implies: if 𝑀𝐺(𝑢|𝑆𝑡−1) ≤ 𝑀𝐺(𝑣|𝑆𝑡) for some node 𝑣, then 𝑢 cannot beat 𝑣, so there is no need to evaluate 𝑀𝐺(𝑢|𝑆𝑡) in round 𝑡 + 1 ◦ Can be implemented efficiently using a max-heap:  take the top of the heap; if its 𝑀𝐺 is for the current round, it is the new seed;  else compute its 𝑀𝐺 and re-heapify  Often, the top element after round 𝑘 − 1 remains on top in round 𝑘  Up to 700× speedup 35 [Leskovec et al., KDD 2007]
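The lazy-forward idea can be sketched with a max-heap of cached gains. This is a simplified illustration of the CELF scheme, not the authors' implementation; cached gains are valid upper bounds by submodularity, so the top of the heap only needs re-evaluation when its cached value is stale:

```python
import heapq

def celf(nodes, spread, k):
    """Lazy-forward greedy (CELF-style): keep (-gain, node, round
    evaluated) in a min-heap, so the heap top is the largest cached
    gain. A stale cached gain is still an upper bound, so if the top
    entry was evaluated in the current round it must be the best pick."""
    base = spread(set())
    heap = [(-(spread({v}) - base), v, 0) for v in nodes]
    heapq.heapify(heap)
    seeds, cur_spread, rnd = set(), base, 0
    while len(seeds) < k and heap:
        neg_gain, v, evaluated = heapq.heappop(heap)
        if evaluated == rnd:
            # cached gain is up to date: v is the best pick this round
            seeds.add(v)
            cur_spread += -neg_gain
            rnd += 1
        else:
            # stale upper bound: recompute the gain and re-heapify
            gain = spread(seeds | {v}) - cur_spread
            heapq.heappush(heap, (-gain, v, rnd))
    return seeds
```

In practice most pops hit the up-to-date branch after the first round, which is where the reported speedups come from.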
• 36. CELF++  With each heap node 𝑢, further maintain ◦ 𝑢.𝑚𝑔1 = 𝑀𝐺(𝑢|𝑆), where 𝑆 is the current seed set ◦ 𝑢.𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = node with max. MG in the current iteration, among nodes seen before 𝑢 ◦ 𝑢.𝑚𝑔2 = 𝑀𝐺(𝑢|𝑆 ∪ {𝑢.𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) ◦ 𝑢.𝑓𝑙𝑎𝑔 = iteration where 𝑢.𝑚𝑔1 was last updated  If 𝑢.𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 is chosen as seed in the current iteration, there is no need to compute 𝑀𝐺(𝑢|𝑆 ∪ {𝑢.𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) in the next iteration  𝑀𝐺(𝑢|𝑆 ∪ {𝑢.𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) can be computed efficiently along with 𝑀𝐺(𝑢|𝑆) in one run of MC 36 [Goyal, Lu and L., WWW 2011]
• 37. CELF++ (cont’d) • Pick the top node 𝑢 in the heap • if 𝑢.𝑓𝑙𝑎𝑔 = |𝑆|, then 𝑢.𝑚𝑔1 is the correct 𝑀𝐺, thus 𝑢 is the best pick of the current iteration; add 𝑢 to 𝑆 and remove 𝑢 from 𝑄 • else, if 𝑢.𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = 𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑 and 𝑢.𝑓𝑙𝑎𝑔 = |𝑆| − 1, set 𝑢.𝑚𝑔1 ← 𝑢.𝑚𝑔2 (no need to compute 𝑀𝐺(𝑢|𝑆 ∪ {𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑})) 37
• 38. Batch computation of influence spread  Based on the equivalence to reachable sets in live-edge graphs ◦ generate a random live-edge graph ◦ batch computation of the sizes of the reachable sets of all nodes [Cohen, JCSS 1997] ◦ repeat the above to increase accuracy  In conflict with lazy-forward optimization ◦ MixedGreedy: the first round uses reachability-based batch computation; the remaining rounds use lazy-forward optimization 38 [Chen, Wang and Yang, KDD 2009]
• 39. Scalable heuristics for influence spread computation  MIA: for the IC model [Chen, Wang and Wang, KDD 2010]  LDAG: for the LT model [Chen, Yuan and Zhang, ICDM 2010]  Features: ◦ focus on a local region ◦ find a graph spanner that allows efficient computation ◦ batch update of marginal influence spread 39
• 40. Maximum Influence Arborescence (MIA) Heuristic  Local influence regions of node 𝑣 ◦ For every node 𝑢, find the maximum influence path (MIP) from 𝑢 to 𝑣; ignore it if Pr(𝑝𝑎𝑡ℎ) < 𝜆 (𝜆 is a small threshold value) ◦ all MIPs to 𝑣 form its maximum influence in-arborescence MIIA(𝑣) ◦ MIIA can be efficiently computed ◦ influence to 𝑣 is computed over MIIA 40
• 41. MIA Heuristic III: Computing Influence through the MIA structure  Recursive computation of the activation probability 𝑎𝑝(𝑢) of a node 𝑢 in its in-arborescence, given a seed set 𝑆 (𝑁𝑖𝑛(𝑢) denotes the in-neighbors of 𝑢 in the in-arborescence): 𝑎𝑝(𝑢) = 1 if 𝑢 ∈ 𝑆; 𝑎𝑝(𝑢) = 0 if 𝑢 ∉ 𝑆 and 𝑁𝑖𝑛(𝑢) = ∅; otherwise 𝑎𝑝(𝑢) = 1 − Π_{𝑤∈𝑁𝑖𝑛(𝑢)} (1 − 𝑎𝑝(𝑤) · 𝑝(𝑤, 𝑢)) 41
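A minimal sketch of this recursion (the formula follows the MIA paper; the helper names and data layout are assumptions). Since an in-arborescence is a tree, the recursion terminates at the leaves without cycle checks:

```python
def ap(u, seeds, in_neighbors, prob):
    """Activation probability of node u in its in-arborescence,
    given seed set `seeds`:
      ap(u) = 1                                        if u is a seed
      ap(u) = 0                                        if u has no in-neighbors
      ap(u) = 1 - prod_{w in N_in(u)} (1 - ap(w) * p(w, u))  otherwise
    in_neighbors: dict node -> list of in-neighbors in the tree;
    prob: dict (w, u) -> edge influence probability p(w, u)."""
    if u in seeds:
        return 1.0
    preds = in_neighbors.get(u, [])
    if not preds:
        return 0.0
    prod = 1.0
    for w in preds:
        prod *= 1.0 - ap(w, seeds, in_neighbors, prob) * prob[(w, u)]
    return 1.0 - prod
```

Summing 𝑎𝑝(𝑣) over the roots 𝑣 of all the arborescences gives the MIA estimate of the influence spread of the seed set.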
• 42. MIA Heuristic IV: Efficient updates on incremental activation probabilities  𝑢 is the new seed in 𝑀𝐼𝐼𝐴(𝑣)  Naive update: for each candidate 𝑤, redo the computation on the previous slide to compute 𝑤’s marginal influence to 𝑣 ◦ 𝑂(|𝑀𝐼𝐼𝐴(𝑣)|²)  Fast update: based on the linear relationship of activation probabilities between any node 𝑤 and the root 𝑣, update the marginal influence of all 𝑤’s to 𝑣 in two passes ◦ 𝑂(|𝑀𝐼𝐼𝐴(𝑣)|) 42
• 43. Experiment results on the MIA heuristic (figures: influence spread vs. seed set size, and running time) • 10³× speedup • influence spread close to Greedy, 49% better than Degree, 15% better than DegreeDiscount • Experiment setup: 35k nodes from the coauthorship graph in the physics archive; influence probability to a node 𝑣 = 1/(# of neighbors of 𝑣); running time is for selecting 50 seeds 43
  • 44. Time for some simplicity 44
• 45. The SimPath Algorithm • Vertex cover optimization: improves the efficiency in the first iteration • Look-ahead optimization: improves the efficiency in the subsequent iterations • Simpath-Spread: compute the marginal gain by enumerating simple paths • In lazy-forward manner, in each iteration, add to the seed set the node providing the maximum marginal gain in spread [Goyal, Lu, & L. ICDM 2011] 45
• 46. Estimating Spread in SimPath (2)  Thus, the spread of a node can be computed by enumerating simple paths starting from the node  (figure: nodes 𝑥, 𝑦, 𝑧 with edge weights 𝑥→𝑦 = 0.3, 𝑥→𝑧 = 0.4, 𝑧→𝑦 = 0.5, 𝑦→𝑧 = 0.2)  Influence spread of node 𝑥 = influence of 𝑥 on itself + influence of 𝑥 on 𝑦 + influence of 𝑥 on 𝑧 = 1 + (0.3 + 0.4 · 0.5) + (0.4 + 0.3 · 0.2) = 1.96 46
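The simple-path enumeration above can be sketched as a short recursion (an exhaustive version for tiny graphs; SimPath itself additionally prunes low-probability paths, and the function name is an assumption). The spread of a node under the LT model is the sum, over all simple paths starting at it, of the product of edge weights along the path:

```python
def simpath_spread(u, out_weights, visited=None):
    """LT spread of node u = sum over simple paths starting at u of
    the product of edge weights along the path; the empty path
    contributes 1 (u's influence on itself).
    out_weights: dict node -> list of (neighbor, weight) out-edges."""
    visited = (visited or set()) | {u}
    total = 1.0  # the path that stops at u
    for v, w in out_weights.get(u, []):
        if v not in visited:  # simple paths only: no revisits
            total += w * simpath_spread(v, out_weights, visited)
    return total
```

On the example graph of this slide (edge weights 𝑥→𝑦 = 0.3, 𝑥→𝑧 = 0.4, 𝑧→𝑦 = 0.5, 𝑦→𝑧 = 0.2) the recursion reproduces the value 1.96.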
• 47. Estimating Spread in SimPath (3) • Let the seed set be 𝑆 = {𝑥, 𝑦}; then the influence spread of 𝑆 is 𝜎(𝑆) = 𝜎^{𝑉−𝑦}(𝑥) + 𝜎^{𝑉−𝑥}(𝑦) = (1 + 0.4) + (1 + 0.2) = 2.6 • 𝜎^{𝑉−𝑦}(𝑥): influence of node 𝑥 in the subgraph that does not contain 𝑦 • 𝜎^{𝑉−𝑥}(𝑦): influence of node 𝑦 in the subgraph that does not contain 𝑥 47
• 48. Estimating Spread in SimPath (4)  Enumerating all simple paths is #P-hard  Thus, influence is estimated by enumerating, on slightly different subgraphs, the simple paths starting from the seed set, pruned to a small neighborhood  This is accurate because the majority of influence flows in a small neighborhood 48
• 49. Look Ahead Optimization (1/2)  As the seed set grows, the time spent in estimating spread increases ◦ more paths to enumerate  Many of these paths are enumerated repeatedly, though  The optimization avoids this repetition intelligently, using a look-ahead parameter ℓ 49
• 50. Look Ahead Optimization (2/2) • Let 𝑆𝑖 be the seed set after iteration 𝑖, and let 𝑦 and 𝑥 be prospective seeds from the CELF queue (ℓ = 2 here) • 1. Compute the spread achieved by 𝑆 + 𝑦 • 2. Compute the spread achieved by 𝑆 + 𝑥 • Without look-ahead, a lot of paths are enumerated repeatedly 50
  • 52. Other means of speedup  Community-based approach [Wang et al. KDD 2010]  Network sparsification [Mathioudakis et al, KDD 2011]  Simulated Annealing [Jiang et al. AAAI 2011] 52
• 53. Community-based influence maximization  Use influence parameters to partition the graph into communities ◦ different from pure structure-based partition  Influence maximization within each community  Dynamic programming for selecting top seeds  Orthogonal to scalable influence computation algorithms 53 [Wang et al. KDD 2010]
  • 54. Sparsification of influence networks  Remove less important edges for influence propagation  Use action traces to guide graph trimming  Orthogonal to scalable influence computation algorithms 54 [Mathioudakis et al, KDD 2011]
  • 55. Simulated annealing  Following simulated annealing framework ◦ not going through all nodes to find a next seed as in greedy ◦ find a random replacement (guided by some heuristics)  Orthogonal to scalable influence computation algorithms 55
  • 56. Key takeaways for scalable influence maximization  tackle from multiple angles ◦ reduce #spread evaluations: lazy-forward ◦ scale up spread computation:  use local influence region (LDAG, MIA, SimPath)  use efficient graph structure (trees, DAGs)  model specific optimizations (simple path enumerations) ◦ graph sparsifications, community partition, etc.  upshot: can scale up to graphs with millions of nodes and edges, on a single machine 56