Digital Enterprise Research Institute                                                               www.deri.ie




                            Towards Cross-Community
                              Information Diffusion
                                  Maximisation
                  Václav Belák, Samantha Lam, Conor Hayes



© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.




                                                                               Enabling Networked Knowledge
Motivation


   •  Information cascades are of high interest in marketing, CRM, etc.
   •  A common approach is to maximise information diffusion by
      targeting influential actors
   •  In the context of many online communities (e.g. discussion
      fora) the information is shared to the community as a whole
      and not to individual actors




  [Figure: two panels. Left: common case, targeting individuals. Right: cross-community case, targeting communities.]

Objectives




   •  Our main hypothesis is that it is possible to efficiently
      spread a message over the information flow network by
      targeting highly influential communities


   •  The main problem is then formulated as a prediction of
      the set of communities to target such that the message is
      spread over the network as much as possible
       •  Spread over the actors, i.e. user activation fraction
       •  Spread over the communities, i.e. community
          activation fraction


Methods: Definition of Impact



  •  We propose (Belák et al., ’12) to take two factors into account:
      1.  degree of community membership of the users
      2.  centrality of the users within each community




  •  The impact of community A on community B is defined as the average centrality
     of actors from A within B, weighted by their membership in A
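
A minimal sketch of the impact definition above, assuming that membership and centrality are available as two users-by-communities arrays. The names `impact_matrix`, `membership` and `centrality` are hypothetical, and the normalisation by the total membership weight of A is one reading of "weighted average"; the slides do not give the exact formula.

```python
import numpy as np

def impact_matrix(membership, centrality):
    """Return J where J[a, b] is the membership-weighted mean centrality
    of community a's actors within community b.

    membership: (n_users, n_communities) array, membership[u, a] >= 0
    centrality: (n_users, n_communities) array, centrality[u, b] >= 0
    Both layouts are assumptions made for this sketch.
    """
    membership = np.asarray(membership, dtype=float)
    centrality = np.asarray(centrality, dtype=float)
    # Sum over users of membership[u, a] * centrality[u, b].
    weighted_sum = membership.T @ centrality
    # Total membership weight of each community a, as a column vector.
    total_weight = membership.sum(axis=0)[:, None]
    # Communities without members get zero impact instead of a division by zero.
    return np.divide(weighted_sum, total_weight,
                     out=np.zeros_like(weighted_sum),
                     where=total_weight > 0)
```

With crisp (0/1) membership this reduces to the plain mean centrality of A's members within B.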

Methods: Targeting Communities

   •  The dispersion (heterogeneity) of the total impact of community i can be
      measured as the entropy of the i-th row/column of the impact matrix

   •  We propose to target communities by the product of the total impact of
      community i and its entropy: impact focus (IF)

   •  We simulated the diffusion by extending Independent Cascade (ICM) and
      Linear Threshold (LTM) Models (Kempe et al., ’03)
        1.  Take q target communities and sample s users from each of them
        2.  Run the original models from the union of sampled users
   •  Information diffusion network derived from the reply-to network: a reply
      edge r_ji (j replies to i) becomes an information flow edge w_ij (from i to j)
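
The impact focus ranking and the seed-sampling step described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the rows of the impact matrix (the slide allows rows or columns), keeps the diagonal, measures entropy in bits, and all function and parameter names are hypothetical.

```python
import numpy as np

def impact_focus(J):
    """Score each community by (total impact of row i) * (entropy of row i)."""
    J = np.asarray(J, dtype=float)
    total = J.sum(axis=1)
    # Normalise each row to a probability distribution; all-zero rows stay zero.
    p = np.divide(J, total[:, None], out=np.zeros_like(J),
                  where=total[:, None] > 0)
    # Shannon entropy in bits, with the convention 0 * log(0) = 0.
    logp = np.log2(p, out=np.zeros_like(p), where=p > 0)
    entropy = -(p * logp).sum(axis=1)
    return total * entropy

def sample_seeds(J, members, q, s, rng):
    """Pick the q highest-IF communities and sample up to s users from each.

    members: dict mapping community index -> set of user ids (assumed layout).
    Returns the targeted communities and the union of the sampled users.
    """
    scores = impact_focus(J)
    targets = np.argsort(-scores, kind="stable")[:q]
    seeds = set()
    for c in targets:
        pool = sorted(members[int(c)])
        if not pool:
            continue
        k = min(s, len(pool))
        seeds.update(rng.choice(pool, size=k, replace=False).tolist())
    return [int(c) for c in targets], seeds
```

The returned seed set is what the original ICM or LTM would then be started from, for example `sample_seeds(J, members, q=1, s=5, rng=np.random.default_rng(0))`.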

Evaluation Strategy


         •  IF compared with random targeting (R), and group in-degree (GI)
            (Everett & Borgatti, ’99)

         •  The main aim was to investigate robustness of our framework with
            respect to:
              •  Character of the system
              •  Diffusion models
              •  User and Community Activation Fractions

         •  Procedural outline
             1.  Target q communities using one of the heuristics evaluated on
                 the data from time-slice t
             2.  Run the diffusion model on the network from time-slice t+1
             3.  Compute the average user and community spread over all
                 pairs (t, t+1)
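
The procedural outline above can be sketched as a small evaluation loop. Several points are assumptions made for illustration only: the sketch uses a plain Independent Cascade rather than the Linear Threshold runs shown in the figures, it uses the reply weight directly as the activation probability, it drops seeds that no longer appear in time-slice t+1, and it counts a community as activated once at least one of its members is active. The data layout and all names are hypothetical.

```python
import random

def flow_edges(reply_weights):
    """Reverse reply edges: a reply (j, i) with weight r becomes a flow edge i -> j."""
    flow = {}
    for (j, i), r in reply_weights.items():
        flow.setdefault(i, {})[j] = r
    return flow

def independent_cascade(flow, seeds, rng):
    """Each newly active user i gets one chance to activate each neighbour j
    with probability flow[i][j]."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for i in frontier:
            for j, p in flow.get(i, {}).items():
                if j not in active and rng.random() < p:
                    active.add(j)
                    newly_active.append(j)
        frontier = newly_active
    return active

def activation_fractions(active, users, members):
    """Return (user activation fraction, community activation fraction)."""
    user_fraction = len(active & users) / len(users)
    activated = sum(1 for m in members.values() if m & active)
    return user_fraction, activated / len(members)

def evaluate(slices, choose_seeds, runs=100, seed=0):
    """Average both fractions over all consecutive pairs (t, t+1).

    slices: list of dicts with keys 'reply_weights', 'users', 'members'.
    choose_seeds(slice_t, rng): returns seed users chosen on slice t.
    """
    if len(slices) < 2:
        raise ValueError("need at least two time-slices")
    rng = random.Random(seed)
    user_fractions, community_fractions = [], []
    for prev, nxt in zip(slices, slices[1:]):
        flow = flow_edges(nxt["reply_weights"])
        for _ in range(runs):
            seeds = set(choose_seeds(prev, rng)) & nxt["users"]
            active = independent_cascade(flow, seeds, rng)
            u, c = activation_fractions(active, nxt["users"], nxt["members"])
            user_fractions.append(u)
            community_fractions.append(c)
    n = len(user_fractions)
    return sum(user_fractions) / n, sum(community_fractions) / n
```

Passing a different `choose_seeds` callable per heuristic (IF, GI, R) keeps the diffusion and averaging code identical across the strategies being compared.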


Evaluation Data-Sets



  •  51 weeks of data from the largest Irish discussion board system (Boards)
  •  Segmented using a 1-week sliding window
      •  A 1-week window represents approx. 84% of
         cross-fora posting activity
  •  540 communities, 5.3k users/snapshot (avg)



  •  5 years of data from the technical support fora of SAP
  •  Used only for the diffusion experiments
  •  Segmented using a 2-month sliding window
      •  A 2-month window represents approx. 50% of cross-fora posting
         activity
  •  33 communities, 2k users/snapshot (avg)

User Activation Fraction

  [Figure: one targeted community (q=1), LTM. Panels: Boards and SAP. x-axis: user sample size (s), 5 to 20; y-axis: mean user activation fraction (u); curves: IF, GI, R.]
Community Activation Fraction

  [Figure: one targeted community (q=1), LTM. Panels: Boards and SAP. x-axis: user sample size (s), 5 to 20; y-axis: mean community activation fraction (c); curves: IF, GI, R.]
Community Activation Fraction

  [Figure: five targeted communities (q=5), LTM. Panels: Boards and SAP. x-axis: user sample size (s), 5 to 20; y-axis: mean community activation fraction (c); curves: IF, GI, R.]
Results Highlights


       •  Diffusion process became saturated at approximately 80% of users
          or communities in Boards, and 30% in SAP
           •  More efficient to target few communities

       •  Impact Focus outperformed the other two strategies with respect to
          both user and community activation fractions, particularly for a small
          number of targeted communities (i.e. [1, 2]) and
          seed users (i.e. [1, 20])
           •  Diminishing returns

       •  For a high number of targeted communities and seed users, the random
          strategy outperformed the other two with respect to community
          activation fractions in the SAP data-set
            •  SAP network fragmented into many small components, which
               made it hard to reach peripheral communities


Conclusion



       •  The evaluation demonstrated that the framework
           •  is able to identify highly influential communities
           •  can predict which communities to target such that the
              message spreads efficiently over both individual users
              and communities

       •  We aim to extend it with content analysis
           •  E.g. What are the most influential communities with
              respect to a particular topic?

       •  We will also investigate empirically-observed topic
          cascades and modify our models accordingly if needed


Questions?




      References

      •  V. Belák, S. Lam, and C. Hayes. Cross-Community Influence in Discussion
         Fora. ICWSM. AAAI, 2012.
      •  M. Everett and S. Borgatti. The centrality of groups and classes. J. of
         Mathematical Sociology, 23(3):181–201, 1999.
      •  D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of
         influence through a social network. SIGKDD. ACM, 2003.
