A network pruning based approach for subset specific influential detection
Presented at the ACM Web Science conference, in June 2012, held at Northwestern University, Evanston.


  1. A Network Pruning Based Approach for Subset-Specific Influential Detection
      Praphul Chandra, Arun Kalyanasundaram
      Hewlett Packard Labs, Bangalore, India
      ACM Web Science 2012
  2. Can they really influence our decisions?
  4. Who else can influence our decisions?
  9. How do we exploit this spread of influence? Viral Marketing
      Influential Detection
      • Identify a set of nodes (or individuals) to seed with some information so as to maximize the spread of the seeded information in the network. [Domingos, et al. 2001][1]
  11. Influential Detection - Other Applications
      • Water distribution networks [Leskovec, et al. 2007][2]
      • Preventing the spread of diseases [Christakis, Fowler 2007][3]
  12. Influential Detection - A simple heuristic
      Finding the most influential node using the highest degree heuristic.
      [Figure: example graph on nodes a to g with the most influential node highlighted.]
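The highest degree heuristic can be sketched in a few lines. This is only an illustration: the edge list `EDGES` is a hypothetical toy graph, not the graph drawn on the slide, and the alphabetical tie-break is an assumption added to keep the result deterministic.

```python
from collections import defaultdict


def highest_degree_node(edges):
    """Return the node with the most incident edges in an undirected graph."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Iterate in sorted order so that ties go to the alphabetically first node.
    return max(sorted(degree), key=lambda n: degree[n])


# Hypothetical toy graph (assumption, not taken from the slides).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g")]

if __name__ == "__main__":
    print(highest_degree_node(EDGES))
```

The heuristic looks only at immediate neighbours, which is why it is cheap but can miss nodes whose influence travels over longer paths.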
  13. Our Problem - Subset Specific Influential Detection
      • Aim: Maximize the spread of influence on a subset of nodes in the network instead of the whole network.
  15. Subset Specific Influential Detection - Examples
      • Small businesses: locality based marketing
      • Political campaign [Focus on Supporters / Detractors]
      • Targeted advertisements: demographics [Nationality, Age, Gender, etc.]
  16. Subset Specific Influential Detection - Our Motivation
      • Increase in size / density of networks.
      • Opportunity to improve the efficiency of traditional approaches.
      • Current state of the art “adapts” existing algorithms on influential detection to the subset specific version. [Kempe, et al. 2003][4] [Aggarwal, et al. 2011][5]
      • We address the subset specific top-k influential detection problem standalone.
  17. Subset Specific Influential Detection - A simple heuristic
      Finding the subset specific influential using the highest relevant degree heuristic.
      [Figure: example graph on nodes a to g showing the subset of nodes on which to maximize influence spread and the subset specific most influential node.]
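One plausible reading of "highest relevant degree" is to count only those neighbours that belong to the target subset. The sketch below follows that reading; the function name, the toy edge list and the tie-break rule are assumptions made for illustration.

```python
def highest_relevant_degree_node(edges, targets):
    """Return the node with the most neighbours inside `targets`."""
    relevant = {}
    for u, v in edges:
        relevant.setdefault(u, 0)
        relevant.setdefault(v, 0)
        # An undirected edge contributes to an endpoint only if the
        # other endpoint is one of the nodes we want to influence.
        if v in targets:
            relevant[u] += 1
        if u in targets:
            relevant[v] += 1
    # Ties go to the alphabetically first node (assumption).
    return max(sorted(relevant), key=lambda n: relevant[n])


# Hypothetical toy graph and target subset (assumptions).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f"), ("e", "g")]
TARGETS = {"f", "g"}

if __name__ == "__main__":
    print(highest_relevant_degree_node(EDGES, TARGETS))
```

When the target subset is the whole node set, the relevant degree equals the ordinary degree, so the heuristic falls back to the plain highest degree rule.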
  18. Our Contribution - A Summary
      • An efficient algorithm for subset specific top-k influential detection.
      • Performance vs. efficiency trade-off using a tunable parameter, γ.
      • Analytical framework for an iteratively pruned network:
        • A lower bound to evaluate the influence spread.
        • Proof of sub-modularity of the influence spread function.
  20. Background - Models of Information Diffusion
      • Aim: Capture the dynamics of diffusion in social networks. [Granovetter, Mark 1978][6]
      • For example: Independent Cascade Model (ICM) [Goldenberg, et al. 2001][7]
        • Node u activates its neighbor v with an independent probability, p_uv.
        • Stochastic.
        • In general p_uv = p, the propagation probability.
      Activation of a node v by a node u can be seen as the outcome of a coin flip with bias p_uv.
  21. Independent Cascade Model - Activation Graphs
      • Activation graph
        • Generated by sampling edges based on p_uv (edge weight).
        • Allows us to evaluate the expected influence spread [Kempe, et al. 2003].
      [Figure: weighted graph on nodes a to g (edge weights 0.1, 0.2, 0.01, 0.1, 0.05, 0.3, 0.15) and two sampled graphs, Activation Graph 1 and Activation Graph 2.]
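Sampling an activation graph amounts to one biased coin flip per edge. A minimal sketch, assuming directed edges given as `(u, v, p_uv)` triples; the function name and the example edges are hypothetical.

```python
import random


def sample_activation_graph(weighted_edges, rng):
    """Keep each directed edge (u, v) independently with probability p_uv."""
    return [(u, v) for u, v, p in weighted_edges if rng.random() < p]


# Hypothetical weighted edges (assumption, not the graph from the slide).
WEIGHTED_EDGES = [("a", "b", 0.1), ("b", "c", 0.2), ("c", "d", 0.3)]

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that a run is repeatable
    print(sample_activation_graph(WEIGHTED_EDGES, rng))
```

Each call produces one possible world of the diffusion process; repeated calls give the collection of activation graphs used to estimate the expected spread.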
  22. Evaluating Influence Spread In ICM [Kempe, et al. 2003]
      • Expected influence spread due to a node u:
        • Mean number of nodes reachable from u in N activation graphs.
      R_u: Number of nodes reachable from u, not including u.
      [Figure: N outcomes sampled from the weighted graph on nodes a to g.]
      Activation graph 1: R_a = 3, R_b = 3, R_c = 3, R_d = 3, R_e = 0, R_f = 1, R_g = 1
      Activation graph N: R_a = 2, R_b = 2, R_c = 0, R_d = 2, R_e = 2, R_f = 2, R_g = 2
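The estimate described above can be sketched as a Monte Carlo loop: sample an activation graph, count the nodes reachable from u, and average over N samples. Function names and the example edges are assumptions; edges are treated as directed.

```python
import random
from collections import defaultdict, deque


def reachable(adj, source):
    """Nodes reachable from `source` by breadth-first search, excluding it."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    seen.discard(source)
    return seen


def expected_spread(weighted_edges, source, n_samples, rng):
    """Mean of R_u over `n_samples` freshly sampled activation graphs."""
    total = 0
    for _ in range(n_samples):
        adj = defaultdict(list)
        for u, v, p in weighted_edges:
            if rng.random() < p:  # coin flip with bias p_uv
                adj[u].append(v)
        total += len(reachable(adj, source))
    return total / n_samples


if __name__ == "__main__":
    edges = [("a", "b", 0.5), ("b", "c", 0.5)]  # hypothetical example
    print(expected_spread(edges, "a", 10000, random.Random(0)))
```

For the two-edge chain in the usage example the exact value is 0.5 + 0.25 = 0.75, so the printed estimate should land close to that.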
  23. Previous Work - Greedy Algorithm [Kempe, et al. 2003]
      • σ(A): Influence spread due to a seed set A.
      • δ_u: Marginal contribution of u, which is σ(A ∪ {u}) − σ(A).
      • Approach: Iteratively choose a node u with the highest δ_u.
      • Performance guarantee: at least 63% (that is, 1 − 1/e) of the optimal solution.
      • Running time scales exponentially with network size.
  25. Greedy Algorithm - Pictorial Representation
      [Figure: graph on nodes u, a, b, c, d, f, v, e, y, x with the top-k influential nodes highlighted after each iteration.]
      • Iteration 1: Node v is chosen as the most influential node, since δ_v > δ_u > δ_a > ...
      • Iteration 2: After Iteration 1, δ_u drops below δ_a. Hence a is chosen next.
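The greedy selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: activation graphs are sampled once up front and reused, σ(A) counts every active node including the seeds (the convention of Kempe et al., which differs from the R_u count above), and ties go to the alphabetically first node.

```python
import random
from collections import defaultdict


def sample_graphs(weighted_edges, n_samples, rng):
    """Pre-sample activation graphs as adjacency lists."""
    graphs = []
    for _ in range(n_samples):
        adj = defaultdict(list)
        for u, v, p in weighted_edges:
            if rng.random() < p:
                adj[u].append(v)
        graphs.append(adj)
    return graphs


def active_set(adj, seeds):
    """All nodes activated by `seeds` in one activation graph (seeds included)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sigma(graphs, seeds):
    """Estimated influence spread: mean number of active nodes."""
    return sum(len(active_set(g, seeds)) for g in graphs) / len(graphs)


def greedy_top_k(nodes, graphs, k):
    """Repeatedly add the node with the largest marginal contribution."""
    seeds = []
    for _ in range(k):
        base = sigma(graphs, seeds)
        candidates = [n for n in sorted(nodes) if n not in seeds]
        best = max(candidates, key=lambda n: sigma(graphs, seeds + [n]) - base)
        seeds.append(best)
    return seeds


# Hypothetical deterministic example: every edge has probability 1.
EDGES = [("v", "a", 1.0), ("v", "b", 1.0), ("v", "c", 1.0),
         ("u", "a", 1.0), ("u", "b", 1.0), ("y", "x", 1.0)]
NODES = ["u", "v", "y", "a", "b", "c", "x"]

if __name__ == "__main__":
    graphs = sample_graphs(EDGES, 3, random.Random(0))
    print(greedy_top_k(NODES, graphs, 2))
```

In this toy graph u looks strong in isolation, but once v is seeded everything u reaches is already active, so its marginal contribution collapses and y is picked second. That is the same effect the pictorial example shows for u and a.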
  26. Our Approach
  27. Problem Statement
      Given a graph G(V, E) and a destination set D0 ⊆ V, find the top-k nodes in V which maximize the spread of influence on D0.
      Salient features:
      • Top-k nodes may or may not be in D0.
      • When D0 = V, it reduces to the general form.
      [Figure: two graphs on nodes a to g with the destination set (D0) marked; in one the subset specific most influential does NOT lie in D0, in the other it does lie in D0.]
  28. Trivial Extension - Subset Adapted Greedy
      • Expected influence spread on D0 due to a node u:
        • Mean number of nodes in D0 reachable from u in N activation graphs.
      R_u: Number of nodes in D0 reachable from u, not including u.
      [Figure: N outcomes sampled from the weighted graph on nodes a to g, with the destination set (D0) and the subset specific most influential node marked.]
      Activation graph 1: R_a = 2, R_b = 2, R_c = 2, R_d = 3, R_e = 0, R_f = 0, R_g = 0
      Activation graph N: R_a = 1, R_b = 1, R_c = 0, R_d = 2, R_e = 0, R_f = 0, R_g = 0
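The subset adapted variant changes only what is counted: active nodes outside D0 are ignored. A minimal sketch, assuming the activation graphs are already sampled and given as plain adjacency dictionaries; the names `sigma_d0` and `subset_adapted_greedy` are hypothetical, and seeds that lie in D0 are counted as active (an assumption).

```python
def active_set(adj, seeds):
    """All nodes activated by `seeds` in one activation graph (seeds included)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sigma_d0(graphs, seeds, d0):
    """Mean number of active nodes that belong to the destination set D0."""
    return sum(len(active_set(g, seeds) & d0) for g in graphs) / len(graphs)


def subset_adapted_greedy(nodes, graphs, d0, k):
    """Greedy selection where marginal gains are measured on D0 only."""
    seeds = []
    for _ in range(k):
        base = sigma_d0(graphs, seeds, d0)
        candidates = [n for n in sorted(nodes) if n not in seeds]
        best = max(candidates,
                   key=lambda n: sigma_d0(graphs, seeds + [n], d0) - base)
        seeds.append(best)
    return seeds


# Hypothetical pre-sampled activation graphs (two identical outcomes).
GRAPHS = [{"s": ["t1", "t2"], "t1": ["x"]}, {"s": ["t1", "t2"], "t1": ["x"]}]
NODES = ["s", "t1", "t2", "x"]
D0 = {"t1", "t2"}

if __name__ == "__main__":
    print(subset_adapted_greedy(NODES, GRAPHS, D0, 1))
```

The selected node s is not itself in D0, which illustrates the point that top-k nodes may or may not be in the destination set. Passing the full node set as D0 recovers the general spread.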
  29. Iterative Pruning Approach - Central Idea
      • Identify a set of nodes, ψ, which are considered “influenced”.
      • De-prioritize the spread of influence to all nodes in ψ.
  33. When to consider a node as influenced
      • Based on a node’s susceptibility to influence. For example: [S. Aral, D. Walker 2011][8]
      • In our approach, we introduce a threshold parameter γ_u to model the susceptibility of a node u.
  37. Iterative Pruning Approach - In Three Steps
      1. Compute L_u(A) ∈ [0, 1]: Likelihood that a node u would be influenced due to a seed set, A.
         • L_u(A) is the expectation that a node u will be active due to A.
         • Expected influence spread: $\sigma(A) = \sum_{u \in V} L_u(A)$
      2. Set a threshold γ_u: Add a node u to ψ when L_u ≥ γ_u.
         • Sociological perspective of γ: Susceptibility or ease of influencing.
         • Incorporates potential influence that can reach from all over the network.
      3. Pruning process: Remove all paths that lead ONLY to nodes in ψ.
         • Significantly improves the efficiency. Details to follow.
      [Figure: weighted graph on nodes a to g with the destination set (D0), the subset specific influential and the influenced set (ψ) marked.]
      Example values: L_a = 0.05, γ_a = 0.05; L_b = 0.25, γ_b = 0.2; L_c = 0.15, γ_c = 0.2
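Steps 1 and 2 can be sketched as below. The estimate of L_u(A) as the fraction of activation graphs in which u is active is an assumption about how the likelihood might be computed, and the function names are hypothetical. The threshold check reuses the example values from the slide.

```python
def active_set(adj, seeds):
    """All nodes activated by `seeds` in one activation graph (seeds included)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def likelihoods(graphs, seeds, nodes):
    """Estimate L_u(A) as the fraction of activation graphs where u is active."""
    counts = dict.fromkeys(nodes, 0)
    for adj in graphs:
        for u in active_set(adj, seeds):
            counts[u] += 1
    return {u: c / len(graphs) for u, c in counts.items()}


def influenced_set(likelihood, gamma):
    """psi: every node whose likelihood meets its own threshold gamma_u."""
    return {u for u, value in likelihood.items() if value >= gamma[u]}


if __name__ == "__main__":
    # Hypothetical pre-sampled activation graphs for the seed set {"s"}.
    graphs = [{"s": ["a"]}, {"s": ["a", "b"]}, {}, {"s": []}]
    L = likelihoods(graphs, ["s"], ["s", "a", "b"])
    print(L, sum(L.values()))  # the sum is the estimated sigma(A)

    # Example values from the slide: a and b enter psi, c does not.
    print(influenced_set({"a": 0.05, "b": 0.25, "c": 0.15},
                         {"a": 0.05, "b": 0.2, "c": 0.2}))
```

Summing the per-node likelihoods gives the same number as averaging the active-set sizes, which is the identity σ(A) = Σ L_u(A) stated in step 1.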
  38. Iterative Pruning Approach - Pruning Process
      • ψ: The set of nodes considered influenced.
      • Two level pruning process:
        1. For each node in ψ, remove all its adjacent edges.
        2. Recursively remove all paths that do NOT lead to any node in $D_0 \setminus \psi$.
      • How does pruning help?
        • Improves efficiency by reducing the density of the underlying graph.
      [Figure: graph on nodes a to g before pruning, after Level 1 and after Level 2, with the destination set (D0), the subset specific influential and the influenced set (ψ) marked.]
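One way to read the two-level pruning is sketched below. It is an interpretation, not the authors' code: level 2 is implemented by finding every node that can still reach a node of D0 outside ψ (a reverse breadth-first search) and dropping every edge whose head is not such a node. Edges are assumed to be directed, and the function name `prune` is hypothetical.

```python
from collections import defaultdict, deque


def prune(edges, d0, psi):
    """Return the edges that survive both pruning levels."""
    # Level 1: drop every edge adjacent to a node already considered influenced.
    level1 = [(u, v) for u, v in edges if u not in psi and v not in psi]

    # Level 2: keep only edges that lie on a path to some node in D0 \ psi.
    targets = set(d0) - set(psi)
    reverse = defaultdict(list)
    for u, v in level1:
        reverse[v].append(u)
    useful = set(targets)          # nodes from which a target is reachable
    queue = deque(targets)
    while queue:
        v = queue.popleft()
        for u in reverse[v]:
            if u not in useful:
                useful.add(u)
                queue.append(u)
    return [(u, v) for u, v in level1 if v in useful]


# Hypothetical example: p is already influenced, t is still to be reached.
EDGES = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "c"), ("s", "p"), ("p", "q")]

if __name__ == "__main__":
    print(prune(EDGES, d0={"t", "p"}, psi={"p"}))
```

In the example, level 1 removes the two edges touching p, and level 2 removes the branch through b and c because it never reaches a remaining target. Only the path from s through a to t survives, so later spread estimates traverse a much sparser graph.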
  39. Experiments
      • Datasets: Two real world co-authorship networks
        1. High Energy Physics - Theory (HEPT) section of e-print arXiv. Dense network: 15233 nodes / 58891 edges
        2. Conference on Software Maintenance and Re-engineering (SMRE). Sparse network: 1336 nodes / 2200 edges
      • Comparison with state of the art:
        • Subset Adapted Greedy
        • Subset Adapted CELF (Cost Effective Lazy Forward) [Leskovec, et al. 2007]
      • System parameter γ: {p4, p3, p2, p, 2p, 4p}, where p is the propagation probability in ICM.
  40. Results: [ Dataset 1 ] Dense Network
      • Iterative Pruning (γ = p4) vs. Subset Adapted Greedy:
        • 96% improvement in efficiency.
        • 10% drop in performance (influence spread).
      • Iterative Pruning with CELF (γ = p4) vs. Subset Adapted CELF:
        • 52% improvement in efficiency.
        • 10% drop in performance.
  41. Results: [ Dataset 2 ] Sparse Network
      • Iterative Pruning (γ = p4) vs. Subset Adapted Greedy:
        • 73% improvement in efficiency.
        • 21% drop in performance (influence spread).
      • Iterative Pruning with CELF (γ = p4) vs. Subset Adapted CELF:
        • 38% improvement in efficiency.
        • 21% drop in performance.
  42. Key Inferences
      • Low values of γ are highly efficient, but at the cost of performance loss.
      • Choose a low value of γ for dense networks and a high value of γ for sparse networks, in order to achieve a desirable performance.
      • The relatively low efficiency gains with CELF arise because the pruning operation causes a simultaneous reduction in the marginal contribution of several nodes.
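For context on the last inference, the lazy evaluation idea behind CELF can be sketched as follows. This is a generic lazy greedy written for illustration, not the implementation used in the experiments: `sigma` is any function from a list of seeds to a spread value, and the set-coverage example is hypothetical.

```python
import heapq


def celf(nodes, sigma, k):
    """Lazy greedy selection: recompute a marginal gain only when it reaches the top."""
    # Entries are (-gain, node, number of seeds when the gain was computed).
    heap = [(-sigma([n]), n, 0) for n in sorted(nodes)]
    heapq.heapify(heap)
    seeds, spread = [], 0.0
    while len(seeds) < k and heap:
        neg_gain, node, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The gain is up to date, so this node is the true best choice.
            seeds.append(node)
            spread += -neg_gain
        else:
            # Stale gain: by sub-modularity it can only have shrunk, so refresh it.
            gain = sigma(seeds + [node]) - spread
            heapq.heappush(heap, (-gain, node, len(seeds)))
    return seeds


# Hypothetical spread function: number of items covered by the chosen nodes.
COVER = {"v": {1, 2, 3}, "u": {1, 2}, "y": {4, 5}, "a": {1}}


def coverage(seeds):
    covered = set()
    for s in seeds:
        covered |= COVER[s]
    return len(covered)


if __name__ == "__main__":
    print(celf(COVER, coverage, 2))
```

Lazy evaluation saves work when most stale gains remain valid upper bounds that never reach the top of the heap. If many marginal gains drop at once, more entries must be refreshed before a seed can be confirmed, which is consistent with the smaller efficiency gains reported for the CELF variants.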
  47. Analytical Framework
      • Known: The influence spread function σ(A) is sub-modular when the underlying graph G(V, E) is static across iterations. [Kempe, et al. 2003]
      • Is $\sigma_{\langle G_i \rangle}(A)$ sub-modular when the underlying graph $G_i(V, E_i)$ is iteratively pruned, where $G_i$ is the graph after the i-th iteration? Yes. Details in our paper.
      • Can we estimate σ(A) from $\sigma_{\langle G_i \rangle}(A)$? No, but we derive the following lower bound:
        $\sigma(A) \ge \sigma_{\langle G_i \rangle}(A) + \sum_{j=1}^{i-1} \sum_{u \in \psi_j \circ \psi_{j+1}} L_u(A)$
        where $\psi_j$ is the set of influenced nodes after the j-th iteration, and $\circ$ stands for the set operator between $\psi_j$ and $\psi_{j+1}$, which is not legible in this transcript.
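For reference, the sub-modularity property used throughout this framework is the standard diminishing-returns condition, stated here in the notation of the slides. The greedy guarantee of roughly 63% quoted earlier is the classical $1 - 1/e$ bound for monotone sub-modular functions; this block restates known definitions and is not taken from the slides.

```latex
% Sub-modularity (diminishing returns) of a set function \sigma on V:
\sigma(A \cup \{u\}) - \sigma(A) \;\ge\; \sigma(B \cup \{u\}) - \sigma(B)
\qquad \text{for all } A \subseteq B \subseteq V,\ u \in V \setminus B.

% Greedy guarantee for monotone sub-modular \sigma with |A| \le k:
\sigma(A_{\text{greedy}}) \;\ge\; \left(1 - \tfrac{1}{e}\right) \sigma(A^{*})
\;\approx\; 0.63\, \sigma(A^{*}).
```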
  48. Summary
      • Iterative network pruning algorithm for subset specific top-k influential detection.
      • Evaluation of our algorithm on two real world datasets showed significant efficiency gains with an acceptable drop in performance.
      • A tunable parameter γ for performance vs. efficiency trade-off.
      • Analytical framework to show the sub-modularity of the influence spread function when the underlying graph is iteratively pruned, thus enabling the evaluation of performance guarantees.
  49. Scope for Future Work
      • Design of more efficient algorithms.
      • Evaluation with real world distributions of γ (susceptibility).
      • Extension to non-progressive models of diffusion.
  50. References
      [1] P. Domingos and M. Richardson, “Mining the network value of customers,” in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’01. ACM, 2001, pp. 57–66. [Online]. Available: http://doi.acm.org/10.1145/502512.502525
      [2] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance, “Cost-effective outbreak detection in networks,” in Proceedings of the thirteenth ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’07. ACM, 2007, pp. 420–429. [Online]. Available: http://doi.acm.org/10.1145/1281192.1281239
      [3] N. A. Christakis and J. H. Fowler, “The spread of obesity in a large social network over 32 years,” The New England Journal of Medicine, vol. 357, no. 4, pp. 370–379, July 2007. [Online]. Available: http://health-equity.pitt.edu/767/
      [4] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’03. ACM, 2003, pp. 137–146. [Online]. Available: http://doi.acm.org/10.1145/956750.956769
      [5] C. C. Aggarwal, A. Khan, and X. Yan, “On flow authority discovery in social networks,” in Proceedings of the eleventh SIAM international conference on Data mining, ser. SDM ’11. SIAM / Omnipress, 2011, pp. 522–533.
      [6] M. Granovetter, “Threshold Models of Collective Behavior,” American Journal of Sociology, vol. 83, no. 6, pp. 1420–1443, 1978. [Online]. Available: http://dx.doi.org/10.2307/2778111
      [7] J. Goldenberg, B. Libai, and E. Muller, “Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth,” Marketing Letters, vol. 12, no. 3, pp. 211–223, Aug. 2001. [Online]. Available: http://www.ingentaconnect.com/content/klu/mark/2001/00000012/00000003/00350022
      [8] S. Aral and D. Walker, “Creating Social Contagion Through Viral Product Design: A Randomized Trial of
  51. Questions?
      P. Chandra and A. Kalyanasundaram, “A Network Pruning Based Approach for Subset Specific Influential Detection”, in 4th Annual ACM conference on Web Science (WebSci 2012), Evanston, Illinois, USA, Jun. 2012.
      [Figure: pruning example on nodes a to g with the destination set (D0), the subset specific influential and the influenced set (ψ) marked.]
  52. Thank You
