Jaideep

640 views

Published on

Published in: Technology, Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
640
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
10
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Jaideep

  1. 1. Social Analytics Mining Behaviors of a Connected World PAKDD School April 11, 2013 Sydney, Ausytralia Jaideep Srivastava University of Minnesota srivasta@cs.umn.edu4/9/2013 University of Minnesota 1
  2. 2. Course Outline• Module 1 • Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. FaceBook, MMO games, etc.• Module 2 • Computational online trust • Identifying key influencers • Information flow in networks• Module 3 • Analysis of clandestine networks • Katana - game analytics engine 4/9/2013 University of Minnesota 2
  3. 3. Part I: Background
  4. 4. Social Network Analysis O a iz tio a rg n a n l Sc l o ia A th p lo y n ro o g T e ry ho P y h lo y sco g C g itiv on e P rc p n S c -C g itiv e e tio o io o n e K o le g nw de N tw rk e o s N tw rk e o s Ra e lity Sc l o ia K o le g nw de N tw rk e o s N tw rk e o s A q a ta c c u in n e K o le g nw de E id m lo y p e io g (lin s k) (c n n o te t) S c lo y o io g  Social science networks have widespread application in various fields  Most of the analyses techniques have come from Sociology, Statistics and Mathematics  See (Wasserman and Faust, 1994) for a comprehensive introduction to social network analysis12/02/06 IEEE ICDM 2006 4
  5. 5. What have been it‟s key scientific successes? In classical social sciences numerous results  „Six degree of separation‟ [Milgram]  Popularized by the „Kevin Bacon game‟  „The strength of weak ties‟ [Granovetter]  „Online networks as social networks‟ [Wellman, Krackhardt]  „Dunbar Number‟  Various types of centrality measures  Etc. In the Web era  „The Bow-Tie model of the Web‟ [Raghavan]  „Preferential attachment model‟ [Barabasi] (Yes and No)  „Powerlaw of degree distribution‟ [Lots of people] (NO!)  Etc.
  6. 6. Application successes Numerous in social sciences Google – PageRank LinkedIn – expanding your Cognitive Social Network  making you aware that „you‟re more connected and closer than you think you are‟ Expertise discovery in organizations  Knowledge experts, „authorities‟  Well-connected individuals, „hubs‟ Rapid-response teams in emergency management Information flow in organizations Twitter – real time information dissemination Etc.
  7. 7. Online (Multiplayer) Games High How social Low Low High How enagaging
  8. 8. Player Behavior & Revenue Model Blizzard (subscription)  Zynga (free2play)  World of Warcraft  Farmville, Fishville, Mafia  12 million subscribers Wars, etc.  Revenue model  180 million players  $15/month  Revenue model  Approx $3billion annual  Virtual goods revenue  $700 million in 2010  4 hours a day, 7 days a  0.5 hrs a day, 7 days a week week! Hard core gamers Everyone Less socially acceptable More socially acceptable Like Cocaine Like Caffeine
  9. 9. Implications of this „addiction‟ 3 billion hours a week are being spent playing online games  Jane McGonigal in “Reality is Broken” Labor economics  What is the impact of so much labor being removed from the pool [Castranova] Entertainment economics  If MMO players can get 100 hrs/month of entertainment by spending $25 or so, what will happen to other entertainment industries? Psychological/Sociological  Is it an addiction – the prevailing view (Chinese government‟s „detox centers‟ for kids)  Are they fulfilling a deeper need that real world is not (McGonigal) Societal  A trend far too important to not be taken seriously!
  10. 10. Business Example
  11. 11. Levis‟ – Example of Social Retail Levis‟ leverages its brand to ensure customers provide their social network Levis‟ can leverage predictive social analytics technology to understand the value of the customer‟s social network 11
  12. 12. Opportunity, Innovation, Impact  Companies do not understand the social graph of their customers  It‟s not just about how they relate to their customers, but also about how customers relate to each other vs.  Understanding these relationships unlocks immense value  Innovation: Understanding the social network of customers  Key influencers, relationship strength, …  Impact: Deriving actionable insights from this understanding  Customer acquisition, retention, customer care, …  Social recommendation, influence-based marketing, identifying trend-setters, … Ninja Metrics confidential information. Copyright 2012 12
  13. 13. Unlocking true value by product, category, or store 0021 -$128.61 -$293.79 -$79.63 13
  14. 14. True Value of each customer  True value = individual value + social value  Who really matters, and to what degree  Some empirical facts  31% activity due to socialization  23% more individual + 8% more social activity The individual‟s their social and their true lifetime value influence total 14
  15. 15. Impact of New Instrumentation on Science 1950s  Invention of the electron microscope fundamentally changed chemistry from „playing with colored liquids in a lab‟ to „truly understanding what‟s going on‟ 1970s  Invention of gene sequencing fundamentally changed biology from a qualitative field to a quantitative field 1980s  Deployment of the Hubble (and other) Space telescopes has had fundamental impact on astronomy and astrophysics 2000s  Massive adoption is fundamentally changing social science research  Massively Multiplayer Online Games (MMOGs) and Virtual Worlds (VWs) are acting as „macroscopes of human behavior‟
  16. 16. The Virtual World Observatory (VWO) Project• Four PIs, 30+ Post-docs, PhD and MS students, UGs, high-schoolers • Noshir Contractor, Northwestern: Networks • M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups • Jaideep Srivastava, Minnesota: Computer Science • Dmitri Williams, USC: Social Psychology• Collaborators • Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan (Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci, Michigan), …• Data and technology partners • Sony (EverQuest 2), Linden Labs (2nd Life), Bungie (Halo3), Kingsoft (Chevalier‟s Romance), others … • Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …
  17. 17. Overall Goals of the VWO Project Basic Science • behavior, socialization, … • novel, scalable algorithms • NSFSticky SocialMedia tons of Analytics Government applications data • Social science• Games • team dynamics (Army) models • skills acq, leadership (Army)• Virtual Worlds • New algorithms • social influence & adolescent• Other social • Ultra-scalability health (CDC) • etc.apps Business applications • customer churn • game design • anti-social behavior • etc.
  18. 18. Collaborators, Sponsors, Partners Team Financial Sponsors Data Partners Technology Partners
  19. 19. Part II – Impact on Science
  20. 20. Findings from a Player Survey
  21. 21. Who is playing? It is not just a bunch of kids Average age is 31.16 (US population median is 35) More players in their 30s than in their 20s.
  22. 22. How much do they play? Mean is 25.86 hours/week Compares to US mean of 31.5 for TV (Hu et al, 2001) • From prior experimental work, MMO play eats into entertainment TV and going out, not news • So much for kids being the ones with the free time.
  23. 23. Gender Differences More men players (78/22%) Men played to compete , and women played to socialize Men play more other games, but it was the women who were more satisfied EQ2 players  Women: 29.32 hours/week  Men: 25.03 hours/week  Likelihood of quitting: “no plans to quit”: women 48.66%, men 35.08% Self reported play times  Women: 26.03 (3 hours less than actual)  Men: 24.10 (1 hour less than actual)  Boys and girls are socialized early on, and thus have clear role expectations for their behaviors and identities (Gender Role Theory in action!!)
  24. 24. Playing with a partner
  25. 25. Inferring RW gender from VW dataGoal What virtual world behaviors and characteristics predict real world gender?Data: Survey Data n=7119 Survey Character Store EQ2 Character StoreVariables Avatar Characteristics:  Gender, Race, Class, Experience, Guild Rank, Alignment, Archetypes Game play Behaviors:  Total Deaths & Quests, PvP Kills & Deaths, Achievement Points, Number of Characters, Time played, Communication patterns 25
  26. 26. Gender prediction results Close to 95% prediction accuracy  Decision trees work rather well Character Gender, Race and Class are significant predictors to real life gender. Gender swapping is rarer, but systematically different by real gender Players tend to choose:  character gender based on their real life gender  character races that are gendered: women play elves/men play barbarians  classes that are gendered: women play priests/men play fighters 26
  27. 27. Gender swapping behaviorGame Character Real GenderGender Male Female TotalMale 4065 82.6% 98 8.2% 4163 68.0%Female 855 17.4% 1104 91.8% 1959 32.0%Total 4920 100.0% 1202 100.0% 6122 100.0% Observation • Far more males gender swap than females • Why? • Men are more creative? • Women have less identity confusion? • Women get their „fill of gender swapping in real life‟  27
  28. 28. Economics: A test of RW  VWmapping  Do players behave in virtual worlds as we expect them to in the actual world?  Economics is an obvious dimension to test  In the real world, perfect aggregate data are hard to get
  29. 29. GDP and Price Level GDP and price levels are robust but comparatively unstable GDP and Prices on Antonia Bayle 5,000,000 160 4,500,000 140 4,000,000 120 Prices (January = 100) 3,500,000 100 GDP (Gold) 3,000,000 2,500,000 80 2,000,000 60 1,500,000 40 1,000,000 20 500,000 0 0 January February March April May Nominal GDP Price Level (January = 100)
  30. 30. Money Supply and Price Change in Money Supply and Population on Antonia Bayle The instability is 2500 4000 explicable through 3000 Change in Money (000 Gold) 2000 the Quantity Theory 2000 Change in Accounts 1000 of Money 1500 0 -1000  a rapid influx of 1000 -2000 money . . . 500 -3000 -4000 0 -5000 February March April May Change in Money Supply (000 Gold) Change in Active Accounts  . . . dramatically Price Level on Antonia Bayle boosted prices 50 40 Percent Change in Price Level 30 More evidence that 20 this behaves like a 10 real economy 0 -10 February March April May June -20 Price Level
  31. 31. Networks in Virtual Worlds SONIC Advancing the Science of Networks in Communities
  32. 32. Why do we create and sustain networks?  Theories of self-interest  Theories of contagion  Theories of social and  Theories of balance resource exchange  Theories of homophily  Theories of mutual interest  Theories of proximity and collective action  Theories of co-evolutionSources:Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review.Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford University Press. SONIC Advancing the Science of Networks in Communities
  33. 33. “Structural signatures” of Social Theories A A B B F + F + - C - E C E D D Self interest Exchange Balance A A B F B F - + + C E C D E Novice D Expert Collective Action Homophily Contagion SONIC Advancing the Science of Networks in Communities
  34. 34. Black: male Red: femalePartnership Instant messaging SONIC Trade Mail Advancing the Science of Networks in Communities
  35. 35. 4 5 2 0 0 0 (1) 0 0 0 (2) 0 1 0 1 3 0 1 2 0 0 1 0 (n) 0 0 0 Social Networks (1) (2) (n)as network structure frequency Cluster Structurevectors in a bag-of-words model Vectors using Text clustering methodsSocial theory + IR Cluster means provide Attribute values forbased network modes of network each cluster can beanalysis structure configurations used to discover Making up all the trends between network social networks structures and attributes
  36. 36. Results – normalized network structurevector means for all clusters
  37. 37. Results Clusters 1 and 4 are similar  Groups kill fewer monsters  Group members in cluster 4 do not communicate much  Group members in cluster 1 generally limit their communication to just one other person in the group  Most people belong to these two clusters  Consistent with previous research - users in virtual environments are less likely to interact with strangers [N. Ducheneaut, N. Yee, E. Nickell and R. Moore, “Alone Together?” Exploring the social dynamics of massively multiplayer online games, Proceedings CHI06, ACM Press, New York, 407-416.] Cluster 5 groups have many 1-edge and 2-out stars  Most of the communication is one way possibly indicating presence of central people  Maximum number of monsters killed out of all clusters  Performance of the groups is very good  Minimal communication  It is possible that cluster 5 consists of groups more focused on playing and performing well in the game and less on socializing
  38. 38. Some Open Questions How similar/dissimilar are online social networks from real- world social networks? Is online socialization  Only a sustainability activity of real-world networks?  Causing new social networks to be formed?  Is fundamentally a different type of networking activity? A fundamental tenet of socialization has been “geography/proximity drives socialization”  How is this being impacted by online socialization?
  39. 39. TeamSkill: Modeling TeamChemistry in Online Multi- Player Games
  40. 40. Description and motivation The goal of this work is to improve skill assessment approaches used in multiplayer games, especially team-based games Xbox Live, PSN, Steam, and Battle.net have between 149-167 million total users, collectively (and that number is growing)  Many of the most popular games are team-based: Halo, Call of Duty, TF2, CS, etc Why?  Better skill assessment = fairer games = less player attrition  More accurate rankings of players/teams  Most previous work has focused on individuals, not teams It‟s a hard problem  Online setting: Updates to a player‟s skill distribution must be done after each game is played  Applicability: Any generalized assessment technique cannot include game-specific data Our datasets are from the games of professional Halo 3 players 40
  41. 41. What is Halo?  Halo is one of the most well-known “first-person shooter” video game series in the world  Online/LAN-based multiplayer is its most significant component  650,000 to 850,000 unique users per day at its peak 41
  42. 42. Major League Gaming  Our work focuses on professional Halo 3 players  Online scrimmage data, as well as complete tournament data, is readily available from bungie.net and mlgpro.com  Players are highly-skilled individually – allows us to better focus on group-level What is Major League Gaming (MLG)? performance characteristics • A professional league for competitive  Players change teams gaming, including Halo 3 from ‟08-‟10 • Best players in the world compete in this regularly, helps isolate league impact of particular players • Last tournament watched by over on overall team performance 1,000,000 people online  Unexplored boundary case • Our datasets are comprised of games played by these players for skill assessment 42
  43. 43. Skill assessment  An old problem (with different applications in different contexts)  Paired comparison estimation  Foundational work by Thurstone (1927) and Bradley-Terry (1952)  Elo (1959), popularized in chess ranking (FIDE, USCF)  Glicko (1993) – player-level ratings volatility incorporated (σ2), addition of rating periods  TrueSkill (2006) - factor-graph based approach used in Microsoft‟s Xbox Live gaming service  Used to match players/teams up with each other online  Our work focuses on Elo, Glicko, and TrueSkill 43
  44. 44. Elo (1959)  Arpad Elo, Dept of Physics, Marquette University  Was a master chess player in USCF  Proposed the Elo Rating System  Replaced earlier systems of competitive rewards (i.e., tournaments) in USCF/FIDE  Simple to implement: assumes player skill is normally- distributed with constant variance β2  Still widely-used today 44
  45. 45. Glicko (1993)  Mark E. Glickman, Department of Health Policy and Management , Boston University  Proposed the Glicko Rating System  Addresses rating reliability issue through the use of rating periods and player-specific variances  Elo is a special case of Glicko  Iterative approach for approximating the marginal posterior distribution of a player‟s skill conditional on other players‟ priors  More computationally tractable for large datasets 45
  46. 46. TrueSkill (2006)  Ralf Herbrich and Thore Graepel at Microsoft Research, Cambridge, UK  Used for automated ranking/matchmaking on Xbox Live  Uses factor graphs to model multi-player, multi-team environments  Converges quickly (~5 games for a player)  Large-scale deployment  30 million Xbox Live members  150+ games use TrueSkill for ranking & matchmaking 46
  47. 47. Issues with current approaches 1 2 3 4 + 1234  Basic idea  Given: skill ratings of each team member, i.e., si ~ N(μi, σi2)  Sum across all team members  Not intuitive: Team chemistry is a well-known concept in team-based competition [Martens 1987, Yukelson 1997] and it is not captured in any of these models  Can think of it as the overall dynamics of a team resulting from leadership, confidence, relationships, and mutual trust  Independence assumption not realistic in teams, especially at high levels of play 47
  48. 48. More information is available, however… 1 2 3 4 12 13 14 23 24 34 123 124 134 234 1234  Observation: We have more information than just the history of individual players – we also know the histories of groups of players  Player 1‟s history ∩ player 2‟s history  history of {1, 2}  Existing approaches only make use of top row  Idea: Estimate the skills of subgroups of players on a team, combine in some way, and use to produce better estimate of a team‟s skill 48
  49. 49. Reframing the assessmentproblem k=1 1 2 3 4 k=2 12 13 14 23 24 34 k=3 123 124 134 234 k=4 1234  Alterations to existing approaches necessary (i.e., Elo/Glicko/TrueSkill)  Player-level skill representation  generalized subgroup skill representation  For each game and for each k <= size of the team (K), treat each group as you would an individual player and update skill accordingly  Hashing of skill variable matrix according to unordered subgroup membership  Treat Elo/Glicko/TrueSkill as „base learners‟, a la boosting  Each rating says something about the skill of that particular subgroup  …but how do we aggregate these ratings to estimate the skill of a team? 49
  50. 50. TeamSkill  Four different aggregation approaches:  TeamSkill-K  TeamSkill-AllK  TeamSkill-AllK-EV  TeamSkill-AllK-LS 50
  51. 51. Aggregation issues black = history available red = no history available time: t=1 t = 100 t = 200 After 1 game New player – 5 - who‟s New player – 6 – who‟s never played with 1, 2, or 3 played with 3 or 5 (not both)  Real world case: assume that players can leave/join the 4-player team and look at the timeline  The question at each point in time, t: how best to combine the available group ratings to produce a team rating?  The problem: the feature space is expanding and contracting over time. 51
  52. 52. Data set overview  Collected over the course of 2009  7,590 games (2,076 from tournaments and 5,514 from Xbox Live scrimmages)  448 players on 140 different professional and semi- professional teams  Games took place during January 2008 through January 2010  Websites pertaining to this data  http://stats.halofit.org - Player/team statistics  http://halofit.org – Datasets and related information “Friend or foe” social network for all tournaments in 2008 and 2009 52
  53. 53. Evaluation  The TeamSkill approaches were evaluated by predicting the outcomes of games occurring prior to 10 MLG tournaments and comparing their accuracy to unaltered versions (k = 1) of their base learner rating systems - Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible choices of k for teams of 4, 1 ≤ k ≤ 4, were used)  For each tournament, we evaluated each rating approach using:  3 types of training data sets - games consisting only of previous tournament data, games from online scrimmages only, and games of both types.  3 periods of game history - all data except for the data between the test tournament and the one preceding it (“long”), all data between the test tournament and the one preceding it (“recent”), and all data before the test tournament (“complete”).  We will only show “complete” as results for “long” and “recent” mirrored those for “complete”  2 types of games - the full dataset and those games considered “close” (i.e., prior probability of one team winning close to 50%). 53
  54. 54. Results – all games Prediction accuracy for both tournament and scrimmage/custom games using complete history Prediction accuracy for tournament games using complete history Prediction accuracy for scrimmage/custom games using complete history 54
  55. 55. Results – “close” games Prediction accuracy for both tournament and scrimmage/custom games using complete history Prediction accuracy for tournament games using complete history Prediction accuracy for scrimmage/custom games using complete history 55
  56. 56. Conclusions  Modified several existing assessment approaches to operate on generalized player subgroup entities (instead of just individual players)  Introduced four aggregation methods for combining player subgroup information to produce final forecast  Shown evidence that close games are decided on the basis of team chemistry, consistent with sports psychology research 56
  57. 57. Part III: Impact on Business
  58. 58. Player Churn Prediction
  59. 59. Time Equals Stickiness 6.00 5.00 6Ratio of Quitters to Stayers 4.00 3.00 2.00 1.00 0.00 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 Character Level  There are less quitters as the levels go up, and focus should be on the first 20 levels.
  60. 60. Solo vs. Social Players 6.00 5.00Ratio of Quitters to Stayers 4.00 3.00 Solo Social 2.00 1.00 0.00 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 Character Level Isolated players are 3.5x more likely to quit (B = 1.26, p<.001). Focus design on facilitating social interaction.
  61. 61. Problem Statement & Approach Objective: At time t, for player p, given a window w, compute the probability, P(p,t,w), of p churning within the interval (t,t+w), and the confidence in this probability. Approach: Estimate P(p,t,w) and its confidence using  Data about p‟s activity and socialization behavior  Statistics and machine learning (linear, non-linear, ensemble, etc.)  Use of socio-psychological theories of player motivation  Novel synthesis of data driven (DD) and theory driven (TD) approaches 61
  62. 62. Player Motivation Theories Richard Bartle, 1996 Nick Yee, 200562
  63. 63. Mapping MMO Behaviors to Theory Example MMORPG  Persistent virtual world  Players have avatars  Quest-driven  Participation in groups, guilds Dataset (game logs)  Time period: FEB-JUN, 2006  No. of accounts: ~16000  Churner definition: 2 months of inactivity63
  64. 64. Synthesis of TD and DD approach64
  65. 65. Model Evaluation (Confusion Matrix)  Model performance for different features (10 fold cross-validation) Features Tree size Precision (%) Recall (%) F-measure (nodes) (%) Pure DD 485 69.3 84 76 TD: 59 67.1 76.2 71.3 Achievement+Social TD: Achievement 37 67.2 73.2 70.1 TD: Socialization 7 64.3 55.4 59.5  Observations  Decision tree learning works quite well  DD model very accurate (76%); difficult to interpret (485 nodes)  TD model interpretable (7 – 59 nodes), but less accurate  Achievement-orientation more important than socialization 65
  66. 66. Model Evaluation (Lift Chart) Observations  TD model does better in predicting top quintile of churners  DD model performs better in the 40%-70% range66
  67. 67. Ensemble Approach – ModelEvaluation Ensemble approach  A „committee of classifiers‟ votes on the result  Heterogeneity helps No. of Precision Recall F-measure clusters 5 66.23 91.94 76.99 10 66.49 91.08 76.87 15 67.58 89.80 77.12 20 67.6 90.13 77.25 Improvements over single model  Best recall value shows 7.94% improvement , i.e. model can identify larger proportion of potential churners)  Best F-measure shows 1.25% improvement67
  68. 68. Conclusions Comparison of theory-driven and data-driven model in terms of prediction accuracy and model interpretability Achievement-orientation is more important than socialization-orientation in identifying potential churners Ensemble model can identify larger proportion of potential churners as compared to single global model68
  69. 69. Course Outline• Module 1 • Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. FaceBook, MMO games, etc.• Module 2 • Computational online trust • Identifying key influencers• Module 3 • Analysis of clandestine networks • Katana - game analytics engine 4/9/2013 University of Minnesota 69
  70. 70. Computational Trust in Multiplayer Online Games
  71. 71. “If you want to go fast walk alone, if youwant to go far then walk with a group.” - Proverb from Ghana
  72. 72. Virtual Worlds & Massive OnlineGames Massively Multiplayer Online Role Playing Games (MMORPGs/MMOs) Simulated Environments like SecondLife Millions of people can interact with one another is shared virtual environment People can engage in a large number of activities with one another and with the environment Many of the observed behaviors have offline analogs 72 72
  73. 73. Big Picture Questions How is trust expressed differently in different social contexts?  Cooperative (PvE), Adversarial (PvP), … How is trust expressed in different types of social networks?  Housing, Mentoring, Trade, Group, … What are the characteristics of trust and related networks in MMOs?  Similarities and differences with social networks in other domains e.g., citation networks, co-authorship networks What role can features derived from the trust network play in prediction tasks e.g., link prediction (formation, breakage, change), trust propensity, success prediction 73
  74. 74. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network PhenomenonTrust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions
  75. 75. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network Phenomenon Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions Social Characteristics of TrustEffect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade Most types of homophily do notDifferent Social environments result in carry over to the MMO domain No Honor Amongst Thieves differences in network signatures Trust based and other social networks in MMOs exhibit anomalous network characteristics
  76. 76. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network Phenomenon Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions Social Characteristics of TrustEffect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade Most types of homophily do notDifferent Social environments result in carry over to the MMO domain No Honor Amongst Thieves differences in network signatures Trust based and other social networks in MMOs exhibit anomalous network characteristics Trust and Prediction Trust Prediction Family of Problems Item Recommendation Success Prediction (Social Capital) Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Effects of social environments on trust Effect of network structure on trust
  77. 77. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network Phenomenon Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions Social Characteristics of TrustEffect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade Most types of homophily do notDifferent Social environments result in carry over to the MMO domain No Honor Amongst Thieves differences in network signatures Trust based and other social networks in MMOs exhibit anomalous network characteristics Trust and Prediction Trust Prediction Family of Problems Item Recommendation Success Prediction (Social Capital) Predict Formation, Change, Breakage of Trust with in the same network and across Social networks. Predict trust propensity. Effects of social environments on trust Effect of network structure on trust
  78. 78. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust NetworksMost types of homophily do not Trust based and other social networks in MMOscarry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  79. 79. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust NetworksMost types of homophily do not Trust based and other social networks in MMOscarry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  80. 80. Housing-Trust in EQ2 • Access permissions to in-game house as trust relationships • None: Cannot enter house. • Visitor: Can enter the house and can interact with objects in the house. • Friend: Visitor + move items • Trustee: Friend + remove items • Houses can contain also items which allow sales to other characters without exchanging on the market 80
  81. 81. Homophily and Trust Homophily: Birds of a feather flock together There is no one form of homophily and homophily in general is described in multiple ways: Status vs. Value Homophily Each of these homophiles are in turn defined in multiple ways themselves Previous literature instantiates homophily in MMOs in terms of player characteristics and behavior in the game 81
  82. 82. Network Models, Homophily andTrust Networks in MMOs RQ1: Does homophily in MMOs operate in ways similar to homophily in the offline world? RQ2: How do we map characteristics that define homophily in the offline world to online settings? 82
  83. 83. Mapping Homophily in MMOs In general, studies of homophily in MMOs assume only one type of homophily and generalize based on that type Even in the offline world homophily is of different types Hence the necessity of Mapping Homophily which we address here Mapping and Proteus Effect 83
  84. 84. Trust and Homophily in MMOs Homophily Type Hypothesis ObservationH Gender Homophily Players trust other players who ?1 are of the same genderH Age Homophily Players trust other players who ?2 are of the same age cohortsH Class Homophily Players trust other players who ?3 are of the same classH Race Homophily Players trust other players who ?4 are of the same raceH Guild Homophily Players trust other players who ?5 belong to the same guildH Level Homophily Players trust other players who ?6 are at a similar levelH Challenge Players trust other players who ?7 Homophily like similar types of challenges 84
  85. 85. Key ObservationsH1: Players trust other players who are of the same gender? In general players trust other players who are of the same genderH2: Players trust other players who are of similar age? The stronger the type of trust the lesser is the age difference between the people specifying trustH3: Players trust other players who are of the same class? Class does not seem to effect the choice of trusting others 85
  86. 86. Key ObservationsH4: Players trust other players who are of the same race? Race does not seem to effect the choice of trusting othersH5: Players trust other players who are of the same guild? In general, the stronger the type of trust, the greater is the percentage of the people who trust people in their own guildsH6: Players trust other players more who are level at a similar rate? Leveling at the same rate does not seem to greatly effect trust amongst players 86
  87. 87. Key ObservationsH7: Players trust other players who are of the same level? • Level difference seems to have some effect on trust • For Trustee (strongest) and the Visitor (weakest) form of trust, the lower level players are more likely to trust players who are at a higher level Summary: • Homophily is observed for a subset of types in MMOs as compared to what it is observed for in the offline world • The types of homophily which are not observed in MMOs are the ones which are greatly effected by game mechanics 87
  88. 88. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust NetworksMost types of homophily do not Trust based and other social networks in MMOscarry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  89. 89. Social Networks: GeneralObservations There is an extensive literature on characteristics of social networks (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008) The network exhibits monotonically shrinking diameter over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008) At a certain point in time called the Gelling Point many smaller connected connect together and become part of the largest connected component (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008) The largest connected component (LCC) comprises of the majority of the nodes in the network (>= 80%) (McGlohon ICDM 2008, McGlohon KDD 2008) 89
  90. 90. Social Networks: GeneralObservations The size of the second and the third largest connected components remain constant (more or less) even though the identity of these components change over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008) Network isolates are few in number (<5%) (Leskovec PAKDD 2005, Leskovec ICDM 2005) The number of connected components decreases over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008) Relatively fast growth of LCC close to the gelling point (Leskovec PAKDD 2005, Leskovec ICDM 2005) 90
  91. 91. Trust Networks in MMOs Data from 4 servers is available. Results from one server (Player vs. Environment, „guk‟) are shown The network consists of 15,237 nodes, 30,686 edges and 1,476 connected components Dataset spans from January 2006 to August 2006 Average node degree of 4.03. The size of the three largest connected components are as follows: 9039, 51 and 49. The largest connected component accounts for 59% of all the nodes in the network The Trust Network on „guk‟ on August 31, 2006 91
  92. 92. Key Observations Observation 1: Preferential Attachment: The rich get richer but not too rich Explanation: Social bandwidth is limited, Dunbar Number Observation 2: The growth of the LCC is retarded after the gelling point Explanation: The trust network has a relatively low growth rate as compared to the other networks Observation 3: Non-monotonic change in the diameter of the largest connected component Explanation: Players have different levels of activity at various points in time and can also “drop out” of the network if they churn from the game 92
  93. 93. Key Observations Observation 4: A large number of isolate components are observed (> 1000) Explanation: People join in groups and spend all the time playing with one another instead of interacting with people from the outside Observation 5: The number of isolate components increases monotonically over time Explanation: (Same as observation 4) Observation 6: Nodes in the non-LCC constitute a significant portion of the network (41%, 8 months after gelling point) Explanation: (Same as observation 4) 93
  94. 94. Generative Models of Trust Networks Time bound Preferential Attachment: The rich get rich but not so much after a certain point in time. Edge formation is bound by time Presence of Auxiliary Components: Isolate components are added to the network at an almost constant rate over time 94
  95. 95. Generative Models of Trust Networks(ii) Non-Monotonic Decrease in the Diameter: Nodes become inert after a certain point in time. Sample the lifetime of nodes from a normal distribution Homophily in Edge formation: Probability of edge formation dependent upon node degree as well as agreement (similarity) in node characteristics 95
  96. 96. Results Diameter: Non-Monotonically Changing Diameter % LCC as being relatively small 96
  97. 97. Results Number of Connected Components Network growth and the gelling point 97
  98. 98. Conclusion: Trust Networks Trust Networks in MMOs exhibit many properties which are not exhibited by other social networks in most other domains Proposed a model based on observations and domain knowledge Models of social networks should incorporate the peculiarities which are observed in MMOs in general Generalization? Similar observations have been made for mentoring networks but not for PvP, Trade and Chat Networks 98
  99. 99. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust NetworksMost types of homophily do not Trust based and other social networks in MMOscarry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  100. 100. Trust Prediction Family ofProblemsTrust Prediction: Given a trust network G predict which nodes are going to trust one another in the futurePrediction Across Networks: Given a set of actors whoparticipate in multiple types of interactions using features fromone network predict the existence of links in the other network 100
  101. 101. Trust Prediction Family ofProblems Machine Learning/Classification approach to the problem of trust prediction Introduced a set of new problems (inter-network link prediction, trust propensity prediction) Proposed an algorithm for link prediction which used domain knowledge from social science theories New Contributions:  Does the social context (adversarial vs. cooperative) effect prediction results?  If so then what is the implication for generalization? 101
  102. 102. Prediction Task Trust Prediction as a classification problem 60,000 examples for each prediction task 10 Fold Cross-validation Data from Guk (Cooperative) and Nagafen (Adversarial) Servers Six Standard Classifiers for Comparison: J48, JRip, AdaBoost, Bayes Network, Naive Bayes and k-nearest neighbor Positive Example: Negative Example: Training Period Test Period Training Period Test Period 102
  103. 103. Prediction: Group Network In general good prediction results are obtained for the combat network (in either direction), except one case The grouping network is a union of all grouping instances and thus it is extremely dense (1,796,438 edges, 31,900 nodes) 103
  104. 104. Prediction: Group Network Grouping is not a good predictor of mentoring but the vice versa is correct Mentoring is (more often than not is accompanied by grouping) but grouping can be in a variety of contexts e.g., raids, quests, dungeon instances, … 104
  105. 105. Prediction: Combat Network In general good prediction results are obtained for the combat network (in either direction) In general if players have played against another person then they have friends in other networks who have done the same or something similar 105
  106. 106. Comparison across SocialEnvironments T: Trust; M: Mentoring; B: Trade In general the results are similar for both the environments with some exceptions Mentoring-Mentoring: In the adversarial environment a more cliqueish behavior is observed i.e., friends of friends are likely to mentor one another in the future which is not the case for the cooperative environment 106
  107. 107. Comparison across SocialEnvironments Mentoring-Related Tasks: In general mentoring is much more prevalent in the adversarial environment as compared to the cooperative environment (~3 times) and is much more intense. Overlap between mentoring and trade is thus more likely 107
  108. 108. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust NetworksMost types of homophily do not Trust based and other social networks in MMOscarry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  109. 109. Trust Amongst Clandestine Actors“A plague upon‟t when thievescannot be true one to another!”– Sir Falstaff, Henry IV, Part 1, II.ii Do gold farmers trust each other? 109
  110. 110. Hypergraphs to Represent Tripartite Graphs • Accounts can have several characters • Houses can be accessed by several characters • Projecting to one- or two-model data obscures crucial information about embededdness and paths • Figure 2a: Can ca31 access the same house as ca11? • Figure 2b: Are characters all owned by same account? 110
  111. 111. Hypergraphs: Key Concepts • Hyperedge: An edge between three or more nodes in a graph. We use three types of nodes: Character, account and house • Node Degree: The number of hyperedges which are connected to a node NDh1 = 3 • Edge Degree: The number of hyperedges that an edge participates in EDa1-h1 = 2 111
  112. 112. Network Characteristics• Long tail distributions are observed for the various degree distributions• The mapping from character-house to an account is always unique• Players who are connected to a large number of houses are highly active players otherwise as well 112
  113. 113. Characteristics of Hypergraph Projection Networks • Account Projection: Majority of the gold farmer nodes are isolates (79%). Affiliates well-connected (8.89) vs. non-affiliates (3.47) • Character Projection: Majority of the gold farmer nodes are isolates (84%). Affiliates well-connected (10.42) vs. non-affiliates (3.23) • House Projection: 521 gold farmer houses. Most are isolates (not shown) but others are part of complex structures. Densely connected network with gold farmers (7.56) and affiliates (84.02) The Housing Projection Network 113
  114. 114. Key Observations • Gold farmers grant trust ties less frequently than either affiliates or general players • Gold farmers grant and receive fewer housing permissions (1.82) than their affiliates (4.03) or general player population (2.73) Total degree In degree Out degree <n> < nGF > < nAff > <n> < nGF > < nAff > <n> < nGF > < nAff > Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07 Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34 114
  115. 115. Key Observations • No honor among thieves • Gold farmers also have very low tendency to grant other gold farmers permission (0.29) • Affiliates also unlikely to trust other affiliates (0.70) Total degree In degree Out degree <n> < nGF > < nAff > <n> < nGF > < nAff > <n> < nGF > < nAff > Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07 Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34 115
  116. 116. Key Observations • Affiliates are brokers: • Farmers trust affiliates more (1.82) than other farmers (0.29) • Affiliates trust farmers more (1.28) than other affiliates (0.70) • Non-affiliates have a greater tendency to grant permissions to affiliates (7.77) than in general (2.73) Total degree In degree Out degree <n> < nGF > < nAff > <n> < nGF > < nAff > <n> < nGF > < nAff > Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07 Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34 116
  117. 117. Frequent Itemset Mining for Frequent Hyper-subgraphs Support of a Hyper-subgraph: Given a sub-hypergraph of size k, subP is the pattern of interest containing the label P, shP is a pattern of the same size as subP and contains the label P, the support is defined as follows: Support of pattern also containing a gold farmer (red) = 5/8 117
  118. 118. Frequent Itemset Mining for Frequent Hyper-subgraphs Confidence of a Hyper-Subgraph: Given a sub-hypergraph of size k, subP is the pattern of interest containing the label P, subG is a pattern which is structurally equivalent but which does not contain the label P, the confidence is defined as follows: Confidence of pattern and containing a gold farmer = 5/7 118
  119. 119. Frequent Patterns of GFs• Very low (s ≤ 0.1) support and confidence for almost all (except 8) frequent patterns with gold farmers• Remaining 8 patterns can be used for discrimination between gold farmers and non-gold farmers in a subset of the instances• Gold farmers & affiliates are more connected: A 3rd of more complex patterns are associated with affiliates (15/44) 119
  120. 120. Application: Gold Farmer Detection (Behavioral) Classification using in-game behavioral and demographic features Each model corresponds to a grouping of different features 120
  121. 121. Application: Gold Farmer Detection (Label Propagation)• Problem: Not all people who are labeled as normal players are such. Some of them are gold farmers but have not been identified as such• Analogue: If someone is socializing almost exclusively with criminals then he may be a criminal Classifier Metric Initial Dataset Label Prop Change In Performance Bayes Net Precision 0.17 0.189 0.019 Recall 0.834 0.819 -0.015 F-Score 0.282 0.307 0.025 J48 Precision 0.494 0.62 0.126 Recall 0.189 0.337 0.148 F-Score 0.273 0.437 0.164 J Rip Precision 0.495 0.537 0.042 Recall 0.462 0.462 0 F-Score 0.478 0.497 0.019 KNN Precision 0.436 0.46 0.024 Recall 0.396 0.428 0.032 F-Score 0.415 0.443 0.028 Logistic Regression Precision 0.455 0.534 0.079 Recall 0.189 0.271 0.082 F-Score 0.267 0.36 0.093 Naïve Bayes Precision 0.146 0.142 -0.004 Recall 0.538 0.502 -0.036 F-Score 0.23 0.221 -0.009 Adaboost w/ DT Precision 0.405 0.471 0.066 Recall 0.105 0.08 -0.025 F-Score 0.167 0.137 -0.03 121
  122. 122. Gold Farmer Detection Model 1 (Player Attribute Based Features): These features are based on the attributes of the player‟s character in the game e.g., character race, character gender, distribution of gaming activities etc. Model 2 (Item Based Features): These are the features which are derived from items bought and sold from the consignment network. These features are based on the frequency of the frequent items sold or bought by gold farmers. Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models. Model 4 (Item Network Based Features): Features which are derived from the item network in a manner analogous to Model 2. Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1 and Model 4. Model 6 (Item Network & Item-Network Based Features): A combination of features from Model 2 and Model 4. Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described above. 122
  123. 123. Application: Structural Signatures Approach Applied to Trade NetworksNot sufficient data is present to use the structural signature approach to catchgold farmersAlternative: Application of the same approach in other networks: Trade NetworkA set of standard machine learning models are used: Naive Bayes, Bayes Net,Logistic Regression, KNN, J48, JRip, AdaBoost and SMO 123
  124. 124. Conclusion: Gold Farmer Behavior Analysis and Prediction Representation Related Issues addressed with respect to trust in MMOs Application of frequent pattern mining to discover distinct trust patterns associated with gold farmers No honor between thieves: Gold farmers tend not to trust other gold farmers 124
  125. 125. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network Phenomenon Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions Social Characteristics of TrustEffect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade Most types of homophily do notDifferent Social environments result in carry over to the MMO domain No Honor Amongst Thieves differences in network signatures Trust based and other social networks in MMOs exhibit anomalous network characteristics Trust and Prediction Trust Prediction Family of Problems Item Recommendation Success Prediction (Social Capital) Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Effects of social environments on trust Effect of network structure on trust
  126. 126. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust NetworksMost types of homophily do not Trust based and other social networks in MMOscarry over to the MMO domain exhibit anomalous network characteristics Predictive Analysis Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  127. 127. Conclusion Availability of data allows one to analyze phenomenon where it was not possible to do so in the past (Trust in MMOs in the current case) Explored a series of big-picture questions pertaining to trust in MMOs Similar affordances and contexts (with respect to the offline world) lead to similar outcomes in the online world Explored and expanded the scope of trust related prediction tasks Analysis is required on more datasets for generalizability 127
  128. 128. Acknowledgement DMR Lab  Professor Jaideep Srivastava and all DMR lab members especially Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak, Kyong Jin Shim and Nisheeth Srivastava Virtual Worlds Observatory  Professor Noshir Contractor (Northwestern), Professor Scott Poole (UIUC), Professor Dmitri Williams (USC)  External Collaborators at Northwestern, UIUC, USC, U Toronto especially Brian Keegan Funding Agencies:  NSF, AFRL, NSCTA, IARPA 128
  129. 129. Publication SummaryPublication Type Total Thesis Related CommentBook 1 1 Book on Clandestine Behaviors and Networks (Springer 2012)Conference 14 5PapersBook Chapters 1 -Journal Papers 2 -Workshop Papers 10 3 2 Best Paper AwardsShort/Poster 8 1PapersTechnical Reports 4 1Tutorials 4 127 Co-authorsPatents 2 1 129
  130. 130. A Computational Model for Social Influence
  131. 131. Objective: Model socialinfluence in a dynamic networksetting as a spread of cascadesto identify key influential nodes 131
  132. 132. Outline Team Overview Optimal Target Selection Current approaches Proposed Approach Results Update summary Q&A 132
  133. 133. Motivation Original Retweet Global retweets of Tweets coming from Japan for one hour after the earthquake 133 Courtesy:http://blog.twitter.com/2011/06/global-pulse.html
  134. 134. Optimal Target People Selection Goal: Find the optimal targets to maximize influence spread in network Why is it important?  Public Polls: Sentiment spread  Sales and Marketing: Word-of-mouth spread  Public Health: Disease spread Influence: A force that attempts to change the opinion or behavior of the individual Influence is causal  My friend buys a product -> I buy a product 134
  135. 135. Current Approaches (1/2) Assume a certain model for propagation of influence [1]  Independent Cascade Model  Each node is independently influences the neighbor using a biased coin toss  Linear Threshold Model  Each node picks a random threshold  Each node infects the neighbor by a specified amount All of them need to know the propagation probability 135
  136. 136. Current Approaches (1/2) Use a two step approach  Step 1: Choose influence Model  Step 2: Find the subset of top-k nodes that maximizes the influence Bad News: Step 2 is NP-Hard [2] Popular method for step 2: Greedy Heuristics  Greedy heuristics gives (1-1/e) approximation Address scalability issues in the second step [3][4][5][6]  CELF [3]: Influence maximization function follows law of diminishing returns (sub-modular) 136
  137. 137. Limitations of current approaches Estimation/Learning of propagation probabilities  Data-driven and model free approach Propagation models are not content unaware  Content-aware influencer mining No causality effect in influence model  Add causal effect in influence propagation 137
  138. 138. Proposed ApproachCommunication Logs Frequent Network Sequence Discovery of Mining Influencers Ordered list of influencers Network Structure 138
  139. 139. Sequence Cascade Mining Example Temporal order Append neighbors from network i=2 Check downward closure and support countPeter, Charles, Kim, Vicky, Nancy Network structure Let Support = 2 139
  140. 140. Sequence Cascade Mining Example Append neighbors from network i=3 Check downward closure and support countPeter, Charles, Kim, Vicky, Nancy Network structure Let Support = 2 140
  141. 141. Sequence Cascade Mining Example Append neighbors from network i=4 Break loop; No more work to do…Peter, Charles, Kim, Vicky, Nancy Network structure Return Frequent Sequences Let Support = 2 141
  142. 142. Sequence Cascade Mining L1 Frequent Sequence Mining Candidate Generation for k+1 By appending neighbors Downward closure pruning Support Count and reorganize 142
  143. 143. Influence Function Node pattern Influence Set (Q(i,p)) = Influence Set of a Node Influence Set of a Node Set It can be shown that influence function I(V,s) is monotone and sub-modular 143
  144. 144. Network Discovery of Influencers Greedy heuristic with (1-1/e) approximation guarantee 144
  145. 145. Network Discovery of InfluencersFrequent Sequences Example i=1 Choose the node with max out-degree Find top-3 (Randomly break tie between P influencers and C) P C chosen Current Set of Influenced People C N C K P K V 145

×