Social Analytics
  Mining Behaviors of a Connected
                            World
                   PAKDD School
                             April 11, 2013
                         Sydney, Australia


                                Jaideep Srivastava
                            University of Minnesota
                             srivasta@cs.umn.edu

4/9/2013         University of Minnesota              1
Course Outline
• Module 1
   •        Introduction to Social Analytics – applying data mining to social computing
            systems; examples of a number of social computing systems, e.g.
            Facebook, MMO games, etc.
• Module 2
   •        Computational online trust
   •        Identifying key influencers
   •        Information flow in networks
• Module 3
   •        Analysis of clandestine networks
   •        Katana - game analytics engine




Part I: Background
Social Network Analysis
     [Figure: social network analysis draws on Organizational Theory, Social Psychology, Anthropology, Epidemiology and Sociology. A grid contrasts Perception (Socio-Cognitive Networks, Cognitive Knowledge Networks) with Reality (Social Networks, Knowledge Networks), with columns Acquaintance (links) and Knowledge (content).]


     Social science networks have widespread application in various
      fields
     Most of the analysis techniques have come from Sociology,
      Statistics and Mathematics
     See (Wasserman and Faust, 1994) for a comprehensive introduction
      to social network analysis
12/02/06                            IEEE ICDM 2006                        4
What have been its key scientific successes?
   In the classical social sciences, numerous results
       'Six degrees of separation' [Milgram]
         Popularized by the 'Kevin Bacon game'

       'The strength of weak ties' [Granovetter]
       'Online networks as social networks' [Wellman, Krackhardt]
       'Dunbar Number'
       Various types of centrality measures
       Etc.
   In the Web era
       'The Bow-Tie model of the Web' [Raghavan]
       'Preferential attachment model' [Barabasi] (Yes and No)
       'Power law of degree distribution' [Lots of people] (NO!)
       Etc.
Application successes
   Numerous in social sciences
   Google – PageRank
   LinkedIn – expanding your Cognitive Social Network
       making you aware that 'you're more connected and closer than you
        think you are'
   Expertise discovery in organizations
       Knowledge experts, 'authorities'
       Well-connected individuals, 'hubs'
   Rapid-response teams in emergency management
   Information flow in organizations
   Twitter – real time information dissemination
   Etc.
Online (Multiplayer) Games

     [Figure: games positioned on two axes, how social (low to high) against how engaging (low to high)]
Player Behavior & Revenue Model
   Blizzard (subscription)
       World of Warcraft
       12 million subscribers
       Revenue model
           $15/month
           Approx. $3 billion annual revenue
       4 hours a day, 7 days a week!
       Hard core gamers
       Less socially acceptable
       Like cocaine
   Zynga (free2play)
       Farmville, Fishville, Mafia Wars, etc.
       180 million players
       Revenue model
           Virtual goods
           $700 million in 2010
       0.5 hrs a day, 7 days a week
       Everyone
       More socially acceptable
       Like caffeine
Implications of this 'addiction'
   3 billion hours a week are being spent playing online
    games
       Jane McGonigal in “Reality is Broken”
   Labor economics
       What is the impact of so much labor being removed from the pool?
        [Castronova]
   Entertainment economics
       If MMO players can get 100 hrs/month of entertainment by spending
        $25 or so, what will happen to other entertainment industries?
   Psychological/Sociological
       Is it an addiction – the prevailing view (Chinese government's 'detox
        centers' for kids)
       Are they fulfilling a deeper need that the real world is not? (McGonigal)
   Societal
       A trend far too important to not be taken seriously!
Business Example
Levi's – Example of Social Retail

   Levi's leverages its brand to ensure customers provide their social
    network
   Levi's can leverage predictive social analytics technology to understand
    the value of the customer's social network

Opportunity, Innovation, Impact
     Companies do not understand the social graph of their customers
     It's not just about how they relate to their customers, but also about
      how customers relate to each other



     [Figure: two views of customer relationships shown side by side ("vs.")]




     Understanding these relationships unlocks immense value
         Innovation: Understanding the social network of customers
             Key influencers, relationship strength, …
         Impact: Deriving actionable insights from this understanding
             Customer acquisition, retention, customer care, …
             Social recommendation, influence-based marketing, identifying trend-setters, …

                                    Ninja Metrics confidential information. Copyright 2012
Unlocking true value by product, category, or store

     [Figure: example values by product, category, or store; visible labels: 0021, -$128.61, -$293.79, -$79.63]
True Value of each customer
     True value = individual value + social value
     Who really matters, and to what degree
     Some empirical facts
         31% activity due to socialization
         23% more individual + 8% more social activity

      [Figure: the individual's lifetime value, their social influence, and their true total]
Impact of New Instrumentation on Science
   1950s
       Invention of the electron microscope fundamentally changed
        chemistry from 'playing with colored liquids in a lab' to 'truly
        understanding what's going on'
   1970s
       Invention of gene sequencing fundamentally changed biology from
        a qualitative field to a quantitative field
   1980s
       Deployment of the Hubble (and other) Space telescopes has had
        fundamental impact on astronomy and astrophysics
   2000s
       Massive adoption is fundamentally changing social science
        research
       Massively Multiplayer Online Games (MMOGs) and Virtual Worlds
        (VWs) are acting as 'macroscopes of human behavior'
The Virtual World Observatory (VWO) Project
•   Four PIs, 30+ Post-docs, PhD and MS students, UGs, high-schoolers
     • Noshir Contractor, Northwestern: Networks
     • M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups
     • Jaideep Srivastava, Minnesota: Computer Science
     • Dmitri Williams, USC: Social Psychology
•   Collaborators
     • Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan
        (Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci,
        Michigan), …
•   Data and technology partners
     • Sony (EverQuest 2), Linden Labs (2nd Life), Bungie (Halo3), Kingsoft
        (Chevalier's Romance), others …
     • Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …
Overall Goals of the VWO Project

[Diagram: sticky social media → tons of data → analytics → basic science, government applications, business applications]
• Sticky social media
   • Games
   • Virtual Worlds
   • Other social apps
• Analytics
   • Social science models
   • New algorithms
   • Ultra-scalability
• Basic Science
   • behavior, socialization, …
   • novel, scalable algorithms
   • NSF
• Government applications
   • team dynamics (Army)
   • skills acq, leadership (Army)
   • social influence & adolescent health (CDC)
   • etc.
• Business applications
   • customer churn
   • game design
   • anti-social behavior
   • etc.
Collaborators, Sponsors, Partners
   Team



   Financial Sponsors



   Data Partners



   Technology Partners
Part II – Impact on Science
Findings from a Player Survey
Who is playing?

   It is not just a
    bunch of kids
   Average age is
    31.16 (US
    population median
    is 35)
   More players in
    their 30s than in
    their 20s.
How much do they play?

   Mean is 25.86
    hours/week
   Compares to
    US mean of
    31.5 for TV
    (Hu et al,
    2001)




    • From prior experimental work, MMO play eats into entertainment TV
      and going out, not news
    • So much for kids being the ones with the free time.
Gender Differences
   More men players (78/22%)
   Men played to compete, and women played to socialize
   Men play more other games, but it was the women who were more
    satisfied EQ2 players
      Women: 29.32 hours/week

      Men: 25.03 hours/week

      Likelihood of quitting: “no plans to quit”:
        women 48.66%, men 35.08%
   Self reported play times
      Women: 26.03 (3 hours less than actual)
      Men: 24.10 (1 hour less than actual)
      Boys and girls are socialized early on, and thus have clear role
        expectations for their behaviors and identities (Gender Role Theory
        in action!!)
Playing with a partner
Inferring RW gender from VW data
Goal
    What virtual world behaviors and characteristics predict
    real world gender?
Data:
 Survey Data n=7119
 Survey Character Store
 EQ2 Character Store
Variables
 Avatar Characteristics:
       Gender, Race, Class, Experience, Guild Rank, Alignment,
        Archetypes
   Game play Behaviors:
       Total Deaths & Quests, PvP Kills & Deaths, Achievement Points,
        Number of Characters, Time played, Communication patterns


Gender prediction results
   Close to 95% prediction accuracy
       Decision trees work rather well
   Character Gender, Race and Class are significant
    predictors of real-life gender.
   Gender swapping is rarer, but systematically different by
    real gender
   Players tend to choose:
     character gender based on their real life gender
     character races that are gendered: women play
      elves/men play barbarians
     classes that are gendered: women play priests/men
      play fighters



Gender swapping behavior
Game character gender    Real gender: Male    Real gender: Female    Total
Male                     4065 (82.6%)         98 (8.2%)              4163 (68.0%)
Female                   855 (17.4%)          1104 (91.8%)           1959 (32.0%)
Total                    4920 (100.0%)        1202 (100.0%)          6122 (100.0%)


    Observation
    • Far more males gender swap than females
    • Why?
         •   Men are more creative?
         •   Women have less identity confusion?
         •   Women get their 'fill of gender swapping in real life' :-)
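
The percentages in the gender swapping table can be re-derived from the raw counts. The sketch below does that arithmetic and also computes how accurate the naive rule "real gender = character gender" would be on these respondents. The function names are illustrative and not part of the original study; only the four counts come from the table.

```python
# Counts copied from the gender swapping table.
# Keys are (character gender, real gender).
counts = {
    ("male", "male"): 4065,
    ("male", "female"): 98,
    ("female", "male"): 855,
    ("female", "female"): 1104,
}

def real_gender_total(real):
    """Number of respondents with the given real gender."""
    return sum(n for (_, r), n in counts.items() if r == real)

def swap_rate(real):
    """Share of players of a real gender whose character gender differs."""
    swapped = sum(n for (c, r), n in counts.items() if r == real and c != real)
    return swapped / real_gender_total(real)

def baseline_accuracy():
    """Accuracy of the naive rule 'real gender = character gender'."""
    matches = sum(n for (c, r), n in counts.items() if c == r)
    return matches / sum(counts.values())

print(f"male swap rate:    {swap_rate('male'):.1%}")
print(f"female swap rate:  {swap_rate('female'):.1%}")
print(f"baseline accuracy: {baseline_accuracy():.1%}")
```

The baseline gives a reference point for the reported prediction accuracy: any model that also uses race, class and play behavior should be compared against what character gender alone already achieves.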
Economics: A test of RW ↔ VW mapping

    Do players behave in virtual worlds as we expect
     them to in the actual world?
    Economics is an obvious dimension to test
    In the real world, perfect aggregate data are hard to
     get
GDP and Price Level
   GDP and price levels are robust but comparatively unstable

     [Figure: GDP and Prices on Antonia Bayle, January to May. Left axis: nominal GDP (gold), 0 to 5,000,000. Right axis: price level (January = 100), 0 to 160.]
Money Supply and Price
   The instability is explicable through the Quantity Theory of Money
        a rapid influx of money . . .
        . . . dramatically boosted prices
   More evidence that this behaves like a real economy

     [Figure: Change in Money Supply and Population on Antonia Bayle, February to May. Left axis: change in money (000 gold). Right axis: change in active accounts.]
     [Figure: Price Level on Antonia Bayle, February to June. Axis: percent change in price level.]
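
As a reminder of what the Quantity Theory of Money predicts, here is a minimal sketch based on the textbook equation of exchange, M * V = P * Q. The numbers are invented for illustration and are not taken from the Antonia Bayle data; velocity and real output are assumed constant, which is the simplification that makes prices move in step with the money supply.

```python
def price_level(money_supply, velocity, real_output):
    """Equation of exchange, M * V = P * Q, solved for the price level P."""
    return money_supply * velocity / real_output

# Illustrative values only (not from the game data).
before = price_level(money_supply=10_000_000, velocity=2.0, real_output=200_000)
after = price_level(money_supply=12_000_000, velocity=2.0, real_output=200_000)

# With V and Q held fixed, a 20% rise in M produces a 20% rise in P.
inflation = (after - before) / before
print(f"price level: {before:.0f} -> {after:.0f} ({inflation:.0%} change)")
```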
Networks in Virtual Worlds




                                      SONIC

                             Advancing the Science of
                             Networks in Communities
Why do we create and sustain
 networks?
          Theories of self-interest
          Theories of social and resource exchange
          Theories of mutual interest and collective action
          Theories of contagion
          Theories of balance
          Theories of homophily
          Theories of proximity
          Theories of co-evolution

Sources:
Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses
  about organizational networks: An analytic framework and empirical example. Academy of
  Management Review.
Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford
  University Press.




“Structural signatures” of Social Theories
     [Figure: six small network diagrams on nodes A to F, with + and - marking signed ties, showing the structural signature of each theory. Panels: Self interest, Exchange, Balance, Collective Action, Homophily, Contagion. Legend: Novice, Expert.]
     [Figure: four network plots: Partnership, Instant messaging, Trade, Mail. Node color: black = male, red = female.]
Social theory + IR based network analysis
• Social networks (1), (2), …, (n) are represented as network structure frequency
  vectors in a bag-of-words model
  Example frequency vectors, one column per network:
       (1)    (2)    (n)
        4      5      2
        0      0      0
        0      0      0
        0      1      0
        1      3      0
        1      2      0
        0      1      0
        0      0      0
• Structure vectors are clustered using text clustering methods
• Cluster means provide modes of network structure configurations making up all the
  social networks
• Attribute values for each cluster can be used to discover trends between network
  structures and attributes
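
To make the bag-of-words idea concrete, the sketch below counts a few simple structures in a directed communication graph and turns the counts into a normalized frequency vector. The structure vocabulary (edges, mutual dyads, 2-out-stars, 2-in-stars) is a hypothetical choice for illustration; the slides do not list the exact structures used. Cosine similarity stands in for whatever distance a text clustering method would apply.

```python
from collections import Counter
from math import comb, sqrt

# Hypothetical structure vocabulary (the "words" of the bag-of-words model).
STRUCTURES = ("edge", "mutual", "out2star", "in2star")

def structure_vector(edges):
    """Count simple structures in a directed graph given as (src, dst) pairs."""
    edge_set = set(edges)
    out_deg = Counter(src for src, _ in edge_set)
    in_deg = Counter(dst for _, dst in edge_set)
    counts = {
        "edge": len(edge_set),
        # Each reciprocated pair appears twice in edge_set, so halve the count.
        "mutual": sum((d, s) in edge_set for s, d in edge_set) // 2,
        # A node with out-degree d is the centre of C(d, 2) 2-out-stars.
        "out2star": sum(comb(d, 2) for d in out_deg.values()),
        "in2star": sum(comb(d, 2) for d in in_deg.values()),
    }
    return [counts[name] for name in STRUCTURES]

def normalize(vec):
    """Scale counts to relative frequencies so group size matters less."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)

def cosine(a, b):
    """Cosine similarity between two structure vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

hub_group = [("a", "b"), ("a", "c"), ("a", "d")]  # one player messages everyone
pair_group = [("a", "b"), ("b", "a")]             # two players talk to each other

print(structure_vector(hub_group), structure_vector(pair_group))
print(cosine(normalize(structure_vector(hub_group)),
             normalize(structure_vector(pair_group))))
```

Once every group is a vector of this kind, any standard document clustering routine can be applied, and the cluster means can be read as typical structure profiles.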
Results – normalized network structure
vector means for all clusters
Results
   Clusters 1 and 4 are similar
         Groups kill fewer monsters
         Group members in cluster 4 do not communicate much
         Group members in cluster 1 generally limit their communication to just
          one other person in the group
         Most people belong to these two clusters
         Consistent with previous research - users in virtual environments are
          less likely to interact with strangers
         [N. Ducheneaut, N. Yee, E. Nickell and R. Moore, “Alone Together?” Exploring
          the social dynamics of massively multiplayer online games, Proceedings
          CHI06, ACM Press, New York, 407-416.]
   Cluster 5 groups have many 1-edge and 2-out stars
        Most of the communication is one-way, possibly indicating the presence of
         central people
        Maximum number of monsters killed out of all clusters
        Performance of the groups is very good
        Minimal communication
        It is possible that cluster 5 consists of groups more focused on playing
         and performing well in the game and less on socializing
Some Open Questions
   How similar/dissimilar are online social networks to real-world
    social networks?
   Is online socialization
       Only an activity that sustains real-world networks?
       Causing new social networks to be formed?
       Fundamentally a different type of networking activity?
   A fundamental tenet of socialization has been
    “geography/proximity drives socialization”
       How is this being impacted by online socialization?
TeamSkill: Modeling Team Chemistry in Online Multi-Player Games
Description and motivation
   The goal of this work is to improve skill assessment approaches
    used in multiplayer games, especially team-based games
   Xbox Live, PSN, Steam, and Battle.net collectively have between 149 and 167
    million total users (and that number is growing)
       Many of the most popular games are team-based: Halo, Call of Duty,
        TF2, CS, etc
   Why?
       Better skill assessment = fairer games = less player attrition
       More accurate rankings of players/teams
       Most previous work has focused on individuals, not teams
   It's a hard problem
       Online setting: Updates to a player's skill distribution must be done
        after each game is played
       Applicability: Any generalized assessment technique cannot include
        game-specific data
   Our datasets are from the games of professional Halo 3 players



What is Halo?




    Halo is one of the most well-known “first-person shooter” video
     game series in the world
    Online/LAN-based multiplayer is its most significant component
        650,000 to 850,000 unique users per day at its peak
Major League Gaming
 What is Major League Gaming (MLG)?
 • A professional league for competitive gaming, including Halo 3 from '08-'10
 • Best players in the world compete in this league
 • Last tournament watched by over 1,000,000 people online
 • Our datasets are comprised of games played by these players

 Our work focuses on professional Halo 3 players
     Online scrimmage data, as well as complete tournament data, is readily
      available from bungie.net and mlgpro.com
     Players are highly-skilled individually – allows us to better focus on
      group-level performance characteristics
     Players change teams regularly, helps isolate impact of particular players
      on overall team performance
     Unexplored boundary case for skill assessment
Skill assessment
    An old problem (with different applications in different
     contexts)
    Paired comparison estimation
        Foundational work by Thurstone (1927) and Bradley-Terry (1952)
        Elo (1959), popularized in chess ranking (FIDE, USCF)
        Glicko (1993) – player-level ratings volatility incorporated (σ²),
         addition of rating periods
    TrueSkill (2006) - factor-graph based approach used in
     Microsoft's Xbox Live gaming service
        Used to match players/teams up with each other online
    Our work focuses on Elo, Glicko, and TrueSkill




Elo (1959)
     Arpad Elo, Dept of Physics, Marquette University
         Was a master chess player in USCF
     Proposed the Elo Rating System
         Replaced earlier systems of competitive rewards (i.e.,
          tournaments) in USCF/FIDE
         Simple to implement: assumes player skill is normally
          distributed with constant variance β²
         Still widely-used today
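
For readers who have not seen it written out, a minimal sketch of an Elo-style update follows. It uses the logistic expected-score curve on a 400-point scale with K = 32, which are common conventions in chess rating practice and are assumptions here; the slide itself describes the original model in terms of normally distributed skill.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated (A, B) ratings.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    k is the assumed step size; the points A gains are the points B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two equally rated players: the winner takes half of K from the loser.
new_a, new_b = elo_update(1500, 1500, score_a=1)
print(new_a, new_b)
```

Because the update runs after every game and needs only the two current ratings, it fits the online setting described earlier.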




Glicko (1993)
    Mark E. Glickman, Department of Health Policy
     and Management, Boston University
    Proposed the Glicko Rating System
        Addresses rating reliability issue through the use of
         rating periods and player-specific variances
        Elo is a special case of Glicko
    Iterative approach for approximating the
     marginal posterior distribution of a player's skill
     conditional on other players' priors
        More computationally tractable for large datasets



TrueSkill (2006)
    Ralf Herbrich and Thore
     Graepel at Microsoft
     Research, Cambridge, UK
    Used for automated
     ranking/matchmaking on Xbox
     Live
    Uses factor graphs to model
     multi-player, multi-team
     environments
    Converges quickly (~5 games
     for a player)
    Large-scale deployment
        30 million Xbox Live members
        150+ games use TrueSkill for
         ranking & matchmaking




Issues with current approaches
     [Figure: individual players 1, 2, 3 and 4 are summed (+) to form team 1234]

     Basic idea
       Given: skill ratings of each team member, i.e., s_i ~ N(μ_i, σ_i²)
       Sum across all team members
     Not intuitive: Team chemistry is a well-known concept in team-based
      competition [Martens 1987, Yukelson 1997] and it is not captured in
      any of these models
       Can think of it as the overall dynamics of a team resulting from
         leadership, confidence, relationships, and mutual trust
       Independence assumption not realistic in teams, especially at
         high levels of play
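
The summation approach criticized above can be sketched in a few lines. Under the independence assumption, player means add and player variances add, so the team skill is again normal. The win probability shown is a simplification that compares the two team skill distributions directly and ignores any per-game performance noise; the ratings are hypothetical.

```python
from math import erf, sqrt

def team_skill(players):
    """Sum independent player skills given as (mu, sigma) pairs.

    Returns (team mean, team standard deviation).
    """
    mu = sum(m for m, _ in players)
    var = sum(s * s for _, s in players)
    return mu, sqrt(var)

def win_probability(team_a, team_b):
    """P(skill_A > skill_B) when both team skills are independent normals."""
    mu_a, sd_a = team_skill(team_a)
    mu_b, sd_b = team_skill(team_b)
    z = (mu_a - mu_b) / sqrt(sd_a ** 2 + sd_b ** 2)
    return 0.5 * (1.0 + erf(z / sqrt(2)))  # standard normal CDF at z

# Hypothetical (mu, sigma) ratings for two four-player teams.
team_a = [(25.0, 3.0), (27.0, 4.0), (24.0, 3.0), (26.0, 4.0)]
team_b = [(25.0, 3.0)] * 4

print(team_skill(team_a))
print(win_probability(team_a, team_b))
```

Nothing in this calculation depends on who plays with whom, which is exactly why it cannot express team chemistry.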


More information is available, however…
                                    1    2     3     4

                         12     13       14    23    24        34

                              123       124    134       234

                                          1234

     Observation: We have more information than just the history of
      individual players – we also know the histories of groups of players
         Player 1's history ∩ player 2's history = history of {1, 2}
         Existing approaches only make use of top row
     Idea: Estimate the skills of subgroups of players on a team, combine
      in some way, and use to produce better estimate of a team‟s skill




Reframing the assessment problem
                     k=1            1    2     3     4

                     k=2 12      13      14    23    24        34

                     k=3      123       124    134       234

                     k=4                  1234

    Alterations to existing approaches (i.e., Elo/Glicko/TrueSkill) are necessary
      Player-level skill representation → generalized subgroup skill
         representation
          For each game and for each k ≤ K (the size of the team), treat each
            group of size k as you would an individual player and update its skill accordingly
      Hashing of skill variable matrix according to unordered subgroup
         membership
    Treat Elo/Glicko/TrueSkill as 'base learners', à la boosting
      Each rating says something about the skill of that particular subgroup
      …but how do we aggregate these ratings to estimate the skill of a
         team?
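
    A small sketch of the subgroup representation, assuming ratings are stored in a dictionary keyed by `frozenset` so that subgroup membership is unordered. The `DEFAULT` rating and the additive `delta` update are placeholders for whatever a base learner (Elo, Glicko, or TrueSkill) would actually compute.

```python
from itertools import combinations


def subgroups(team, max_k=None):
    """Yield every subgroup of `team` of size 1..max_k as a frozenset."""
    players = sorted(team)
    max_k = len(players) if max_k is None else max_k
    for k in range(1, max_k + 1):
        for combo in combinations(players, k):
            yield frozenset(combo)


ratings = {}      # frozenset of player ids -> rating (hypothetical store)
DEFAULT = 1500.0  # hypothetical starting rating


def update_after_game(team, delta):
    """Treat each subgroup like an individual player and update it.

    `delta` stands in for the base learner's own update rule.
    """
    for group in subgroups(team):
        ratings[group] = ratings.get(group, DEFAULT) + delta


update_after_game([1, 2, 3, 4], 10.0)
```

    For a 4-player team this touches 4 + 6 + 4 + 1 = 15 subgroups per game, which matches the four rows (k = 1 to 4) of the diagram above.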
TeamSkill
    Four different aggregation approaches:
        TeamSkill-K
        TeamSkill-AllK
        TeamSkill-AllK-EV
        TeamSkill-AllK-LS




Aggregation issues

    [Figure: subgroup histories at three points in time; black = history available, red = no history available]
        t = 1: after 1 game
        t = 100: new player (5) who has never played with 1, 2, or 3
        t = 200: new player (6) who has played with 3 or 5 (not both)
        Real world case: assume that players can leave/join the 4-player
         team and look at the timeline
        The question at each point in time, t: how best to combine the
         available group ratings to produce a team rating?
        The problem: the feature space is expanding and contracting over
         time.
Data set overview
    Collected over the course of
     2009
    7,590 games (2,076 from
     tournaments and 5,514 from
     Xbox Live scrimmages)
    448 players on 140 different
     professional and semi-
     professional teams
    Games took place during
     January 2008 through January
     2010
    Websites pertaining to this data
        http://stats.halofit.org - Player/team
         statistics
        http://halofit.org – Datasets and related
         information
                                                     “Friend or foe” social network for all tournaments in 2008 and
                                                     2009




Evaluation
    The TeamSkill approaches were evaluated by predicting the outcomes
     of games occurring prior to 10 MLG tournaments and comparing their
     accuracy to unaltered versions (k = 1) of their base learner rating
     systems - Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible
     choices of k for teams of 4, 1 ≤ k ≤ 4, were used)
    For each tournament, we evaluated each rating approach using:
        3 types of training data sets - games consisting only of previous tournament data,
         games from online scrimmages only, and games of both types.
        3 periods of game history - all data except for the data between the test tournament
         and the one preceding it (“long”), all data between the test tournament and the one
         preceding it (“recent”), and all data before the test tournament (“complete”).
            We will only show “complete” as results for “long” and “recent” mirrored those for “complete”
        2 types of games - the full dataset and those games considered “close” (i.e., prior
         probability of one team winning close to 50%).
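
    To make the "close game" filter concrete, here is a sketch that uses the standard Elo expected-score formula as the prior win probability. The 0.05 tolerance around 50% and the tuple layout of a game are assumptions; the slides do not state the threshold that was used.

```python
def elo_win_probability(rating_a, rating_b):
    """Standard Elo expected score of side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def is_close(p_win, tolerance=0.05):
    """A game is 'close' if the prior win probability is near 50%.

    The tolerance is a hypothetical choice.
    """
    return abs(p_win - 0.5) <= tolerance


def accuracy(games):
    """Fraction of games whose winner was predicted correctly.

    `games` is an iterable of (rating_a, rating_b, a_won) tuples.
    """
    games = list(games)
    correct = sum(
        (elo_win_probability(a, b) >= 0.5) == a_won for a, b, a_won in games
    )
    return correct / len(games)
```

    Evaluating on close games only would then amount to filtering the test set with `is_close` before calling `accuracy`.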




Results – all games



       Prediction accuracy for both tournament and scrimmage/custom games using complete history




       Prediction accuracy for tournament games using complete history




       Prediction accuracy for scrimmage/custom games using complete history
Results – “close” games



       Prediction accuracy for both tournament and scrimmage/custom games using complete history




       Prediction accuracy for tournament games using complete history




       Prediction accuracy for scrimmage/custom games using complete history
Conclusions
    Modified several existing assessment approaches to
     operate on generalized player subgroup entities (instead of
     just individual players)
    Introduced four aggregation methods for combining player
     subgroup information to produce final forecast
    Shown evidence that close games are decided on the
     basis of team chemistry, consistent with sports psychology
     research




Part III: Impact on Business
Player Churn Prediction
Time Equals Stickiness

    [Figure: ratio of quitters to stayers (y-axis, 0.00 to 6.00) by character level (x-axis, 1 to 69)]


                                     There are fewer quitters as the levels go up, so the focus
                                      should be on the first 20 levels.
Solo vs. Social Players

    [Figure: ratio of quitters to stayers (y-axis, 0.00 to 6.00) by character level (x-axis, 1 to 69), plotted separately for Solo and Social players]

                              Isolated players are 3.5x more likely to quit (B = 1.26, p<.001).
                               Focus design on facilitating social interaction.
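
    The 3.5x figure is consistent with reading B = 1.26 as a logistic-regression coefficient, in which case exp(B) is the odds ratio. That reading is an assumption; the slide does not name the model that was fitted.

```python
import math

B = 1.26                     # coefficient reported on the slide
odds_ratio = math.exp(B)     # multiplicative change in the odds of quitting

print(f"odds ratio = {odds_ratio:.2f}")
```

    Under this reading, "3.5x more likely" refers to the odds of quitting rather than the raw probability.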
Problem Statement & Approach
   Objective: At time t, for player p, given a window w, compute
    the probability, P(p,t,w), of p churning within the interval
    (t,t+w), and the confidence in this probability.
   Approach: Estimate P(p,t,w) and its confidence using
        Data about p's activity and socialization behavior
        Statistics and machine learning (linear, non-linear, ensemble, etc.)
        Use of socio-psychological theories of player motivation
        Novel synthesis of data driven (DD) and theory driven (TD) approaches




Player Motivation Theories



                             Richard Bartle, 1996




                              Nick Yee, 2005




Mapping MMO Behaviors to Theory

    Example MMORPG
        Persistent virtual world
        Players have avatars
        Quest-driven
        Participation in groups,
           guilds
    Dataset (game logs)
        Time period: FEB-JUN, 2006
        No. of accounts: ~16000
        Churner definition: 2 months of inactivity




Synthesis of TD and DD approach




Model Evaluation (Confusion Matrix)

     Model performance for different features (10-fold cross-validation)
          Features                   Tree size (nodes)   Precision (%)   Recall (%)   F-measure (%)
          Pure DD                          485               69.3           84            76
          TD: Achievement+Social            59               67.1          76.2          71.3
          TD: Achievement                   37               67.2          73.2          70.1
          TD: Socialization                  7               64.3          55.4          59.5

     Observations
           Decision tree learning works quite well
           DD model has the best F-measure (76%) but is difficult to interpret (485
            nodes)
           TD models are interpretable (7 – 59 nodes), but less accurate
           Achievement-orientation is more important than socialization
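
     As a consistency check on the table, the F-measure column can be recomputed as the harmonic mean of precision and recall. The sketch below uses only the precision and recall values reported above; small differences from the slide are rounding.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) as reported in the table.
reported = {
    "Pure DD": (69.3, 84.0),
    "TD: Achievement+Social": (67.1, 76.2),
    "TD: Achievement": (67.2, 73.2),
    "TD: Socialization": (64.3, 55.4),
}

recomputed = {name: f_measure(p, r) for name, (p, r) in reported.items()}
for name, value in recomputed.items():
    print(f"{name}: F = {value:.1f}")
```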
Model Evaluation (Lift Chart)




    Observations
        TD model does better in
         predicting top quintile of
         churners
        DD model performs better in the
         40%-70% range

Ensemble Approach – Model
Evaluation
    Ensemble approach
        A 'committee of classifiers' votes on the result
        Heterogeneity helps
              No. of clusters   Precision (%)   Recall (%)   F-measure (%)
                    5               66.23          91.94         76.99
                    10              66.49          91.08         76.87
                    15              67.58          89.80         77.12
                    20              67.6           90.13         77.25


    Improvements over single model
        Best recall improves by 7.94 percentage points (91.94% vs. 84%), i.e., the model
         can identify a larger proportion of potential churners
        Best F-measure improves by 1.25 percentage points (77.25% vs. 76%)
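
    A minimal sketch of committee voting, assuming simple unweighted majority voting. The slides do not say how votes are combined or how the clusters define the committee members, so the three toy classifiers and their feature names below are purely hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common prediction (first seen wins a tie)."""
    return Counter(predictions).most_common(1)[0][0]


def committee_predict(classifiers, features):
    """Ask every committee member for a prediction and take the majority."""
    return majority_vote([clf(features) for clf in classifiers])


# Hypothetical committee: each member looks at a different signal.
committee = [
    lambda f: f["days_inactive"] > 14,
    lambda f: not f["guild_member"],
    lambda f: f["level"] < 20,
]

player = {"days_inactive": 30, "guild_member": True, "level": 12}
will_churn = committee_predict(committee, player)
```

    Using an odd number of members avoids ties for a binary churn/stay decision.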

Conclusions

    Comparison of theory-driven and data-driven model in
     terms of prediction accuracy and model interpretability
    Achievement-orientation is more important than
     socialization-orientation in identifying potential churners
    Ensemble model can identify larger proportion of
     potential churners as compared to single global model




Course Outline

• Module 1
   •        Introduction to Social Analytics – applying data mining to social computing
            systems; examples of a number of social computing systems, e.g.
            FaceBook, MMO games, etc.
• Module 2
   •        Computational online trust
   •        Identifying key influencers
• Module 3
   •        Analysis of clandestine networks
   •        Katana - game analytics engine




Computational Trust in Multiplayer Online Games
“If you want to go fast walk alone, if you
want to go far then walk with a group.”
                           - Proverb from Ghana
Virtual Worlds & Massive Online
Games

   Massively Multiplayer Online Role
    Playing Games
    (MMORPGs/MMOs)
   Simulated Environments like
    SecondLife
   Millions of people can interact with
    one another in a shared virtual
    environment
   People can engage in a large number
    of activities with one another and with
    the environment
   Many of the observed behaviors have
    offline analogs
Big Picture Questions
   How is trust expressed differently in different social
    contexts?
       Cooperative (PvE), Adversarial (PvP), …
   How is trust expressed in different types of social
    networks?
       Housing, Mentoring, Trade, Group, …
   What are the characteristics of trust and related networks
    in MMOs?
       Similarities and differences with social networks in other domains
        e.g., citation networks, co-authorship networks
   What role can features derived from the trust network play
    in prediction tasks, e.g., link prediction (formation,
    breakage, change), trust propensity, success prediction?

Computational Trust in Multiplayer Online Games
    Department of Computer Science, University of Minnesota
    Muhammad Aurangzeb Ahmad

    Trust as a Multi-Level Network Phenomenon
        Trust as a multi-modal, multi-level network phenomenon
        Network formulations of traditional trust-related concepts
        Incorporating social science theories in trust-related prediction tasks
        Generative network models for trust-based social interactions
Computational Trust in Multiplayer Online Games
    Trust as a Multi-Level Network Phenomenon
        Trust as a multi-modal, multi-level network phenomenon
        Network formulations of traditional trust-related concepts
        Incorporating social science theories in trust-related prediction tasks
        Generative network models for trust-based social interactions

    Social Characteristics of Trust
        Effect of Social Environments on Trust: different social environments result in differences in network signatures
        Trust and Homophily: most types of homophily do not carry over to the MMO domain
        Trust and Clandestine Behavior: No Honor Amongst Thieves
        Trust and Mentoring, Trust and Trade: trust-based and other social networks in MMOs exhibit anomalous network characteristics

    Trust and Prediction
        Trust Prediction Family of Problems: predict formation, change, and breakage of trust within the same network and across social networks; predict trust propensity
        Item Recommendation: effects of social environments on trust
        Success Prediction (Social Capital): effect of network structure on trust
Computational Trust in Multiplayer Online Games
    Computational Social Science
        Semantics
            Trust and Homophily: most types of homophily do not carry over to the MMO domain
        Structure
            Characteristics of Trust Networks: trust-based and other social networks in MMOs exhibit anomalous network characteristics
    Algorithmic
        Trust Prediction Family of Problems: predict formation, change, and breakage of trust within the same network and across social networks; predict trust propensity
    Applications
        Detection of Clandestine Actors: No Honor Amongst Thieves
        Gold Farmer Detection
Housing-Trust in EQ2

 • Access permissions to in-game house
   as trust relationships
    • None: Cannot enter house.
    • Visitor: Can enter the house and can
      interact with objects in the house.
    • Friend: Visitor + move items
    • Trustee: Friend + remove items


 • Houses can also contain items that
   allow sales to other characters without
   exchanging on the market
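
 One way to turn these permissions into trust data is to treat them as an ordered scale, since each level includes the rights of the one below it. The sketch below is an illustration only: the enum, the helper functions, and the example grants are hypothetical and are not taken from the game's data schema.

```python
from enum import IntEnum


class HouseAccess(IntEnum):
    """House permissions ordered from weakest to strongest trust."""
    NONE = 0     # cannot enter the house
    VISITOR = 1  # can enter and interact with objects
    FRIEND = 2   # Visitor + move items
    TRUSTEE = 3  # Friend + remove items


def can_enter(level):
    return level >= HouseAccess.VISITOR


def can_move_items(level):
    return level >= HouseAccess.FRIEND


def can_remove_items(level):
    return level >= HouseAccess.TRUSTEE


# Hypothetical grants: (house owner, grantee) -> access level.
# Each non-NONE grant can be read as a directed, weighted trust edge.
grants = {
    ("alice", "bob"): HouseAccess.TRUSTEE,
    ("alice", "carol"): HouseAccess.VISITOR,
    ("bob", "carol"): HouseAccess.NONE,
}
trust_edges = {pair: lvl for pair, lvl in grants.items() if lvl > HouseAccess.NONE}
```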




Homophily and Trust
   Homophily: Birds of a feather flock together
   There is no single form of homophily; it is
    described in multiple ways, e.g., Status vs. Value
    Homophily
   Each of these types of homophily is in turn defined in multiple
    ways
   Previous literature instantiates homophily in MMOs in
    terms of player characteristics and behavior in the game




Network Models, Homophily and
Trust Networks in MMOs


   RQ1: Does homophily in MMOs operate in ways similar to
    homophily in the offline world?
   RQ2: How do we map characteristics that define
    homophily in the offline world to online settings?




Mapping Homophily in MMOs




   In general, studies of homophily in MMOs assume only one type of homophily and
    generalize based on that type
   Even in the offline world homophily is of different types
   Hence the necessity of Mapping Homophily which we address here
   Mapping and Proteus Effect
Trust and Homophily in MMOs

ID   Homophily Type        Hypothesis                                                          Observation
H1   Gender Homophily      Players trust other players who are of the same gender              ?
H2   Age Homophily         Players trust other players who are of the same age cohorts         ?
H3   Class Homophily       Players trust other players who are of the same class               ?
H4   Race Homophily        Players trust other players who are of the same race                ?
H5   Guild Homophily       Players trust other players who belong to the same guild            ?
H6   Level Homophily       Players trust other players who are at a similar level              ?
H7   Challenge Homophily   Players trust other players who like similar types of challenges    ?
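
One simple way to examine hypotheses of this kind is to measure the fraction of trust edges whose two endpoints share an attribute. This is a sketch of that idea only, with hypothetical player names and attribute values; the slides do not describe the statistical test that was actually used.

```python
def same_attribute_fraction(trust_edges, attribute):
    """Fraction of (truster, trustee) pairs that share an attribute value."""
    edges = list(trust_edges)
    if not edges:
        return 0.0
    same = sum(
        1 for truster, trustee in edges
        if attribute[truster] == attribute[trustee]
    )
    return same / len(edges)


# Hypothetical data: guild membership per player and directed trust edges.
guild = {"p1": "red", "p2": "red", "p3": "blue", "p4": "blue"}
edges = [("p1", "p2"), ("p1", "p3"), ("p3", "p4"), ("p4", "p3")]

guild_homophily = same_attribute_fraction(edges, guild)
```

A value well above what random pairing would give suggests homophily on that attribute; the comparison baseline is left out here for brevity.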

Key Observations

H1: Players trust other players who are of the same gender?


                           In general players trust other players who are of
                           the same gender


H2: Players trust other players who are of similar age?

                           The stronger the type of trust, the smaller the age
                           difference between the players specifying trust



H3: Players trust other players who are of the same class?

                           Class does not seem to affect the choice of trusting
                           others

Key Observations

H4: Players trust other players who are of the same race?


                           Race does not seem to affect the choice of trusting
                           others


H5: Players trust other players who are of the same guild?

                           In general, the stronger the type of trust, the greater
                           is the percentage of the people who trust people in
                           their own guilds


H6: Players trust other players who level at a similar rate?

                           Leveling at the same rate does not seem to greatly
                           affect trust amongst players

Key Observations

H7: Players trust other players who are of the same level?




 • Level difference seems to have some effect on trust
 • For Trustee (strongest) and the Visitor (weakest) form of trust, the lower level
    players are more likely to trust players who are at a higher level


 Summary:
 • Homophily is observed in MMOs for only a subset of the types for which it is
   observed in the offline world
 • The types of homophily which are not observed in MMOs are the ones which are
   greatly affected by game mechanics

Social Networks: General
Observations
   There is an extensive literature on characteristics of social
    networks (Leskovec PAKDD 2005, Leskovec ICDM 2005,
    McGlohon ICDM 2008, McGlohon KDD 2008)
   The network exhibits monotonically
    shrinking diameter over time (Leskovec PAKDD 2005,
    Leskovec ICDM 2005, McGlohon ICDM 2008)
   At a certain point in time, called the Gelling Point, many
    smaller connected components join together and become part of
    the largest connected component (Leskovec PAKDD 2005,
    Leskovec ICDM 2005, McGlohon ICDM 2008)
   The largest connected component (LCC) comprises
    the majority of the nodes in the network (>= 80%)
    (McGlohon ICDM 2008, McGlohon KDD 2008)

Social Networks: General
Observations
   The sizes of the second and the third largest connected
    components remain constant (more or less) even though
    the identity of these components changes over time
    (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
    ICDM 2008, McGlohon KDD 2008)
   Network isolates are few in number (<5%)
    (Leskovec PAKDD 2005, Leskovec ICDM 2005)
   The number of connected components decreases over
    time
    (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
    ICDM 2008)
   Relatively fast growth of LCC close to the gelling point
    (Leskovec PAKDD 2005, Leskovec ICDM 2005)

Trust Networks in MMOs

   Data from 4 servers is available. Results from one server
    (Player vs. Environment, 'guk') are shown
   The network consists of 15,237 nodes, 30,686 edges
    and 1,476 connected components
   Dataset spans from January 2006 to August 2006
   Average node degree is 4.03. The sizes of the three
    largest connected components are 9,039, 51
    and 49. The largest connected component accounts for
    59% of all the nodes in the network
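
   The summary numbers above can be reproduced from the node, edge, and LCC counts, and component sizes can be computed from an edge list with a breadth-first search. The sketch below uses only the counts reported on this slide; the `component_sizes` helper and its toy input are illustrative and do not use the actual dataset.

```python
from collections import defaultdict, deque


def component_sizes(edges, nodes=()):
    """Return connected-component sizes, largest first (undirected view)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    for n in nodes:              # register isolates that have no edges
        adjacency.setdefault(n, set())
    seen, sizes = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Counts reported on the slide.
n_nodes, n_edges, lcc_size = 15237, 30686, 9039
avg_degree = 2 * n_edges / n_nodes   # each edge contributes to two degrees
lcc_share = lcc_size / n_nodes
```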


                                      The Trust Network on 'guk' on August 31, 2006




Key Observations

   Observation 1: Preferential Attachment: The rich get
    richer but not too rich
    Explanation: Social bandwidth is limited, Dunbar
    Number
   Observation 2: The growth of the LCC slows after
    the gelling point
    Explanation: The trust network has a relatively low
    growth rate as compared to the other networks
   Observation 3: Non-monotonic change in the diameter
    of the largest connected component
    Explanation: Players have different levels of activity at
    various points in time and can also “drop out” of the
    network if they churn from the game

Key Observations
   Observation 4: A large number of isolate components are
    observed (> 1000)
    Explanation: People join in groups and spend all the time
    playing with one another instead of interacting with people
    from the outside
   Observation 5: The number of isolate components
    increases monotonically over time
    Explanation: (Same as observation 4)
   Observation 6: Nodes in the non-LCC constitute a
    significant portion of the network (41%, 8 months after
    gelling point)
    Explanation: (Same as observation 4)


Generative Models of Trust Networks
   Time bound Preferential Attachment: The rich get rich
    but not so much after a certain point in time. Edge
    formation is bound by time
   Presence of Auxiliary Components: Isolate components
    are added to the network at an almost constant rate over
    time




Generative Models of Trust Networks (ii)

   Non-Monotonic Decrease in the Diameter:
    Nodes become inert after a certain point in time. Sample
    the lifetime of nodes from a normal distribution
   Homophily in Edge formation: Probability of edge
    formation dependent upon node degree as well as
    agreement (similarity) in node characteristics
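
   A rough sketch of how the four ingredients (time-bound preferential attachment, auxiliary components, normally distributed node lifetimes, and homophily in edge formation) might fit together in a simulation. This is not the authors' model: every parameter name and value below is a hypothetical choice made for illustration, and node "similarity" is reduced to a single random categorical attribute.

```python
import random


def generate_trust_network(steps=200, active_window=50, aux_every=10,
                           lifetime_mean=80.0, lifetime_sd=20.0,
                           homophily_boost=2.0, n_groups=3, seed=0):
    """Grow a network one node per step; return (birth times, edge set)."""
    rng = random.Random(seed)
    birth, lifetime, degree, group = {}, {}, {}, {}
    edges = set()

    def add_node(t):
        node = len(birth)
        birth[node] = t
        # Lifetime sampled from a normal distribution; node is inert afterwards.
        lifetime[node] = max(1.0, rng.gauss(lifetime_mean, lifetime_sd))
        degree[node] = 0
        group[node] = rng.randrange(n_groups)
        return node

    def add_edge(u, v):
        edges.add((u, v))
        degree[u] += 1
        degree[v] += 1

    add_edge(add_node(0), add_node(0))           # seed component
    for t in range(1, steps + 1):
        # Only nodes that are still alive and inside the time window attract edges.
        candidates = [n for n in birth
                      if t - birth[n] <= min(active_window, lifetime[n])]
        new = add_node(t)
        if candidates:
            # Degree-proportional weight, boosted when attributes agree.
            weights = [(degree[n] + 1) *
                       (homophily_boost if group[n] == group[new] else 1.0)
                       for n in candidates]
            add_edge(rng.choices(candidates, weights=weights, k=1)[0], new)
        if t % aux_every == 0:
            # Auxiliary component: an isolated pair added at a constant rate.
            add_edge(add_node(t), add_node(t))
    return birth, edges


birth, edges = generate_trust_network()
```

   Because every new edge attaches a brand-new node, the auxiliary pairs never merge with the main component in this sketch, which mimics the steadily growing number of separate components described in the observations.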




Results

     Diameter: Non-Monotonically Changing Diameter




              % LCC as being relatively small




Results

          Number of Connected Components




          Network growth and the gelling point




Conclusion: Trust Networks
   Trust Networks in MMOs exhibit many properties which are
    not exhibited by other social networks in most other
    domains
   Proposed a model based on observations and domain
    knowledge
   Models of social networks should incorporate the
    peculiarities which are observed in MMOs in general
   Generalization? Similar observations have been made for
    mentoring networks but not for PvP, Trade and Chat
    Networks




Trust Prediction Family of
Problems
Trust Prediction: Given a trust network G predict which
  nodes are going to trust one another in the future




Prediction Across Networks: Given a set of actors who
participate in multiple types of interactions using features from
one network predict the existence of links in the other network




Trust Prediction Family of
Problems
   Machine Learning/Classification approach to the problem
    of trust prediction
   Introduced a set of new problems (inter-network link
    prediction, trust propensity prediction)
   Proposed an algorithm for link prediction which used
    domain knowledge from social science theories
   New Contributions:
       Does the social context (adversarial vs. cooperative) affect
        prediction results?
       If so, what are the implications for generalization?




Prediction Task

   Trust Prediction as a classification problem
   60,000 examples for each prediction task
   10 Fold Cross-validation
   Data from Guk (Cooperative) and Nagafen (Adversarial)
    Servers
   Six Standard Classifiers for Comparison: J48, JRip,
    AdaBoost, Bayes Network, Naive Bayes and k-nearest
    neighbor
               Positive Example:             Negative Example:




        Training Period    Test Period   Training Period   Test Period
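
The classification setup can be sketched as follows. The figures for the positive and negative examples did not survive extraction, so this assumes the usual link-prediction convention: a pair that is unlinked in the training period is positive if it becomes linked in the test period and negative otherwise. The function names and the single common-neighbors feature are illustrative, and ties are treated as undirected for brevity.

```python
from itertools import combinations

def label_pairs(train_edges, test_edges):
    """Label every node pair that is unlinked in the training period."""
    train = {frozenset(e) for e in train_edges}
    test = {frozenset(e) for e in test_edges}
    nodes = sorted({v for e in train for v in e})
    labels = {}
    for u, v in combinations(nodes, 2):
        pair = frozenset((u, v))
        if pair in train:
            continue  # already linked, nothing to predict
        labels[(u, v)] = 1 if pair in test else 0
    return labels

def common_neighbors(train_edges, u, v):
    """Example feature computed from the training period only."""
    adj = {}
    for a, b in train_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return len(adj.get(u, set()) & adj.get(v, set()))

train = [("a", "b"), ("b", "c"), ("c", "d")]
test = [("a", "c")]
labels = label_pairs(train, test)
for (u, v), y in sorted(labels.items()):
    print(u, v, "label:", y, "common neighbors:", common_neighbors(train, u, v))
```

For prediction across networks, the same labeling applies with features computed from one network (e.g., grouping) and labels taken from another (e.g., mentoring).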




Prediction: Group Network




   In general good prediction results are obtained for the
    combat network (in either direction), except one case
   The grouping network is a union of all grouping
    instances and thus it is extremely dense (1,796,438
    edges, 31,900 nodes)
Prediction: Group Network




   Grouping is not a good predictor of mentoring, but the
    converse holds: mentoring is a good predictor of grouping
   Mentoring is, more often than not, accompanied by
    grouping, but grouping can occur in a variety of contexts, e.g.,
    raids, quests, dungeon instances, …
Prediction: Combat Network




   In general good prediction results are obtained for the combat
    network (in either direction)
   In general if players have played against another person then
    they have friends in other networks who have done the same
    or something similar
Comparison across Social
Environments




   T: Trust; M: Mentoring; B: Trade
   In general the results are similar for both the
    environments with some exceptions
   Mentoring-Mentoring: In the adversarial environment
    more cliquish behavior is observed, i.e., friends of
    friends are likely to mentor one another in the future,
    which is not the case for the cooperative environment

Comparison across Social
Environments




   Mentoring-Related Tasks: In general mentoring is
    much more prevalent in the adversarial environment as
    compared to the cooperative environment (~3 times) and
    is much more intense. Overlap between mentoring and
    trade is thus more likely




Trust Amongst Clandestine Actors

“A plague upon’t when thieves
cannot be true one to another!”
– Sir Falstaff, Henry IV, Part 1, II.ii




                         Do gold farmers
                         trust each other?
Hypergraphs to Represent Tripartite Graphs




   • Accounts can have several characters
   • Houses can be accessed by several characters
   • Projecting to one- or two-mode data obscures crucial
     information about embeddedness and paths
       • Figure 2a: Can ca31 access the same house as ca11?
       • Figure 2b: Are characters all owned by same account?
Hypergraphs: Key Concepts

                      •     Hyperedge: An edge between three or
                            more nodes in a graph. We use three
                            types of nodes: Character, account and
                            house
                      •     Node Degree: The number of
                            hyperedges which are connected to a
                            node
                                ND(h1) = 3
                      •     Edge Degree: The number of
                            hyperedges that an edge participates in
                                ED(a1-h1) = 2
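
The two degree definitions can be made concrete with a small sketch. The hyperedge list below is hypothetical (the slide's figure is not available); it was chosen so that house h1 has node degree 3 and the pair a1-h1 has edge degree 2, matching the values quoted above. Function names are illustrative.

```python
# Hypothetical hyperedges, each joining one character, one account and one house.
HYPEREDGES = [
    ("c1", "a1", "h1"),
    ("c2", "a1", "h1"),
    ("c3", "a2", "h1"),
    ("c3", "a2", "h2"),
]

def node_degree(hyperedges, node):
    """Number of hyperedges that contain the node."""
    return sum(1 for he in hyperedges if node in he)

def edge_degree(hyperedges, u, v):
    """Number of hyperedges in which the pair (u, v) appears together."""
    return sum(1 for he in hyperedges if u in he and v in he)

print("ND(h1) =", node_degree(HYPEREDGES, "h1"))
print("ED(a1-h1) =", edge_degree(HYPEREDGES, "a1", "h1"))
```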




Network Characteristics




• Long tail distributions are observed for the various degree distributions
• The mapping from character-house to an account is always unique
• Players who are connected to a large number of houses are also highly
  active players in other respects
Characteristics of Hypergraph Projection Networks
 • Account Projection: Majority of the gold farmer nodes are isolates
   (79%). Affiliates well-connected (8.89) vs. non-affiliates (3.47)
 • Character Projection: Majority of the gold farmer nodes are isolates
   (84%). Affiliates well-connected (10.42) vs. non-affiliates (3.23)
 • House Projection: 521 gold farmer houses. Most are isolates (not
   shown) but others are part of complex structures. Densely connected
   network with gold farmers (7.56) and affiliates (84.02)
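
A projection network and its isolates can be computed along the following lines. The slides do not state the exact projection rule, so this sketch assumes that two nodes of the kept type are linked when they share a house; the data and function name are hypothetical.

```python
from itertools import combinations

def project(hyperedges, keep, via):
    """One-mode projection: link two `keep` nodes that share a `via` node.

    `keep` and `via` are positions in the (character, account, house) tuple.
    Returns the projected edge set and the set of isolates.
    """
    groups = {}
    for he in hyperedges:
        groups.setdefault(he[via], set()).add(he[keep])
    nodes = {he[keep] for he in hyperedges}
    edges = set()
    for members in groups.values():
        edges.update(combinations(sorted(members), 2))
    isolates = nodes - {x for e in edges for x in e}
    return edges, isolates

hyperedges = [
    ("c1", "a1", "h1"), ("c2", "a1", "h1"),
    ("c3", "a2", "h1"), ("c3", "a2", "h2"),
    ("c4", "a3", "h3"),
]
acct_edges, acct_isolates = project(hyperedges, keep=1, via=2)
print("account projection:", sorted(acct_edges), "isolates:", sorted(acct_isolates))
```

The isolate share reported on the slide corresponds to `len(isolates) / len(nodes)` restricted to gold farmer nodes.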




                         The Housing Projection Network
Key Observations

          • Gold farmers grant trust ties less frequently than either
            affiliates or general players
          • Gold farmers grant and receive fewer housing permissions
            (1.82) than their affiliates (4.03) or general player population
            (2.73)




                      Total degree               In degree                  Out degree
                  <n>    <nGF>   <nAff>     <n>    <nGF>   <nAff>     <n>    <nGF>   <nAff>
  Farmers         1.82   0.29    1.82       0.89   0.29    0.89       1.07   0.29    1.07
  Affiliates      4.03   1.28    0.70       1.55   0.75    0.70       2.88   0.63    0.70
  Non-Affiliates  2.73   -       7.77       1.57   -       5.98       1.56   -       2.34
Key Observations

          • No honor among thieves
                 • Gold farmers also have very low tendency to grant other
                   gold farmers permission (0.29)
                 • Affiliates also unlikely to trust other affiliates (0.70)




Key Observations

      • Affiliates are brokers:
               • Farmers trust affiliates more (1.82) than other farmers (0.29)
               • Affiliates trust farmers more (1.28) than other affiliates (0.70)
               • Non-affiliates have a greater tendency to grant permissions to
                 affiliates (7.77) than in general (2.73)




Frequent Itemset Mining for Frequent Hyper-subgraphs


    Support of a Hyper-subgraph: Given a sub-hypergraph of size k,
    subP is the pattern of interest containing the label P, shP is a pattern
    of the same size as subP and contains the label P, the support is
    defined as follows:




    Support of pattern         also containing a gold farmer (red) = 5/8



Frequent Itemset Mining for Frequent Hyper-subgraphs


    Confidence of a Hyper-Subgraph: Given a sub-hypergraph of
    size k, subP is the pattern of interest containing the label P, subG is
    a pattern which is structurally equivalent but which does not contain
    the label P, the confidence is defined as follows:




    Confidence of pattern        and containing a gold farmer = 5/7



Frequent Patterns of GFs
• Very low (s ≤ 0.1) support and confidence for almost all
  (except 8) frequent patterns with gold farmers
• Remaining 8 patterns can be used for discrimination
  between gold farmers and non-gold farmers in a subset of
  the instances
• Gold farmers & affiliates are more connected: A third of the more
  complex patterns are associated with affiliates (15/44)




Application: Gold Farmer Detection (Behavioral)
  Classification using in-game behavioral and demographic features
  Each model corresponds to a grouping of different features




Application: Gold Farmer Detection (Label Propagation)
•   Problem: Not all players labeled as normal are in fact normal. Some
    of them are gold farmers who have not yet been identified as such
•   Analogue: If someone socializes almost exclusively with criminals, then
    he may be a criminal

    Classifier            Metric      Initial Dataset   Label Prop   Change in Performance
    Bayes Net             Precision   0.17              0.189         0.019
                          Recall      0.834             0.819        -0.015
                          F-Score     0.282             0.307         0.025
    J48                   Precision   0.494             0.62          0.126
                          Recall      0.189             0.337         0.148
                          F-Score     0.273             0.437         0.164
    JRip                  Precision   0.495             0.537         0.042
                          Recall      0.462             0.462         0
                          F-Score     0.478             0.497         0.019
    KNN                   Precision   0.436             0.46          0.024
                          Recall      0.396             0.428         0.032
                          F-Score     0.415             0.443         0.028
    Logistic Regression   Precision   0.455             0.534         0.079
                          Recall      0.189             0.271         0.082
                          F-Score     0.267             0.36          0.093
    Naïve Bayes           Precision   0.146             0.142        -0.004
                          Recall      0.538             0.502        -0.036
                          F-Score     0.23              0.221        -0.009
    AdaBoost w/ DT        Precision   0.405             0.471         0.066
                          Recall      0.105             0.08         -0.025
                          F-Score     0.167             0.137        -0.03
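
The relabeling idea can be sketched as a single propagation pass: an unlabeled player whose contacts are mostly known gold farmers is flagged as a likely farmer. The threshold, the minimum number of contacts and the function names are assumptions, not values from the study. The `f_score` helper is the standard harmonic mean of precision and recall, which reproduces the F-Score column of the table.

```python
def propagate_labels(adj, farmers, threshold=0.8, min_contacts=2):
    """One pass: flag unlabeled players whose contacts are mostly known farmers."""
    flagged = set()
    for player, contacts in adj.items():
        if player in farmers or len(contacts) < min_contacts:
            continue
        share = sum(1 for c in contacts if c in farmers) / len(contacts)
        if share >= threshold:
            flagged.add(player)
    return flagged

def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical contact lists; g1 and g2 are known gold farmers.
adj = {
    "p1": {"g1", "g2"},
    "p2": {"g1", "p3", "p4"},
    "p3": {"p2"},
}
print(propagate_labels(adj, farmers={"g1", "g2"}))
print(round(f_score(0.17, 0.834), 3))   # Bayes Net, initial dataset
```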



Gold Farmer Detection
   Model 1 (Player Attribute Based Features): These features are
    based on the attributes of the player’s character in the game, e.g.,
    character race, character gender, distribution of gaming activities, etc.
   Model 2 (Item Based Features): These are the features which are
    derived from items bought and sold from the consignment network.
    These features are based on the frequency of the frequent items sold
    or bought by gold farmers.
   Model 3 (Player Attribute & Item Based Features): All the attributes
    from the previous two models.
   Model 4 (Item Network Based Features): Features which are derived
    from the item network in a manner analogous to Model 2.
   Model 5 (Player Attribute & Item-Network Based Features): A
    combination of features from Model 1 and Model 4.
   Model 6 (Item & Item-Network Based Features): A
    combination of features from Model 2 and Model 4.
   Model 7 (Player Attribute, Item & Item-Network Based Features):
    Union of all the features described above.

Application: Structural Signatures Approach Applied to Trade Networks

There is not enough data to use the structural signature approach to catch
gold farmers
Alternative: apply the same approach to other networks, e.g., the Trade Network




A set of standard machine learning models is used: Naive Bayes, Bayes Net,
Logistic Regression, KNN, J48, JRip, AdaBoost and SMO



Conclusion: Gold Farmer Behavior Analysis and Prediction


   Representation related issues addressed with respect to
    trust in MMOs
   Application of frequent pattern mining to discover distinct
    trust patterns associated with gold farmers
   No honor among thieves: gold farmers tend not to trust
    other gold farmers




Computational Trust in Multiplayer Online Games
Muhammad Aurangzeb Ahmad
Department of Computer Science, University of Minnesota

Trust as a Multi-Level Network Phenomenon
    Trust as a multi-modal multi-level network phenomenon
    Network formulations of traditional trust related concepts
    Incorporating social science theories in trust related prediction tasks
    Generative network models for trust based social interactions
Social Characteristics of Trust
    Effect of Social Environments on Trust: different social environments
    result in differences in network signatures
    Trust and Homophily: most types of homophily do not carry over to
    the MMO domain
    Trust and Clandestine Behavior: No Honor Amongst Thieves
    Trust and Mentoring
    Trust and Trade
    Trust based and other social networks in MMOs exhibit anomalous
    network characteristics
Trust and Prediction
    Trust Prediction Family of Problems: predict formation, change, and
    breakage of trust within the same network and across social networks.
    Predict trust propensity.
    Item Recommendation: effects of social environments on trust
    Success Prediction (Social Capital): effect of network structure on trust
Conclusion
   Availability of data allows one to analyze phenomena
    that could not be analyzed in the past (trust in
    MMOs in the current case)
   Explored a series of big-picture questions pertaining to
    trust in MMOs
   Similar affordances and contexts (with respect to the offline
    world) lead to similar outcomes in the online world
   Explored and expanded the scope of trust related
    prediction tasks
   Analysis is required on more datasets for generalizability




Acknowledgement
   DMR Lab
       Professor Jaideep Srivastava and all DMR lab members especially
        Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak,
        Kyong Jin Shim and Nisheeth Srivastava
   Virtual Worlds Observatory
       Professor Noshir Contractor (Northwestern), Professor Scott Poole
        (UIUC), Professor Dmitri Williams (USC)
       External Collaborators at Northwestern, UIUC, USC, U Toronto
        especially Brian Keegan
   Funding Agencies:
       NSF, AFRL, NSCTA, IARPA




Publication Summary

Publication Type      Total   Thesis Related   Comment
Book                  1       1                Book on Clandestine Behaviors and Networks (Springer 2012)
Conference Papers     14      5
Book Chapters         1       -
Journal Papers        2       -
Workshop Papers       10      3                2 Best Paper Awards
Short/Poster Papers   8       1
Technical Reports     4       1
Tutorials             4       1
Patents               2       1
27 Co-authors
A Computational Model for Social Influence
Objective: Model social influence in a dynamic network setting as a
spread of cascades to identify key influential nodes
Outline

   Team Overview
   Optimal Target Selection
   Current approaches
   Proposed Approach
   Results
   Update summary
   Q&A




Motivation




    [Figure: global retweets of Tweets coming from Japan for one
     hour after the earthquake; legend: Original, Retweet]
  Courtesy: http://blog.twitter.com/2011/06/global-pulse.html
Optimal Target People Selection
   Goal: Find the optimal targets to maximize influence
    spread in network
   Why is it important?
       Public Polls: Sentiment spread
       Sales and Marketing: Word-of-mouth spread
       Public Health: Disease spread
   Influence: A force that attempts to change the opinion or
    behavior of the individual
   Influence is causal
       My friend buys a product -> I buy a product



Current Approaches (1/2)

   Assume a certain model for propagation of influence [1]
       Independent Cascade Model
           Each active node independently influences its neighbor
            using a biased coin toss
       Linear Threshold Model
           Each node picks a random threshold
           Each node influences its neighbor by a specified
            amount

             All of them need to know the
                propagation probability
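
The Independent Cascade model can be simulated in a few lines. This is a minimal sketch assuming one uniform propagation probability p for every edge (real applications use per-edge probabilities, which is exactly what has to be known or learned); the graph and function names are illustrative.

```python
import random

def independent_cascade(graph, seeds, p, rng):
    """Run one cascade; each newly active node gets one coin toss per neighbor."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def expected_spread(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of influenced nodes."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs

# Hypothetical directed graph: C influences K and P; P influences V and N.
graph = {"C": ["K", "P"], "P": ["V", "N"]}
print(expected_spread(graph, ["C"], p=0.5))
```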
Current Approaches (2/2)

   Use a two-step approach
     Step 1: Choose an influence model

     Step 2: Find the subset of top-k nodes that maximizes
      the influence
   Bad news: Step 2 is NP-hard [2]
   Popular method for Step 2: greedy heuristics
     The greedy heuristic gives a (1-1/e) approximation

   Address scalability issues in the second step [3][4][5][6]
     CELF [3]: the influence maximization function follows the law
      of diminishing returns (sub-modular)
Limitations of current approaches


   Estimation/Learning of propagation
    probabilities
       Data-driven and model-free approach
   Propagation models are content
    unaware
       Content-aware influencer mining
   No causality effect in influence model
       Add causal effect in influence
        propagation
Proposed Approach


   Inputs: communication logs and network structure
   Pipeline: Frequent Sequence Mining -> Network Discovery of
    Influencers
   Output: ordered list of influencers
Sequence Cascade Mining

                                              Example


                 Temporal order
                                    Append neighbors from network i=2



                                    Check downward closure and
                                    support count
Peter, Charles, Kim, Vicky, Nancy
       Network structure


   Let Support = 2
Sequence Cascade Mining

                                              Example



                                    Append neighbors from network i=3



                                    Check downward closure and
                                    support count
Peter, Charles, Kim, Vicky, Nancy
       Network structure


   Let Support = 2
Sequence Cascade Mining

                                              Example



                                    Append neighbors from network i=4


                                    Break loop; No more work to do…

Peter, Charles, Kim, Vicky, Nancy
       Network structure
                                      Return Frequent Sequences

   Let Support = 2
Sequence Cascade Mining



                   L1 Frequent Sequence Mining




                   Candidate Generation for k+1
                   By appending neighbors

                   Downward closure pruning

                   Support Count and reorganize
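
The four steps above follow an Apriori-style loop in which candidates are grown only along network edges. The sketch below is one possible reading, not the authors' implementation: support is counted as the number of logs that contain the sequence in temporal order, downward closure is checked on the candidate's suffix, and the logs, neighbor lists and function names are hypothetical.

```python
def is_subsequence(pattern, log):
    """True if the pattern's items appear in the log in the same order."""
    it = iter(log)
    return all(item in it for item in pattern)

def support(pattern, logs):
    return sum(1 for log in logs if is_subsequence(pattern, log))

def mine_cascades(logs, neighbors, min_support):
    # L1: frequent single-node sequences.
    items = sorted({x for log in logs for x in log})
    level = [(x,) for x in items if support((x,), logs) >= min_support]
    frequent = list(level)
    while level:
        prev = set(level)
        candidates = []
        for seq in level:
            # Candidate generation: append network neighbors of the last node.
            for nb in sorted(neighbors.get(seq[-1], ())):
                cand = seq + (nb,)
                # Downward closure: the length-k suffix must itself be frequent.
                if nb not in seq and cand[1:] in prev:
                    candidates.append(cand)
        # Support count; survivors form the next level.
        level = [c for c in candidates if support(c, logs) >= min_support]
        frequent.extend(level)
    return frequent

# Hypothetical logs (users in temporal order) and directed neighbor lists.
logs = [["P", "C", "K"], ["P", "C", "V"], ["C", "K"], ["P", "N"]]
neighbors = {"P": ["C", "N"], "C": ["K", "V"]}
print(mine_cascades(logs, neighbors, min_support=2))
```

With support 2, the loop stops at length 3 because the only surviving candidate (P, C, K) occurs in a single log.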




Influence Function
   Node pattern Influence Set (Q(i,p))
                    =
   Influence Set of a Node

   Influence Set of a Node Set




    It can be shown that the influence function I(V,s) is
              monotone and sub-modular
Network Discovery of Influencers




  Greedy heuristic with (1-1/e) approximation guarantee
Network Discovery of Influencers

Frequent Sequences

                                       Example
                                                            i=1
                              Choose the node with max out-degree
                Find top-3    (Randomly break tie between P
                influencers   and C)
            P                   C chosen
                              Current Set of Influenced People
   C                 N
                                   C     K     P

        K       V
Network Discovery of Influencers

Frequent Sequences

                                   Example
                                                       i=2
                         Choose the node with max out-degree



            P              P   chosen

                         Current Set of Influenced People
   C                 N
                               C    K     P     V     N

        K       V
Network Discovery of Influencers

Frequent Sequences

                                   Example
                                                       i=3
                         Choose the node with max out-degree
                         (Randomly break tie between K, V,
                         N)
            P              K    chosen

                         Current Set of Influenced People
   C                 N
                               C    K     P     V     N

        K       V
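
The three steps of the walk-through can be reproduced with a greedy selection by marginal gain. The influence sets below are inferred from the influenced sets shown in the example (C reaches K and P; P reaches V and N) and are an assumption, as is the alphabetical tie-break used here in place of the random one.

```python
def greedy_influencers(influence_sets, k):
    """Pick k nodes, each time taking the one that adds the most new people."""
    chosen, covered = [], set()
    for _ in range(k):
        remaining = sorted(n for n in influence_sets if n not in chosen)
        # max() keeps the first maximum, so ties go to the alphabetically first node.
        best = max(remaining, key=lambda n: len(influence_sets[n] - covered))
        chosen.append(best)
        covered |= influence_sets[best]
    return chosen, covered

# Each node's influence set includes the node itself.
influence_sets = {
    "C": {"C", "K", "P"},
    "P": {"P", "V", "N"},
    "K": {"K"},
    "V": {"V"},
    "N": {"N"},
}
chosen, covered = greedy_influencers(influence_sets, k=3)
print(chosen, sorted(covered))
```

Selecting by marginal gain rather than raw out-degree matters from the second step on: once C is chosen, P contributes only V and N.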
Context-Aware Influencers


   Input: bag of words representing the context
   Pipeline: Frequent Sequence Mining -> Network Discovery of
    Influencers
   Output: context-specific influencers
Initial Results (1/2)




       DBLP Dataset             USPTO Dataset


Significant performance gain over established
             baseline PrefixSpan
Initial Results (2/2)



DBLP
Dataset




USPTO
Dataset


Top Influencers

DBLP Dataset
   Thomas Huang: Professor, UIUC
   Philip Yu: Professor, UIC
   Elisa Bertino: Professor, Purdue

USPTO Dataset
   Dieter Freitag*: CTO, FRX Polymers; American Plastics Hall of Fame
   Akira Suzuki: 2010 Nobel Prize winner in Chemistry
   Wilhelm Brandes*: Scientist, Bayer Crop Science

* Prolific inventors 1988-1997 as per USPTO announcement [7]
What’s next?
   Short Term
    o   Implement the NDISC algorithm
    o   Verify the influence spread of NDISC compared to popular
        baselines (PMIA, DegreeDiscountIC, SPM, and SP1M)
    o   Qualitatively analyze the influencer results compared to
        baseline algorithms
   Long Term
    o   Need to extend the approach to dynamic, time-evolving
        content and network structures
    o   Need to implement influence analysis for streaming
        algorithms
    o   Modeling of influence propagation in collaborative networks
        using hypergraphs
    o   Need to do the same for multi-relational graphs
Overall Summary
Optimal Target People Selection (Task 3.3)
UMN Team: Karthik Subbian (Researcher), Jaideep Srivastava (Faculty)

Goals
   Optimal target people selection using model-free influence models
       Develop a content-centric approach for sequential cascade mining
       Develop greedy heuristics for influencer mining from cascades
   Tracking the influence of target nodes using online algorithms

Novel Ideas
   Optimal Target Selection
       Content-centric
       Mining sequential cascades in network
       Finding influencers from cascades
       Discover context-specific influencers

Future Plans
   Short Term
       Implement NDISC and compare with baseline algorithms
   Long Term
       Finding influencers in dynamic, time-evolving network structures
       Modeling of influence propagation in hyper-graphs and
        multi-relational graphs
Course Outline
• Module 1
   •        Introduction to Social Analytics – applying data mining to social computing
            systems; examples of a number of social computing systems, e.g.
            FaceBook, MMO games, etc.
• Module 2
   •        Computational online trust
   •        Identifying key influencers
   •        Information flow in networks
• Module 3
   •        Analysis of clandestine networks
   •        Katana - game analytics engine




Untangling Dark Webs
Theories, Methods, and Models for a
Computational Social Science of
Clandestine Networks




Defining Clandestine Behavior
   Clandestine behavior is socially and culturally constructed.
    There is no computational definition of what constitutes
    clandestine behavior
   “Kept secret or done secretively, esp. because illicit”
    (Webster's definition)
   Clandestine networks thus involve actors and/or activities
    which are illicit in nature
Kevin Bacon Linked to Al-Qaeda !




Just because two things are related does not mean that they have a concrete
relationship. Unless …..
Clandestine organizations as
networks
   Networks are more flexible organizational forms than
    markets or hierarchies
    [Powell 1990; Podolny 1998; Brass, Galaskiewicz, et al. 2004; Robins 2009]

   Criminals are embedded within organizations
    supporting division of labor and specialization
    [Cressey 1972; Canter & Alison 2000; Waring 2002]

   Trust relations mediate functional relations like
    grouping, exchange, & communication
    [McIntosh 1974; von Lampe 2004]

   Balancing security vs. efficiency; time-to-task; resilience
    vs. flexibility
    [Milward & Raab 2003, 2006; Morselli, Giguere, & Petit 2007]
Contrasting network
assumptions

   Stohl & Stohl (2005)

      Current assumptions                          Network theory assumptions
   1  Networks are information systems             Networks are multi-functional communication systems
   2  Networks have uniplex, ahistoric relations   Networks have multiplex and dynamic relations
   3  Networks are hierarchically organized,       Networks are temporary, dynamic, emergent,
      C2 structures                                adaptive, flexible
   4  Boundary specification is a political tool   Boundary specification is an analytic tool
   5  Networks are globalized and homophilous      Networks are local, glocal, global, and heterogeneous
Information systems?
   Network ties aren't just for sharing information and
    resources, but also represent latent relationships
   Networks are not a machine to be broken, but a social
    organism that can adapt and reproduce
       What are underlying processes that govern how networks evolve
        and maintain their structure?
   Networks are structures for sensemaking & socialization
       Bomb-making websites easy to disrupt but message boards which
        foster communication and solidarity among people with similar
        sympathies and values much less so
Uniplex, ahistoric relationships?
   Essential covert relationships like trust & knowledge exchange are based
    on multiplex ties and attributes
       Shared background, common identity, family ties, reciprocity
       Jihadi cells prevalent in Montreal, London, Madrid, Hamburg because of
        poor social integration & high disaffection
   Longitudinal data necessary to measure changes over time
       Repeated measurements of the same network actors, ties, attributes
   Collection problems:
       Left censoring: important changes happened before data collection
       Right censoring: data collection does not last long enough to capture important
        changes
       Boundary specification: Omitting important actors, ties, attributes
       Cognitive biases: Over or under-recall
       Instrumentation: measurement inconsistencies, panel conditioning & attrition,
        missing data
Hierarchies?
   Licit and illicit organizations are no longer hierarchical pyramids with
    clear chains of command
       Network identities and roles are not fixed and constant
       Networks do not operate according to formal & unambiguous rules
   Cells: Operations can be performed spontaneously & independently
       Individuals may identify with an organization, but are not part of it
   Networks operate at multiple levels
       “Organizations created out of complex webs of exchange, dependency, reciprocity
        among multiple organizations” (Monge & Contractor 2003)
       Political parties, bureaucrats, businesses, charitable organizations, clinics, schools, &
        houses of worship are legitimate peripheral actors which enable covert organizations
Easily specified?
   Specifying rules for including/excluding actors in network membership
    is an essential part of research design
        Family members, politicians, businesses, charitable organizations, …
        Unit of analysis: individuals, teams, organizations?
   Example: Snowball sampling
        Identify initial set then their links to second degree, third degree, etc.
        Missing crucial seed actors, time intensive, super-abundance of unreliable data,
         “everyone” connected by 6 degrees of separation
   Labels are fuzzy & prone to political framing
        “Terrorist vs. freedom-fighter”: UK vs. IRA? China vs. ETIM? Russia vs. Chechnya?
         Afghanistan/Pakistan vs. Taliban? Iraq/Turkey vs. Kurds?
   Distinguish between being in contact vs. being operational
        Ability to mobilize, control, coordinate members
        Look for how organizations cleave when they are forming or being changed
Global & homophilous?
   Terrorist networks encompass very different ideologies and goals
       Religious (salafism), ethnic (Kurds), and nationalist (ETA) movements
       Coercive bargaining (IRA, PLO, ETA) vs. war-inducing (AQ)
   Different motivations lead to different formation processes &
    structures
       Hamas, Hezbollah, ETA emphasize shared background/ethnicity vs. “movements”
        like al Qaeda or militias committed to shared ideology
       Relational homophily drives creation of self-similar strong ties which remain dormant,
        difficult to identify & enter, easy to disassemble
       Ideological homophily drives creation of single-issue ties which remain salient, more
        permeable boundaries means easier to enter but harder to prevent re-formation
Criminal networks literature
   Balancing security vs. efficiency; time-to-task; resilience vs. flexibility
        Milward & Raab 2003; Morselli, Giguere, Petit 2007

   Secrecy-oriented networks emphasize sparse, decentralized networks to avoid
    detection
        Erickson 1981

   Decentralization a common response to authorities' targeting and seizure
        Morselli & Petit 2007

   Leaders on periphery (low closeness) to avoid detection
        Baker & Faulkner 1993

   Drug traffickers exhibit higher centralization in core of participants with stable
    roles, but insulate by adding participants to extend periphery of core
        Morselli, Giguere, Petit 2007; Dorn, Oette, White 1998




Covert networks are dynamic
    networks
   Multiple types of relationships
       Familial ties, communication, exchange, authority, & other latent relationships
       Actor-actor, actor-attribute, actor-event networks
       Individuals central in one network are peripheral in other networks
   Positions and relationships are not stable
       Removing links & nodes can alter pattern but does not address underlying processes
        that govern how networks evolve, maintain, dissolve
   Outwardly different networks share common structural properties
       Faust & Skvoretz (2002): Senate co-sponsorship most similar to cow-licking
   Outwardly similar networks generated by different processes
Cutpoints & bridges

   Brokers are excellent targets
    to disassemble networks
   Cutpoint
       Removing a node creates a new
        component
   Bridges
       Removing a link creates a new
        component
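
The two definitions above can be checked by brute force: remove each node (or link) in turn and see whether the number of components goes up. This is a minimal sketch, assuming an undirected network given as a list of links. The helper names and the toy network are illustrative and do not come from the slides.

```python
from collections import defaultdict

def components(nodes, links):
    """Count connected components of an undirected network."""
    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:                      # depth-first walk of one component
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return count

def cutpoints(nodes, links):
    """Nodes whose removal creates a new component."""
    base = components(nodes, links)
    found = set()
    for n in nodes:
        rest_nodes = [m for m in nodes if m != n]
        rest_links = [(u, v) for u, v in links if n not in (u, v)]
        if components(rest_nodes, rest_links) > base:
            found.add(n)
    return found

def bridges(nodes, links):
    """Links whose removal creates a new component."""
    base = components(nodes, links)
    return {l for l in links
            if components(nodes, [m for m in links if m != l]) > base}

# Hypothetical toy network: two triangles joined by the single link C-D.
NODES = ["A", "B", "C", "D", "E", "F"]
LINKS = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]

print(cutpoints(NODES, LINKS))  # C and D are the brokers
print(bridges(NODES, LINKS))    # the link C-D
```

Brute force is quadratic in network size; it is used here only because it mirrors the slide's definitions directly.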
Clandestine networks are
complex
   Multiple types of relationships [Stohl & Stohl 2007]
       Networks have multiplex and dynamic relations – trust, exchange,
        communication, authority, and other relationships
       Networks are temporary, dynamic, emergent, adaptive, flexible
       Networks are local, glocal, global, and heterogeneous – different ideologies,
        motivations, goals lead to different structures & processes

   Descriptive network analysis does not address
    underlying processes of how networks emerge, stabilize,
    dissolve
   Goal: To disrupt a network, understand and attack the
    processes which create and stabilize it
Modeling Clandestine Organizations
and Behaviors as Networks
Introduction to network analysis

   Networks are sets of nodes
    connected by links
   Nodes can be people, groups,
    webpages, etc.
   Links can be friendships, exchanges,
    affiliations, etc.



[Figure: example nodes A and B]
Networks - Directed & undirected
   Communication vs. friendship networks


[Figure: two example networks on nodes A to G, one directed and one undirected]
Degree distributions

   What's the probability P(k)
    of randomly selecting a
    node with degree k in this
    network?

[Figure: plot of P(k) against k, with P(1) = 24/31, P(4) = 6/31, P(12) = 1/31]
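
A minimal sketch of how P(k) is computed from a list of links. The slide's network is not recoverable from the text, so the tree built below is a hypothetical construction chosen only because it has 31 nodes and reproduces the plotted values 24/31, 6/31, and 1/31.

```python
from collections import Counter
from fractions import Fraction

def degree_distribution(links):
    """Return {k: P(k)} for an undirected network given as (u, v) links.

    Only nodes that appear in at least one link are counted.
    """
    degree = Counter()
    for u, v in links:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}

# Hypothetical 31-node tree: one hub linked to 6 mid-level nodes and 6 leaves;
# each mid-level node is linked to 3 further leaves.
links = []
for i in range(6):
    links.append(("hub", f"mid{i}"))
    links.append(("hub", f"leaf{i}"))
    for j in range(3):
        links.append((f"mid{i}", f"mid{i}_leaf{j}"))

P = degree_distribution(links)
for k, p in P.items():
    print(k, p)
```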
Power laws
•   Large networks can have degree distributions that span several orders of
    magnitude
•   Many real world networks follow a power law degree distribution
    •   Scale free networks, 80/20 rule, Pareto principle, Zipf's Law, long tail,
        etc.

         $P(k) \sim k^{-\gamma}$

[Figure: degree distributions for four networks, each panel annotated with γ: Internet routers, Movie actors, Physicists, Neuroscientists]
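
A power law is a straight line on a log-log plot, so a rough estimate of the exponent is the negative slope of log P(k) against log k. The sketch below assumes a synthetic distribution with exponent 2.5; the function name and data are illustrative. A least-squares fit on log-log axes is only a quick diagnostic, and maximum-likelihood estimators are generally preferred for real data.

```python
import math

def fit_power_law_exponent(pk):
    """Least-squares slope of log P(k) against log k; returns the exponent gamma."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var

# Synthetic P(k) proportional to k^-2.5 for k = 1..100 (an assumption for illustration).
raw = {k: k ** -2.5 for k in range(1, 101)}
total = sum(raw.values())
pk = {k: v / total for k, v in raw.items()}

gamma = fit_power_law_exponent(pk)
print(round(gamma, 3))
```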
Degree distributions across
networks
Path length
   Path length: number of links between two nodes (degrees of separation)
       BACDE = 4
   Geodesic: Shortest path length between two nodes
       BAE = 2
   Diameter: Network's largest geodesic (equivalently, largest eccentricity)
       BAEFGH, BACJIH

[Figure: example network with nodes A to K]
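
Geodesics in an unweighted network can be found with breadth-first search, and the diameter is the largest geodesic over all pairs. This is a minimal sketch on a hypothetical six-node network, wired so that B-A-E is a geodesic of length 2 while B-A-C-D-E is a longer path of length 4, as in the slide's examples. It assumes the network is connected.

```python
from collections import deque

def geodesics_from(adj, source):
    """Shortest path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def eccentricity(adj, node):
    """Largest geodesic starting at node."""
    return max(geodesics_from(adj, node).values())

def diameter(adj):
    """Largest eccentricity in the network."""
    return max(eccentricity(adj, n) for n in adj)

# Hypothetical undirected network (not the one drawn on the slide).
adj = {
    "A": {"B", "C", "E"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"C", "E"},
    "E": {"A", "D", "F"},
    "F": {"E"},
}

print(geodesics_from(adj, "B")["E"])  # geodesic B to E
print(diameter(adj))
```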
Density, clustering, centralization
   Density
       Observed edges in network / maximum possible edges
   Clustering
       Count ties among alters, removing ego and ties to ego
       Observed ties in actor's ego network / maximum possible ties in ego network
   Network centralization
       Variation in individual actors' centralities
       High centralization when few actors possess higher centrality than average
       Low centralization when actors all have similar centralities

[Figure: example network with nodes A to I]
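
The density and clustering ratios above translate directly into code. This is a minimal sketch for an undirected network stored as an adjacency dictionary; the four-node network and the function names are illustrative assumptions.

```python
from itertools import combinations

def density(adj):
    """Observed edges / maximum possible edges in an undirected network."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is stored twice
    return edges / (n * (n - 1) / 2)

def local_clustering(adj, ego):
    """Observed ties among ego's alters / maximum possible ties among them."""
    alters = adj[ego]
    k = len(alters)
    if k < 2:
        return 0.0  # fewer than two alters: no ties are possible
    ties = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    return ties / (k * (k - 1) / 2)

# Hypothetical network: triangle A-B-C plus a pendant node D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(density(adj))                # 4 of 6 possible edges
print(local_clustering(adj, "A"))  # 1 of 3 possible ties among B, C, D
```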
Paths & clustering across
networks
Small worlds
   Paradox: Individuals within the
    network are highly clustered but
    also have small average
    geodesics to other members
   Randomly rewiring a fraction of
    links on a regularly-clustered
    network drastically shortens
    average eccentricity
   Random rewiring, however,
    still maintains high clustering
    over several orders of
    magnitude
Degree centrality
   Degree: total number of links with other actors
       In-degree: Directional links to actor from other actors
       Out-degree: Directional links from actor to other actors
   “Popularity”

[Figure: example network with nodes A to J]
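
In-degree and out-degree can be tallied in one pass over a list of directed links. A minimal sketch, assuming links are (source, target) pairs; the link list is hypothetical and is not read off the slide's figure.

```python
from collections import Counter

def in_out_degree(links):
    """Return (in-degree, out-degree) counters for directed (source, target) links."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in links:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg

# Hypothetical directed links.
links = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "H"), ("H", "I")]
indeg, outdeg = in_out_degree(links)

# Total degree is the sum of the two directional counts.
print(indeg["D"], outdeg["D"], indeg["D"] + outdeg["D"])
```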
Closeness centrality
   How easily one actor can reach rest of network
   Actor with shortest average path length
   “Pulse-taker”



[Figure: example network with nodes A to J]
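
One common way to score closeness is the number of other reachable actors divided by the sum of geodesic distances to them, so a shorter average path gives a higher score. This is a sketch under that assumption; the slides do not say which normalization was used, and the chain network is hypothetical.

```python
from collections import deque

def closeness(adj, node):
    """(reachable nodes - 1) / sum of geodesic distances from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nbr in adj[cur]:
            if nbr not in dist:
                dist[nbr] = dist[cur] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical chain A-B-C-D-E: the middle actor is the "pulse-taker".
chain = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

for actor in chain:
    print(actor, round(closeness(chain, actor), 3))
```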
Betweenness centrality
   How much an actor lies between distinct groups
   Number of geodesics passing through actor
   “Broker”



[Figure: example network with nodes A to J]
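
Counting geodesics through each actor can be done with Brandes' accumulation scheme, which runs one breadth-first search per source. The sketch below is for unweighted, undirected networks and returns raw (unnormalized) scores; the two-triangle network is a hypothetical example in which C and D are the brokers.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected network."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on geodesics from s
        sigma = {v: 0 for v in adj}      # number of geodesics from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                     # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical network: triangles A-B-C and D-E-F joined by the link C-D.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
print(betweenness(adj))
```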
Multi-Dimensional Networks -
Attributes

   Selection: Immutable characteristics
       Race, ethnicity, gender, etc.
       Simple process: Attributes remain fixed and influence how
        connections are formed
       Homophily
   Influence: alterable characteristics
       Interests, activities, infectiousness, etc.
       Complex process: Feedback and interactions between ego's
        attributes, network structure, alters' attributes
       Diffusion and coevolution
Selection and homophily
   Existing attributes drive creation & destruction of
    connections
   “Birds of a feather flock together”



[Figure: network of nodes A to G shown at two points in time]
Influence and diffusion
   Existing connections drive creation & destruction of
    attributes


[Figure: network of nodes A to G shown at two points in time]
Multi-relational
   Organizations: authority, trust, & friendship




[Figure: example network with nodes A to G]
Triad census & network motifs
    16 possible triad types (motifs, isomorphism classes) in a directed network
          (Reciprocated ties, unreciprocated ties, null ties)
    Triad census: frequency of each structure in a network
          Compare frequencies in observed network, measuring deviations against
           frequencies in random networks
    Simmelian ties: Transitive-reciprocated triads (3-0-0) appear
     more frequently than any other motif in social networks




[Figure: the 16 triad types, labeled 003, 012, 102, 021D, 021U, 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300]
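
The three digits in each triad label count the reciprocated, unreciprocated, and null ties among the three actors. The sketch below is a simplified census that tallies triads by those three digits only. It does not compute the letter suffix (D, U, C, T), so 021D, 021U, and 021C are all counted under "021". The function name and toy network are illustrative.

```python
from collections import Counter
from itertools import combinations

def man_census(nodes, arcs):
    """Count triads by (mutual, asymmetric, null) dyad counts, e.g. '300' or '012'."""
    arcs = set(arcs)

    def dyad(u, v):
        forward, backward = (u, v) in arcs, (v, u) in arcs
        if forward and backward:
            return "M"   # reciprocated tie
        if forward or backward:
            return "A"   # unreciprocated tie
        return "N"       # null tie

    census = Counter()
    for a, b, c in combinations(nodes, 3):
        kinds = Counter(dyad(u, v) for u, v in ((a, b), (a, c), (b, c)))
        census[f"{kinds['M']}{kinds['A']}{kinds['N']}"] += 1
    return census

# Hypothetical directed network: A, B, C fully reciprocated; D sends one tie to A.
nodes = ["A", "B", "C", "D"]
arcs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
        ("A", "C"), ("C", "A"), ("D", "A")]

census = man_census(nodes, arcs)
print(dict(census))
```

Comparing these counts against the same census on randomized networks is the step the slide describes; that comparison is not shown here.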
Network families
Growth & preferential
attachment
   How do you generate scale-free networks?
       Rich get richer (Yule 1925, Simon 1955, de Solla Price 1976, Albert & Barabasi 1998)

                        ?
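
A minimal "rich get richer" growth sketch: each new node attaches one link to an existing node chosen with probability proportional to its current degree. The single link per new node, the seed, and the function name are simplifying assumptions for illustration and are not taken from the cited papers.

```python
import random
from collections import Counter

def preferential_attachment(n, seed=0):
    """Grow an n-node network; each new node adds one degree-proportional link."""
    rng = random.Random(seed)
    links = [(0, 1)]
    # Every node appears in this list once per link it holds, so a uniform
    # draw from it selects nodes in proportion to their degree.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        links.append((new, target))
        endpoints.extend([new, target])
    return links

links = preferential_attachment(50)
degree = Counter()
for u, v in links:
    degree[u] += 1
    degree[v] += 1

print(len(links), max(degree.values()))
```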
Data collection issues
   Data on clandestine organizations are doubly hard to
    obtain
       Clandestine networks by definition seek to avoid detection or identification
       Law enforcement prohibited from collecting some types of data, reluctant to disclose
        extant data to prevent adaptation

   Criminal network studies to date generally rely on:
       Evidence entered into court proceedings   [Sparrow 1991; Baker & Faulkner 1993]
       Imputation from secondary or tertiary sources [Krebs 2002; Sageman 2004]
Problems with approaches
   Collection of incomplete network
    data seriously compromises
    reliability of findings [Wasserman & Faust
    1994]

       Boundary specification of nodes –
        peripheral & legitimate actors often
        omitted despite playing crucial roles
       Censored data – only one type of
        interaction recorded; earlier & later ties
        may be impossible to capture
       Lack of attributes – occupation, gender,
        affiliation, personality, psychological
        states strongly influence tie-formation
        behavior over time [Robins 2009]




                                                     Krebs 2002
Introduction to MMOGs,
Gold Farming, and
EverQuest II
MMOGs
   Massively Multiplayer Online Games (MMOGs)
   Shared online persistent virtual environments where
    millions of people can interact with one another
   Players can complete quests, interact with the game
    environment, interact with other players
   Many of the behaviors which are observed in the real world
    are also observed in MMOGs, e.g., friendship, economic
    behavior, backstabbing, illicit behaviors, etc.
Gold farming


• Gold farming and real money trade
  involve the exchange of virtual in-game
  resources for “real world” money

• Laborers in China and S.E. Asia paid to
  perform repetitive in-game practices
  (“farming”) to accumulate virtual wealth
  (“gold”)

• Western players purchase farmed gold to
  obtain more powerful items/abilities and
  open new areas within the game

• Market for real money trade exceeds $3
  billion annually [Lehdonvirta & Ernkvist 2011]


Deviance
   Game companies actively ban accounts involved in
    gold farming operations
   Why?
       Upsets game economy equilibrium by inflating prices
       Excludes other players from shared game environments
       Automated bots/scripts ruin social interactions
       Pay-to-play upsets meritocratic expectations
       Theft of billing or account information
       Legal ramifications of virtual items as “property”




Identification
   Most gold farmers are caught by:
       Other players reporting behavior
       Farmers’ solicitations and spamming
       Organized “sting operations”
       Administrator heuristics
   Farming operations are highly specialized and have to
    balance practices that efficiently accumulate gold
    against practices that avoid detection
Changes in ban reasons, all




Dynamics, bursts, lifecycle




              Ninja Metrics confidential information. Copyright 2012
Mapping
   Gold farmers potentially operate under similar motivations
    and constraints as other clandestine or criminal
    organizations
       Profit motive – illicit goods with minimal input costs provide
        arbitrage opportunities
       Distribution challenges – suboptimal processes and
        structures for generating and distributing goods so as to avoid
        risk of detection
       Selection pressures – authorities confiscate goods and detain
        participants when detected
EverQuest II

   Massively Multiplayer Online Role
    Playing Game (MMORPG, MMO)
   Data spanning from January 2006
    to mid-September 2006
   2.1 million players across multiple
    servers
   Our analysis focuses on a subset of
    the server
   In-game action data available as
    well as social interaction data (trust,
    mentoring, questing, grouping,
    trade, etc.)
EverQuest II – Gold Farmers
   Gold Farmers are explicitly labeled in the dataset
   Gold Farmers constitute only a small percentage of all of
    the players (1-3%)
   Gold farmer bots are not as common in EverQuest II as
    is generally assumed for MMOs
Types of Gold Farmers

   There are multiple types of Gold Farmers
    •   Gatherers: Accumulate gold or other resources
    •   Bankers: Low-activity reserve accounts
    •   Mules and dealers: Single user accounts to transmit money and
        interact with the customer
    •   Barkers: Spammers marketing services in the game
   Issue: The dataset, however, does not have labels for
    the gold farmer sub-types
   Open Question: Are there other types of gold farmers?
Descriptive properties and
Statistical network models
Properties of Clandestine
    Networks

   Clandestine networks are embedded in larger networks
    and have been studied both in isolation and as
    embedded in those larger networks
   Do clandestine networks exhibit properties that distinguish
    them from normal networks?
       Behavioral Signatures
       Structural Signatures
       Spatio-Temporal Signatures
Centrality comparisons
   Compared to the population-at-large, do farmers or
    affiliates have higher or lower…
       Incoming trade relationships? (In-degree)
       Outgoing trade relationships? (Out-degree)
       Incoming transactions? (In-weight)
       Outgoing transactions? (Out-weight)
       Proximity to the rest of the network? (Closeness)
       Level of brokerage? (Betweenness)
       Levels of “prestige”? (Eigenvector)
       Tendency for counterparties to also trade? (Clustering)
Multinomial logit
   Compared to non-farmers, do farmers & affiliates have
    significantly different structures?


                         Farmers (z-statistic)   Affiliates (z-statistic)
In-degree                -0.0473 (-9.24)***       0.0626 (47.21)***
Out-degree                -0.189 (-5.11)***       0.0529 (44.25)***
In-weight                0.00691 (23.04)***      0.00851 (32.64)***
Out-weight               0.00796 (24.32)***      0.009556 (32.97)***
Closeness                 -0.679 (-6.38)***       -2.438 (-13.24)***
Betweenness                1.56×10^-7 (1.94)      1.36×10^-7 (35.81)***
Eigenvector               -10300 (-8.60)***        5377 (35.73)***
Clustering coefficient     0.905 (8.64)***          -0.0926 (-0.81)
Network characterizations
   Degree distribution
       What fraction, P(k), of nodes in the network have k
        connections?
   Weight distribution
       What fraction, P(s), of links in the network have had s
        transactions?
   Power law/scale free distributions
       80/20 rule: minority of nodes have majority of links
       Generated by growth and preferential attachment
       Linear on a log-log plot… with some interesting exceptions
Degree distribution & attenuation

[Figure: log-log plot of P(k) (1.E-05 to 1.E+00) against k (1 to 100); series: Farmer-In, Farmer-Out, Affiliates-In, Affiliates-Out, Non-Affiliates-In, Non-Affiliates-Out]
Growth constraints

•   Power law scaling followed by exponential cut-off
    •   Aging: old nodes stop accepting new links
    •   Cost: becomes more expensive to accept new links
    •   Capacity: nodes stop accepting links above threshold
    •   Copying: new nodes imitate connections of existing nodes

[Figure: log-log plot of P(k) (1.E-05 to 1.E+00) against k (1 to 100); series: Farmer-In, Farmer-Out, Affiliates-In, Affiliates-Out, Non-Affiliates-In, Non-Affiliates-Out]
Assortative v. random mixing

   Is the degree of a node correlated
    with its neighbors’ degrees?
       No: random mixing
       Yes, positive: assortative mixing
       Yes, negative: dissortative mixing
   Assortative mixing
       Well-connected nodes are connected to
        other well-connected nodes
       Poorly connected nodes connected to
        other poorly connected nodes
   Dissortative mixing
       Well-connected nodes are connected to
        poorly-connected nodes
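
One way to check the mixing pattern is to compute Knn(k), the average degree of the neighbors of nodes with degree k. A rising Knn(k) suggests assortative mixing and a falling one suggests dissortative mixing. This is a minimal sketch for undirected networks; the star network is a hypothetical extreme case in which the well-connected hub links only to poorly connected leaves.

```python
from collections import defaultdict

def knn_by_degree(adj):
    """Average neighbor degree, grouped by the degree k of the focal node."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    per_degree = defaultdict(list)
    for v, nbrs in adj.items():
        if nbrs:  # isolates have no neighbors to average over
            per_degree[deg[v]].append(sum(deg[u] for u in nbrs) / deg[v])
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_degree.items())}

# Hypothetical star: one hub linked to four leaves.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}

print(knn_by_degree(star))  # Knn falls as k rises: dissortative
```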
Assortativity in context
   Dissortativity found in
    biological, ecological, &
    technological networks
    that require failure
    tolerance
   Assortativity found in
    social and collaboration
    networks
Comparative network analysis
   How does the gold farming network compare against a
    “real world” criminal network?
       Criminal network data hard to get a hold of in the first place
       Problems defining boundaries, multiplex relations, etc.
       Carlo Morselli’s CAVIAR drug trafficking network (N=110,
        E=295)
Assortativity

[Figure: log-log plot of Knn(k) (0.1 to 10) against k (1 to 100); series: Affiliates-InIn, Affiliates-OutOut, Non-Affiliates-InIn, Non-Affiliates-OutOut, Farmer-InIn, Farmer-OutOut, Caviar-InIn, Caviar-OutOut]
   Normal players and farmer affiliates both adopt collaborative interaction structures
   Gold farmers and drug traffickers both adopt avoidance interaction structures
Conclusions – Assortativity
   Non-affiliated players exhibit assortativity
       Collaboration > Resilience
   Affiliate network generally assortative
       Collaboration > Resilience
       High degree outliers likely unidentified farmers
   Farmers’ network is dissortative
       Collaboration < Resilience
   Drug trafficking network similarly dissortative
       Collaboration < Resilience
Attack & failure tolerance
   Failure: random removal of nodes
   Attack strategies
       Degree attack: removal of best-connected nodes
       Edge attack: removal of high-transaction dyads
   Outcomes
       Fraction of remaining nodes in largest connected component
       Fraction of remaining nodes as isolates
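
The two outcome measures can be computed by deleting a set of nodes and walking what remains. This is a minimal sketch for undirected networks. The degree attack here ranks nodes once by their initial degree rather than recomputing after each removal, which is a simplifying assumption, and the star network is a hypothetical worst case for attack tolerance.

```python
import random

def outcomes(adj, removed):
    """Return (largest-component fraction, isolate fraction) of remaining nodes."""
    alive = set(adj) - set(removed)
    if not alive:
        return 0.0, 0.0
    seen, largest, isolates = set(), 0, 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr in alive and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        largest = max(largest, size)
        if size == 1:
            isolates += 1
    return largest / len(alive), isolates / len(alive)

def degree_attack(adj, n_remove):
    """Target the best-connected nodes (ranked once, by initial degree)."""
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return set(ranked[:n_remove])

def random_failure(adj, n_remove, seed=0):
    """Remove nodes uniformly at random."""
    return set(random.Random(seed).sample(sorted(adj), n_remove))

# Hypothetical star: one hub and six leaves.
star = {"hub": {f"l{i}" for i in range(6)}}
star.update({f"l{i}": {"hub"} for i in range(6)})

print(outcomes(star, degree_attack(star, 1)))  # hub removed: every survivor is an isolate
print(outcomes(star, {"l0"}))                  # one leaf lost: network stays whole
```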
Study 2 - Attack & failure analysis

[Figure: fraction (0 to 1) against node fraction removed (0.01% to 100%, log scale); series: Degree attack - LCC, Degree attack - Isolate fraction, Random failure - LCC, Random failure - Isolate fraction, Edge attack - LCC, Edge attack - Isolate fraction]
   Degree attack fractures farming network faster than random failure or edge attack
Comparative attack & failure analysis
– LCC fraction
[Figure: LCC fraction (0 to 1) against node fraction removed (0.01% to 100%, log scale); series: Farmer attack, Farmer failure, Caviar attack, Caviar failure]
Comparative attack & failure analysis
– Isolate fraction
[Figure: isolate fraction (0 to 1) against node fraction removed (0.01% to 100%, log scale); series: Farmer attack, Farmer failure, Caviar attack, Caviar failure]
Conclusions – Attack tolerance
   Edge attack strategy has poorer performance than
    even random failure
   Gold farmers & drug traffickers respond similarly to
    degree attack
[Figures: Month 1, Month 2, Month 3, Month 4, Month 5, Month 6, Month 7, Month 8]
[Figures: Month 1 in Month 5, Month 4 in Month 5, Month 5 in Month 5, Month 6 in Month 5, Month 8 in Month 5]
Predictive Modeling and
Machine Learning approaches
Gold Farmer Detection
   How does one catch gold farmers?
   Are there characteristics which can
    be used to distinguish gold farmers?
   What attributes can be used to detect GFs?
       Player Character Attributes: Gender, class, etc. of the player
        character
       Player Activity Data: Player activities over the course of time, e.g.,
        number of quests, type of quests, NPCs killed
       Player Socialization Data: Grouping with others, trust, mentoring
       Player Demographics: Real world age, gender, location
Machine Learning Approaches
   Multiple ways to catch gold farmers:
       Binary classification
       Multi-class classification
       One-class classification
       Cascading classifiers
       Outlier detection
       Label propagation
       Combination of these
   Class Labeling Issues: Not all players who are labeled as
    normal players are normal players
Machine Learning Binary Classification
Approach

   Two main classes: GFs and non-GFs
   Highly Skewed Distribution
       9,178 Gold Farmer Characters out of a total of 2.1
        million characters
   Standard set of combinations of classifiers and
    features, e.g., Naive Bayes, Bayes Net, Logistic
    Regression, KNN, J48, JRip, AdaBoost, SMO, etc.




Feature Set

•   Demographic Features*
•   Performance Features
•   Task distributions (set of tasks performed)
•   Sequence of activities performed by gold farmers
•   Examples: KKKdDKdEESSKD, SSSEKdKdDD
    –   K = Killed monster, d = damage points, D = character death, S =
        completed a recipe, e.g., a spell
* All information is anonymized.
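
One possible way to turn such activity strings into classifier features is to count short subsequences (n-grams). This is a minimal sketch under that assumption; the slides do not say how the sequences were actually encoded, and the function names are hypothetical. Activity codes are treated as opaque single characters.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int = 2) -> Counter:
    """Count overlapping n-grams of activity codes in one character's sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_features(sequence: str, n: int = 2) -> dict:
    """Relative frequency of each n-gram, usable as a sparse feature vector."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


example = "KKKdDKdEESSKD"  # first example sequence from the slide
bigrams = ngram_counts(example, 2)
features = ngram_features(example, 2)
```

For the example sequence, the bigrams "KK" and "Kd" each occur twice out of 12 bigrams in total.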




Sequence Patterns




Binary Classification Approach:
Classification Models




Initial Classification Results




         Frequent Pattern Mining Association
Label Propagation

   • Research problem: Not all players who are labeled as
     normal players are normal. Some of them are gold farmers
     who have not yet been identified as such
   • Analogue: Not all people who are free, i.e., not in jail,
     are innocent
   • Solution: Use label propagation to label players who
     may be gold farmers
   • Analogue: If someone socializes almost exclusively
     with criminals, then he may be a criminal
Label Propagation




 (a) Classification Results   (b) Social Networks within
                                   class boundaries
Label Propagation

 • Guilt by association
 • Propagate labels based on
   the social neighborhood and
   socialization patterns of players
 • If player A spends > 80% of his time
   socializing (grouping, trusting,
   mentoring) with other gold farmers, then
   he is likely a gold farmer
 • Alternative methods: Propagation
   based on similarities




  Propagate Labels based on the Social Networks of the Gold Farmers
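
The 80% rule above can be sketched as a single propagation pass. The data layout (a mapping from each player to time spent with each partner) and the function name are assumptions made for illustration; the slides do not describe the actual implementation.

```python
def propagate_labels(interaction_time, known_farmers, threshold=0.8):
    """Return players newly labeled as likely gold farmers.

    interaction_time: {player: {partner: time_spent_socializing}}
    known_farmers:    set of players already labeled as gold farmers
    threshold:        share of socializing time spent with gold farmers
                      above which a player is flagged (0.8 per the slide)
    """
    newly_labeled = set()
    for player, partners in interaction_time.items():
        if player in known_farmers:
            continue
        total = sum(partners.values())
        if total == 0:
            continue  # no socialization data, nothing to propagate
        with_farmers = sum(t for partner, t in partners.items()
                           if partner in known_farmers)
        if with_farmers / total > threshold:
            newly_labeled.add(player)
    return newly_labeled


interactions = {
    "A": {"gf1": 9, "n1": 1},   # 90% of time with a known gold farmer
    "B": {"gf1": 5, "n1": 5},   # 50% of time with a known gold farmer
    "gf1": {"A": 9, "B": 5},
}
flagged = propagate_labels(interactions, known_farmers={"gf1"})
```

This is one pass only; repeating it with the newly flagged players added to the known set would propagate labels further, at the cost of compounding any false positives.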
Results

    Classifier            Metric      Initial Dataset   Label Prop   Change in Performance   Lift
    Bayes Net             Precision   0.17              0.189         0.019                  1.11
                          Recall      0.834             0.819        -0.015                  0.98
                          F-Score     0.282             0.307         0.025                  1.09
    J48                   Precision   0.494             0.62          0.126                  1.26
                          Recall      0.189             0.337         0.148                  1.78
                          F-Score     0.273             0.437         0.164                  1.60
    JRip                  Precision   0.495             0.537         0.042                  1.08
                          Recall      0.462             0.462         0                      1.00
                          F-Score     0.478             0.497         0.019                  1.04
    KNN                   Precision   0.436             0.46          0.024                  1.06
                          Recall      0.396             0.428         0.032                  1.08
                          F-Score     0.415             0.443         0.028                  1.068
    Logistic Regression   Precision   0.455             0.534         0.079                  1.17
                          Recall      0.189             0.271         0.082                  1.43
                          F-Score     0.267             0.36          0.093                  1.35
    Naïve Bayes           Precision   0.146             0.142        -0.004                  0.97
                          Recall      0.538             0.502        -0.036                  0.93
                          F-Score     0.23              0.221        -0.009                  0.96
    AdaBoost w/ DT        Precision   0.405             0.471         0.066                  1.16
                          Recall      0.105             0.08         -0.025                  0.76
                          F-Score     0.167             0.137        -0.03                   0.82
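
The F-Score and Lift columns can be checked from the precision and recall values. The sketch assumes the F-score is the usual harmonic mean of precision and recall and that lift is the ratio of the label-propagation value to the initial value; the slides do not state either formula, but the J48 rows agree with both.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def lift(after: float, before: float) -> float:
    """Ratio of the metric after label propagation to the metric before."""
    return after / before


# J48 rows from the table above.
j48_initial = f_score(0.494, 0.189)
j48_label_prop = f_score(0.62, 0.337)
j48_lift = lift(0.437, 0.273)

print(round(j48_initial, 3), round(j48_label_prop, 3), round(j48_lift, 2))
```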
Results Comparison
           Metric      SocialComp’09 (F0)   Label Propagation   Lift (SocialComp’09 vs. New Features + Label Propagation)
           Precision   0.493                0.537               1.089
           Recall      0.304                0.462               1.520
           F-Score     0.376                0.497               1.322


• Recall improves significantly over the previous results, which implies that
we are catching more gold farmers
• With label propagation, precision increases from 0.49 to 0.54, which
implies that more of the users we identify as gold farmers are indeed
gold farmers
Gold Farmer Detection as a One class
classification Problem
  •   The labels of one class are known for certain
  •   The labels for the rest of the records are not known
      with certainty
  •   Use the known class for training and classify the
      rest of the records against the known class
  •   Issue: The known class consists of many
      subclasses with different feature sets




                      Ninja Metrics confidential information. Copyright 2012
Disambiguating Gold Farming sub-classes
   • Heuristics can be used to disambiguate the different
     sub-classes
   • Gatherers have intense, high-volume in-game activity
   • Mules have high trade volume but low in-game activity
   • Bankers have low trade volume and low in-game
     activity
   • Spammers have little or no trade activity (in general) but
     denser chat networks
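
These heuristics could be written as a small rule cascade. Everything numeric below is a hypothetical placeholder: the slides give no thresholds, and the three input measurements (activity level, trade volume, number of chat partners) are assumed summaries of a character's logs.

```python
def guess_subclass(activity, trade_volume, chat_degree,
                   high_activity=100.0, high_trade=50.0, dense_chat=20):
    """Assign a gold-farming sub-class from three per-character measurements.

    All thresholds are hypothetical placeholders, not values from the study.
    """
    if activity >= high_activity:
        return "gatherer"   # intense in-game activity
    if trade_volume >= high_trade:
        return "mule"       # high trade volume, low in-game activity
    if chat_degree >= dense_chat:
        return "spammer"    # little or no trade, dense chat network
    return "banker"         # low trade volume, low in-game activity


examples = {
    "gatherer": guess_subclass(activity=500, trade_volume=5, chat_degree=1),
    "mule": guess_subclass(activity=10, trade_volume=200, chat_degree=1),
    "spammer": guess_subclass(activity=5, trade_volume=0, chat_degree=80),
    "banker": guess_subclass(activity=5, trade_volume=3, chat_degree=2),
}
```

The rule order matters: bankers and spammers both show low trade and low activity, so the chat-network check has to come before the banker fallback.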
Disambiguating Gold Farming sub-classes
  [Figure: activity timelines from 0000 hours to 1200 hours for five example characters: High Intensity Gatherer, High Intensity Normal Player, Banker, Player with Periodic Behavior, Low Intensity Player]
Research Question

 “How do Gold Farmers change their
 behaviors as a consequence of game
 admin's behaviors?”

 Can we anticipate GF change in
 behaviors in advance?
Change Detection in Gold Farmer Behavior
 • Research question: How do gold farmers respond to global
 enforcement of policies by the game admins?
 • The in-game activities of gold farmers can be represented as a time series
 • Clustering was applied to these series; 3 clusters gave the
 clearest separation
 • Each cluster can be mapped to a gold farmer subtype
 • However, 20-30% of all players
 in each cluster are non-GFs
 • Change detection is used to determine
 when the time series changed
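
The slides do not name the change detection method. As one simple stand-in, the sketch below finds the single split point that minimizes the within-segment squared error of an activity series (a basic mean-shift detector); the function name and the sample series are illustrative.

```python
def best_change_point(series, min_segment=2):
    """Return the index i that best splits series into series[:i] and series[i:].

    "Best" means the split with the smallest total squared deviation from
    each segment's own mean. Returns None if the series is too short.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_index, best_cost = None, float("inf")
    for i in range(min_segment, len(series) - min_segment + 1):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Hypothetical daily activity counts that drop after a ban wave.
activity = [10, 11, 9, 10, 3, 2, 3, 2]
change_at = best_change_point(activity)
```

Here the detected change is at index 4, the first day of the lower activity level. Detecting several changes, or changes in pattern rather than level, would need a more capable method.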




Interpretation: Global Changes in GF Behaviors - Adaptation
 • Gold farmers and game admins change their activities
 based on how the other acts in the game
 • Previously there was only anecdotal evidence for this change;
 we have established that it happens not just at the activity
 level but also at the structural pattern level
 • Can we predict how the gold farmers will act if game
 admins adopt a certain banning policy?




Research Question

“A plague upon't when thieves
cannot be true one to another!”
– Sir Falstaff, Henry IV, Part 1, II.ii




                         Do gold farmers
                         trust each other?
Housing-Trust in EQ2

 • Access permissions to in-game house
   as trust relationships
    • None: Cannot enter house.
    • Visitor: Can enter the house and can
      interact with objects in the house.
    • Friend: Visitor + move items
    • Trustee: Friend + remove items


 • Houses can also contain items which
   allow sales to other characters without
   exchanging on the market




Hypergraphs to Represent Tripartite Graphs




   • Accounts can have several characters
   • Houses can be accessed by several characters
   • Projecting to one- or two-mode data obscures crucial
     information about embeddedness and paths
       • Figure 2a: Can ca31 access the same house as ca11?
       • Figure 2b: Are all characters owned by the same account?
Hypergraphs: Key Concepts

•     Hyperedge: An edge between three or
      more nodes in a graph. We use three
      types of nodes: character, account, and
      house
•     Node Degree: The number of
      hyperedges which are connected to a
      node
      •   ND_h1 = 3
•     Edge Degree: The number of
      hyperedges that an edge participates in
      •   ED_a1-h1 = 2
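
Both degree definitions can be computed directly from a list of hyperedges. The three hyperedges below are made-up toy data chosen so that the results match the two values quoted above; they are not the hypergraph from the slide's figure, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Each hyperedge joins one character, one account, and one house (toy data).
hyperedges = [
    ("c1", "a1", "h1"),
    ("c2", "a1", "h1"),
    ("c3", "a2", "h1"),
]


def node_degree(edges):
    """Number of hyperedges that each node belongs to."""
    degree = Counter()
    for edge in edges:
        for node in set(edge):
            degree[node] += 1
    return degree


def edge_degree(edges):
    """Number of hyperedges that each pair of nodes (an edge) participates in."""
    degree = Counter()
    for edge in edges:
        for pair in combinations(sorted(set(edge)), 2):
            degree[pair] += 1
    return degree


nd = node_degree(hyperedges)
ed = edge_degree(hyperedges)
print(nd["h1"], ed[("a1", "h1")])
```

Pairs are stored in sorted order, so the account-house pair is looked up as ("a1", "h1").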




Approach

•   Game administrators miss gold farmers and deviance is not
    a simple binary classification task
•   Guilt by association: Identify “affiliates” who have ever
    interacted with identified gold farmers, but have not been
    identified as gold farmers themselves



             A                   B                 C

           Farmer             Affiliate       Non-affiliate
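
The three-way labeling can be sketched from a list of pairwise interactions and the set of identified gold farmers. The input format and function name are assumptions for illustration.

```python
def label_players(interactions, farmers):
    """Label every player as 'farmer', 'affiliate', or 'non-affiliate'.

    interactions: iterable of (player_a, player_b) pairs that ever interacted
    farmers:      set of players identified as gold farmers
    """
    players = set(farmers)
    affiliates = set()
    for a, b in interactions:
        players.update((a, b))
        # A non-farmer who ever interacted with a farmer is an affiliate.
        if a in farmers and b not in farmers:
            affiliates.add(b)
        if b in farmers and a not in farmers:
            affiliates.add(a)

    labels = {}
    for player in players:
        if player in farmers:
            labels[player] = "farmer"
        elif player in affiliates:
            labels[player] = "affiliate"
        else:
            labels[player] = "non-affiliate"
    return labels


labels = label_players([("A", "B"), ("B", "C")], farmers={"A"})
```

With A as the identified farmer, B (who interacted with A) becomes an affiliate and C (who only interacted with B) stays a non-affiliate, matching the A, B, C example above.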
Network Characteristics




• Long tail distributions are observed for the various degree distributions
• The mapping from character-house to an account is always unique



Characteristics of Hypergraph Projection Networks
 • Account Projection: Majority of the gold farmer nodes are isolates
   (79%). Affiliates well-connected (8.89) vs non-affiliates (3.47)
 • Character Projection: Majority of the gold farmer nodes are isolates
   (84%). Affiliates well-connected (10.42) vs non-affiliates (3.23)
 • House Projection: 521 gold farmer houses. Most are isolates (not
   shown) but others are part of complex structures. Densely connected
   network with gold farmers (7.56) and affiliates (84.02)




Key Observations

          • Picky picky: Gold farmers grant trust ties less frequently than
            either affiliates or general players
          • Gold farmers grant and receive fewer housing permissions
            (1.82) than their affiliates (4.03) or general player population
            (2.73)




                        Total degree                     In degree                     Out degree

                 <n>      < nGF >      < nAff >   <n>     < nGF >    < nAff >   <n>     < nGF >     < nAff >

  Farmers        1.82       0.29        1.82      0.89     0.29       0.89      1.07      0.29       1.07

  Affiliates     4.03       1.28        0.70      1.55     0.75       0.70      2.88      0.63       0.70

Non-Affiliates   2.73        -          7.77      1.57       -        5.98      1.56       -         2.34


Key Observations

          • No honor among thieves
                 • Gold farmers also have very low tendency to grant other
                   gold farmers permission (0.29)
                 • Affiliates also unlikely to trust other affiliates (0.70)




Key Observations

      • Affiliates are brokers:
               • Farmers trust affiliates more (1.82) than other farmers (0.29)
               • Affiliates trust farmers more (1.28) than other affiliates (0.70)
               • Non-affiliates have a greater tendency to grant permissions to
                 non-affiliates (7.77) than in general (2.73)




Frequent Pattern Mining: Key Terms

Market basket transaction dataset example:
    t1:   Beer, Diaper, Milk
    t2:   Beer, Cheese
    t3:   Cheese, Boots
    t4:   Beer, Diaper, Cheese
    t5:   Beer, Diaper, Clothes, Cheese, Milk
    t6:   Diaper, Clothes, Milk
    t7:   Diaper, Milk, Clothes
•   Items: Cheese, Milk, Beer, Clothes, Diaper, Boots
•   Transactions: t1,t2, …, tn
•   Itemset: {Cheese, Milk, Butter}
•   Support of an itemset: Percentage of transactions which
    contain that itemset
•   Support( {Diaper, Clothes, Milk} ) = 3/7
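
The support value can be reproduced from the seven transactions listed above. The `confidence` helper uses the standard association-rule definition (support of the combined itemset divided by support of the antecedent); it is included for reference and is not taken from this slide.

```python
from fractions import Fraction

transactions = [
    {"Beer", "Diaper", "Milk"},                        # t1
    {"Beer", "Cheese"},                                # t2
    {"Cheese", "Boots"},                               # t3
    {"Beer", "Diaper", "Cheese"},                      # t4
    {"Beer", "Diaper", "Clothes", "Cheese", "Milk"},   # t5
    {"Diaper", "Clothes", "Milk"},                     # t6
    {"Diaper", "Milk", "Clothes"},                     # t7
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Standard rule confidence: support(A and C) / support(A)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


print(support({"Diaper", "Clothes", "Milk"}, transactions))   # 3/7
print(confidence({"Beer"}, {"Diaper"}, transactions))         # 3/4
```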
Frequent Itemset Mining for Frequent Hyper-subgraphs


    o Support of a Hyper-subgraph: Given a sub-hypergraph of size
      k, subP is the pattern of interest containing the label P, shP is a
      pattern of the same size as subP and contains the label P, the
      support is defined as follows:




    Support of pattern        also containing a gold farmer (red) = 5/8



Frequent Itemset Mining for Frequent Hyper-subgraphs


    o Confidence of a Hyper-Subgraph: Given a sub-hypergraph of
      size k, subP is the pattern of interest containing the label P, subG
      is a pattern which is structurally equivalent but which does not
      contain the label P, the confidence is defined as follows:




    Confidence of pattern        and containing a gold farmer = 5/7



Frequent Patterns of GFs
 • Less than 0.1 support and confidence for almost all
   (except 8) frequent patterns with gold farmers
 • Remaining 8 patterns can be used for discrimination
   between gold farmers and non-gold farmers
 • Gold farmers & affiliates are more connected: A third of
   more complex patterns (k >= 10 nodes) are associated
   with affiliates (15/44)




Conclusion and contributions


Using hypergraphs to represent complex data structures and
dependencies

Application of frequent pattern mining to
discover distinct trust patterns associated
with gold farmers

No honor between thieves:
Gold farmers tend not to trust other gold farmers

Implications

Social organization and behavioral patterns of clandestine activity as
co-evolutionary outcomes

Using online behavioral patterns to inform and develop metrics/algorithms for
detecting offline clandestine activity

Clandestine networks as “dual use” technologies – ethical and legal
implications of improving detection? [Keegan, Ahmad, et al. 2011]
Limitations and future work



   •   Housing/trust ties mediated by other or multiplex
       relationships
       •   Communication, grouping, mentoring, trading, etc.

   •   Multiple types of deviance and deviants: Modeling
       role specialization & division of labor
   •   Using frequent subgraph patterns as
       discriminating features for ML models
   •   Changes in frequent subgraphs over time



Contraband: Online and Offline


   o Contraband consists of illegally obtained items,
     constituting a parallel or shadow economy
     which evades regulation or taxation.



              o Governments try to interrupt such exchanges especially
                when they involve dangerous items like weapons and
                drugs
              o Extremely difficult to obtain data about contraband. Thus
                analysis is limited
              o Online analogues of such behaviors offer the possibility
                of analyzing such behaviors and closing this gap

Research Questions

 o What do the trade networks of gold
   farmers & normal players look like?
 o Do gold farmers exhibit distinctive
   behavioral patterns for buying and
   selling items?
 o What are the characteristics of
   contraband networks in MMOs?
 o Can we use contraband
   networks to catch gold farmers?
Consignment Trade in EverQuest II




o Gold farmer trading activity is initially a significant fraction of all trading activity
  and then declines significantly
o Possible explanations: (i) a real decline in gold farming activity; (ii) gold
  farmers change their strategies to evade detection
Consignment Trade in EverQuest II




Gold Farmer Trading Items




• Even though the trade volume of gold farmers decreases over
 time, the total number of unique items traded by them
remains constant
• This implies that there is a subset of items that gold farmers
are interested in buying and selling
• Specialized items have a low trade volume




Frequent Pattern Mining for Contraband and Trade
Network

     o Main idea: There are certain behaviors and/or
       activities that are associated more with gold
       farmers than with other players
     o Once identified, these can be used as feature
       sets to build models that classify players as
       deviant vs. non-deviant
     o We analyze the social
       networks as well as the
       item-usage networks of gold farmers
Item-Projection Networks and FPM Framework


    o Two-mode network of players and items sold
    o Projection network of items: An edge is
      created between two items if they have been
      traded by the same person
    o Which items are traded by gold
      farmers as compared to others?
    [Figure: Player X linked to Item A and Item B in the two-mode network; Item A linked to Item B in the item projection]
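
The item projection described above can be sketched as follows. The input is assumed to be a list of (player, item) trade records, and each projected edge keeps the set of players who traded both items; the names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def project_items(trades):
    """Project a two-mode player-item network onto items.

    trades: iterable of (player, item) records
    Returns {(item_a, item_b): set of players who traded both items},
    with each item pair stored in sorted order.
    """
    items_by_player = defaultdict(set)
    for player, item in trades:
        items_by_player[player].add(item)

    edges = defaultdict(set)
    for player, items in items_by_player.items():
        for item_a, item_b in combinations(sorted(items), 2):
            edges[(item_a, item_b)].add(player)
    return dict(edges)


trades = [
    ("Player X", "Item A"),
    ("Player X", "Item B"),
    ("Player Y", "Item B"),
    ("Player Y", "Item C"),
    ("Player Z", "Item A"),
]
item_edges = project_items(trades)
```

Keeping the player sets on each edge makes it possible to ask afterwards which item pairs are linked only through gold farmers.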
Frequent Items Associated with Gold Farmers
Characteristics of Items associated with Gold Farmers

     o For normal players, a range of item types is associated with buying and
       selling activity
     o Surprisingly, certain items are associated not only with gold farmer
       selling activity but also with buying activity
     o The items that gold farmers buy are usually low-end items: (i) used for
       crafting; (ii) cornering the market?
     o The items that are sold by gold
       farmers are usually high-end items
       and in many cases are almost
       exclusively sold by them
Construction of Item-Networks

 o Apply standard frequent-pattern mining techniques (Apriori, FP-Tree) to
   determine frequently traded items
 o For all items which occur frequently, create an edge between them
 o For items which are sold in different transactions by the same person,
   also form an edge between them
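
One reading of the first two steps is sketched below: keep the items whose support meets a minimum threshold (the first pass of Apriori), then link frequent items that appear in the same transaction. The threshold, the sample data, and the choice to link only co-occurring items are assumptions; the slides do not specify them.

```python
from collections import Counter
from itertools import combinations


def frequent_item_edges(transactions, min_support=0.5):
    """Return (frequent_items, edges) for a list of item transactions.

    frequent_items: items appearing in at least min_support of transactions
    edges:          Counter of co-occurrences between frequent items,
                    keyed by sorted item pairs
    """
    n = len(transactions)
    if n == 0:
        return set(), Counter()

    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}

    edges = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent)
        for pair in combinations(kept, 2):
            edges[pair] += 1
    return frequent, edges


# Hypothetical consignment transactions.
sales = [["ore", "cloth"], ["ore", "cloth", "gem"], ["ore"], ["cloth", "ore"]]
frequent, edges = frequent_item_edges(sales, min_support=0.5)
```

Edges for items sold by the same person across different transactions would come from a player-level projection rather than from this per-transaction count.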
Examples of Extracted Item Network
Frequent Subgraphs as features


    o Main idea: In addition to the features based on
      player characteristics, use sub-graphs as
      features
Prediction Models
Model 1 (Player Attribute Based Features): These features are based on the attributes of the player's
character in the game, e.g., character race, character gender, distribution of gaming activities, etc. These
are the same features which were used by Ahmad et al. [SocCom'09].
Model 2 (Item Based Features): These are the features which are derived from items bought and sold
on the consignment network. These features are based on the frequency of the frequent items sold or
bought by gold farmers.
Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models.
Model 4 (Item-Network Based Features): Features which are derived from the item network in a
manner analogous to Model 2.
Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1
and Model 4.
Model 6 (Item & Item-Network Based Features): A combination of features from Model 2 and
Model 4.
Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described
above.
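
The seven models are the seven non-empty combinations of three feature groups. A minimal sketch of assembling a feature vector per model follows; the group names, feature names, and values are hypothetical.

```python
# Feature groups used by each model, following the numbering above.
MODELS = {
    1: ("player",),
    2: ("item",),
    3: ("player", "item"),
    4: ("item_network",),
    5: ("player", "item_network"),
    6: ("item", "item_network"),
    7: ("player", "item", "item_network"),
}


def build_features(model_id, groups):
    """Merge the feature groups used by one model into a single dict.

    groups: {"player": {...}, "item": {...}, "item_network": {...}}
    Keys are prefixed with the group name to avoid collisions.
    """
    merged = {}
    for name in MODELS[model_id]:
        for key, value in groups[name].items():
            merged[f"{name}:{key}"] = value
    return merged


# Hypothetical features for one character.
character = {
    "player": {"race": 3, "gender": 1},
    "item": {"ore_sold": 12},
    "item_network": {"degree": 4},
}
model_6 = build_features(6, character)
model_7 = build_features(7, character)
```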
Results




A set of standard machine learning models is used: Naive Bayes, Bayes Net,
Logistic Regression, KNN, J48, JRip, AdaBoost, and SMO
Conclusion

 o Described a phenomenon analogous to contraband in the offline world
 o Analysis of gold farmer item networks reveals that they exhibit characteristics
   which differ from those of normal players
 o Items that gold farmers buy are usually low end items and items that they sell
   are often high end items
 o One can use frequent items and networks
   of such items associated with gold farmers to build classifiers which can be
   used to catch gold farmers




Future work
   • Association rule learning for temporal patterns → bursts of
     activity predict gold farming?
   • Statistical modeling of sudden changes and system
     responses in network over time
   • Expanding hypergraph approaches to representing
     complex relationships
   • Comparative analysis against EVE Online, other games
     with similar “gold farmers get banned” regimes
Some ethical quandaries
   • Euphemisms abound: “removing” links and nodes
       • Should scholars be engaged in “destructive science”?
   • Clandestine network analysis as dual use technology
       • “Terrorist vs. freedom-fighter”
       • Used for good (MENA revolutions), evil (al-Qaeda), unclear (Wikileaks)
   • Legal dimensions of information theory and methods
       • Different assumptions in model & methods foreground different suspects
       • Minimize false positives or false negatives? Maximize true positives or true
         negatives?
Legal questions
   • If moral, ethical, and legal boundaries that constrain
     behavior are indistinct or unenforceable in virtual worlds,
     who defines regulations to define and curb excesses?
       • Some normative rules hard-wired into code, others permitted by code
         [Lastowka & Hunter 2006; Grimmelmann 2006; Ondrejka 2006]

   • From false positives to due process
       • Responsibility to disclose proprietary methodological approaches?
       • Heightened or different burden of proof given superabundance of data?
       • Expectations of privacy?
       • Demonstrating intent?
       • Right to representation and due process?
Katana:
A Game Analytics Engine
Analytics Architecture
   • Data sources: MMOG/VW game logs at the partner (data piped from the partner), game knowledge, free websites, 3rd-party custom data (Axciom, TRW)
   • Analysis Engine: Hadoop (Cloudera release), RDB (MS SQL Server)
   • Output: actionable insights (MMOG vendor)
Analytics Pipeline
   • Inputs: game data (Sony, CR3, …) and domain knowledge
   • Data modeling (terabytes): data cleaning, data transformation, normalization, loading
   • Data warehouse: Hadoop cluster
       • 4 units of Dell PowerEdge R510 (rack server)
       • Each with 14 TB hard drive, 12 GB memory, 2 CPU x 8 parallelization = 16 CPUs
   • Scheduled pipeline flow
   • Analytics engines: churn analysis, gold farming, network value
       • Standalone Java applications
       • Alienware workstations: 2 TB hard drive, 8 GB memory, 2 dual-core CPUs
   • Data mart: MS SQL Server 2008
       • Dell Precision 1500 (workstation): 1.5 TB hard drive, 4 GB memory, 2 dual-core CPUs
   • Katana Analytics Engine (UI)
       • Java applet embedded in web browser
       • Stand-alone deployable Java applet
Google: Aspiring to know all
  aspects of your social life
Social Networks




 Social Multimedia
 Networks



Text-based Conversation
Networks
Data Collected, Benefits, Impact
   • Data collected
       • Activity logs: Clicks, page visits, chats, friends, uploads,
         downloads, tags, comments, responses, joining and leaving of
         communities, people friended, people blocked, ads clicked on,
         ads ignored, frequency of logging in.
       • Network logs: What are people's activities when I tag,
         comment, respond, stay idle, post, recommend, and ignore?
       • Basically: 'Too much data about you'

   • Benefits
       • Advertising: Easiest way to spread an infection by tapping key
         nodes in Google's network
       • Sentiment analysis: Detect how your product or service is
         doing
       • Etc., etc., etc., etc., basically too many to list

   • Impact
       • Tremendous
       • Also very scary!
Facebook: The operating system for your social life, and more
Facebook
   • > 950 million users for Facebook
       • > 14% of humanity!!
       • > 35% of all who have computers!!!
   • > 50% of Facebook users log on every day
     (http://www.facebook.com/press/info.php?statistics)
   • spending an average of 14 minutes per day
     (http://mashable.com/2010/02/16/facebook-nielsen-stats/)
   • Ultimate social data, no end to what can be done with it
       • No wonder FB is being pegged at $100+ billion IPO
   • They know everything
       • So even if Google doesn't scare you, Facebook should!
Impact of new instrumentation on science
   • 1950s
       • Invention of the electron microscope fundamentally changed
         chemistry from 'playing with colored liquids in a lab' to 'truly
         understanding what's going on'
   • 1970s
       • Invention of gene sequencing fundamentally changed biology from
         a qualitative field to a quantitative field
   • 1980s
       • Deployment of the Hubble (and other) space telescopes has had
         fundamental impact on astronomy and astrophysics
   • 2000s
       • Massive adoption is fundamentally changing social science
         research
       • Massively Multiplayer Online Games (MMOGs) and Virtual Worlds
         (VWs) are acting as 'macroscopes of human behavior'
Some leading edge US research
programs we are participating in
   • Biggest
       • US Army's Network Science Collaborative Technology Alliance
       • Led by BBN, with over 25 institutions and 150 researchers
       • Approximately $200 million over a 10 year period (2009 – 2019)
       • http://www.ns-cta.org/ns-cta-blog/
   • A number of DARPA programs
       • Social Media in Strategic Communication (SMISC)
           • Started in November 2011
           • http://www.darpa.mil/Our_Work/I2O/Programs/Social_Media_in_Strategic_Communication_(SMISC).aspx
       • Graph-Theoretic Research in Algorithms and PHenomenology of Social
         Networks (GRAPHS)
           • Started in March 2012
           • http://www.darpa.mil/Our_Work/DSO/Programs/Graph-theoretic_Research_in_Algorithms_and_the_Phenomenology_of_Social_Networks_%28GRAPHS%29.aspx
Summary – The Big Picture
   • Converging trends
       • Rapid increase in the usage of the Internet/Web
           • increased amount of interactions online
           • huge amount of socialization online
       • Increase in resolution and deployment of data collection 'probes', e.g., GPS,
         cell phone/PDA, wireless-enabled laptop, RFID tags, …
           • increased ability to monitor and record interactions at a really fine
             granularity
       • Dramatic increase in storage capacity and decrease in storage costs
           • feasible to store all the data collected
       • Fundamental advances in computational methods for data analytics
   • Becoming possible to really understand individual and group
     behavior at a fine granularity
   • Great opportunities for
       • Basic R&D
       • Applied R&D
       • Entrepreneurship
   • But, putting together the right team and partnerships is critical!
and last,
 but certainly not the least

- thank you for your invitation

Jaideep

  • 1.
    Social Analytics Mining Behaviors of a Connected World PAKDD School April 11, 2013 Sydney, Ausytralia Jaideep Srivastava University of Minnesota srivasta@cs.umn.edu 4/9/2013 University of Minnesota 1
  • 2.
    Course Outline • Module1 • Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. FaceBook, MMO games, etc. • Module 2 • Computational online trust • Identifying key influencers • Information flow in networks • Module 3 • Analysis of clandestine networks • Katana - game analytics engine 4/9/2013 University of Minnesota 2
  • 3.
  • 4.
    Social Network Analysis O a iz tio a rg n a n l Sc l o ia A th p lo y n ro o g T e ry ho P y h lo y sco g C g itiv on e P rc p n S c -C g itiv e e tio o io o n e K o le g nw de N tw rk e o s N tw rk e o s Ra e lity Sc l o ia K o le g nw de N tw rk e o s N tw rk e o s A q a ta c c u in n e K o le g nw de E id m lo y p e io g (lin s k) (c n n o te t) S c lo y o io g  Social science networks have widespread application in various fields  Most of the analyses techniques have come from Sociology, Statistics and Mathematics  See (Wasserman and Faust, 1994) for a comprehensive introduction to social network analysis 12/02/06 IEEE ICDM 2006 4
  • 5.
    What have been its key scientific successes? • In classical social sciences, numerous results • 'Six degrees of separation' [Milgram] • Popularized by the 'Kevin Bacon game' • 'The strength of weak ties' [Granovetter] • 'Online networks as social networks' [Wellman, Krackhardt] • 'Dunbar Number' • Various types of centrality measures • Etc. • In the Web era • 'The Bow-Tie model of the Web' [Raghavan] • 'Preferential attachment model' [Barabasi] (Yes and No) • 'Power law of degree distribution' [Lots of people] (NO!) • Etc.
  • 6.
    Application successes • Numerous in social sciences • Google – PageRank • LinkedIn – expanding your Cognitive Social Network • making you aware that 'you're more connected and closer than you think you are' • Expertise discovery in organizations • Knowledge experts, 'authorities' • Well-connected individuals, 'hubs' • Rapid-response teams in emergency management • Information flow in organizations • Twitter – real time information dissemination • Etc.
  • 7.
    Online (Multiplayer) Games [Figure: games positioned on two axes, how social (low to high) and how engaging (low to high)]
  • 8.
    Player Behavior & Revenue Model
        Blizzard (subscription)               Zynga (free2play)
        World of Warcraft                     Farmville, Fishville, Mafia Wars, etc.
        12 million subscribers                180 million players
        Revenue model: $15/month              Revenue model: virtual goods
        Approx. $3 billion annual revenue     $700 million in 2010
        4 hours a day, 7 days a week!         0.5 hrs a day, 7 days a week
        Hard core gamers                      Everyone
        Less socially acceptable              More socially acceptable
        Like Cocaine                          Like Caffeine
  • 9.
    Implications of this 'addiction' • 3 billion hours a week are being spent playing online games • Jane McGonigal in “Reality is Broken” • Labor economics • What is the impact of so much labor being removed from the pool? [Castronova] • Entertainment economics • If MMO players can get 100 hrs/month of entertainment by spending $25 or so, what will happen to other entertainment industries? • Psychological/Sociological • Is it an addiction? That is the prevailing view (Chinese government's 'detox centers' for kids) • Are they fulfilling a deeper need that the real world is not? (McGonigal) • Societal • A trend far too important to not be taken seriously!
  • 10.
  • 11.
    Levi's – Example of Social Retail • Levi's leverages its brand to ensure customers provide their social network • Levi's can leverage predictive social analytics technology to understand the value of the customer's social network
  • 12.
    Opportunity, Innovation, Impact • Companies do not understand the social graph of their customers • It's not just about how they relate to their customers, but also about how customers relate to each other • Understanding these relationships unlocks immense value • Innovation: Understanding the social network of customers • Key influencers, relationship strength, … • Impact: Deriving actionable insights from this understanding • Customer acquisition, retention, customer care, … • Social recommendation, influence-based marketing, identifying trend-setters, … Ninja Metrics confidential information. Copyright 2012
  • 13.
    Unlocking true value by product, category, or store [Figure; recoverable labels: 0021, -$128.61, -$293.79, -$79.63]
  • 14.
    True Value of each customer • True value = individual value + social value • Who really matters, and to what degree • Some empirical facts • 31% activity due to socialization • 23% more individual + 8% more social activity [Figure caption: The individual's lifetime value, their social influence, and their true total]
  • 15.
    Impact of New Instrumentation on Science • 1950s • Invention of the electron microscope fundamentally changed chemistry from 'playing with colored liquids in a lab' to 'truly understanding what's going on' • 1970s • Invention of gene sequencing fundamentally changed biology from a qualitative field to a quantitative field • 1980s • Deployment of the Hubble (and other) Space telescopes has had fundamental impact on astronomy and astrophysics • 2000s • Massive adoption is fundamentally changing social science research • Massively Multiplayer Online Games (MMOGs) and Virtual Worlds (VWs) are acting as 'macroscopes of human behavior'
  • 16.
    The Virtual World Observatory (VWO) Project • Four PIs, 30+ Post-docs, PhD and MS students, UGs, high-schoolers • Noshir Contractor, Northwestern: Networks • M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups • Jaideep Srivastava, Minnesota: Computer Science • Dmitri Williams, USC: Social Psychology • Collaborators • Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan (Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci, Michigan), … • Data and technology partners • Sony (EverQuest 2), Linden Labs (2nd Life), Bungie (Halo3), Kingsoft (Chevalier's Romance), others … • Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …
  • 17.
    Overall Goals of the VWO Project [Figure labels: Sticky Social Media (Games; Virtual Worlds; Other social apps); tons of data; Analytics (Social science models; New algorithms; Ultra-scalability); Basic Science (behavior, socialization, …; novel, scalable algorithms; NSF); Government applications (team dynamics (Army); skills acq, leadership (Army); social influence & adolescent health (CDC); etc.); Business applications (customer churn; game design; anti-social behavior; etc.)]
  • 18.
    Collaborators, Sponsors, Partners  Team  Financial Sponsors  Data Partners  Technology Partners
  • 19.
    Part II – Impact on Science
  • 20.
    Findings from a Player Survey
  • 21.
    Who is playing?  It is not just a bunch of kids  Average age is 31.16 (US population median is 35)  More players in their 30s than in their 20s.
  • 22.
    How much do they play? • Mean is 25.86 hours/week • Compares to US mean of 31.5 for TV (Hu et al, 2001) • From prior experimental work, MMO play eats into entertainment TV and going out, not news • So much for kids being the ones with the free time.
  • 23.
    Gender Differences • More men players (78/22%) • Men played to compete, and women played to socialize • Men play more other games, but it was the women who were the more satisfied EQ2 players • Women: 29.32 hours/week • Men: 25.03 hours/week • Likelihood of quitting: “no plans to quit”: women 48.66%, men 35.08% • Self-reported play times • Women: 26.03 (3 hours less than actual) • Men: 24.10 (1 hour less than actual) • Boys and girls are socialized early on, and thus have clear role expectations for their behaviors and identities (Gender Role Theory in action!!)
  • 24.
  • 25.
    Inferring RW gender from VW data • Goal: What virtual world behaviors and characteristics predict real world gender? • Data: • Survey Data n=7119 • Survey Character Store • EQ2 Character Store • Variables • Avatar Characteristics: Gender, Race, Class, Experience, Guild Rank, Alignment, Archetypes • Game play Behaviors: Total Deaths & Quests, PvP Kills & Deaths, Achievement Points, Number of Characters, Time played, Communication patterns
  • 26.
    Gender prediction results  Close to 95% prediction accuracy  Decision trees work rather well  Character Gender, Race and Class are significant predictors to real life gender.  Gender swapping is rarer, but systematically different by real gender  Players tend to choose:  character gender based on their real life gender  character races that are gendered: women play elves/men play barbarians  classes that are gendered: women play priests/men play fighters 26
  • 27.
    Gender swapping behavior
        Game Character                    Real Gender
        Gender             Male              Female            Total
        Male               4065   82.6%      98     8.2%       4163   68.0%
        Female             855    17.4%      1104   91.8%      1959   32.0%
        Total              4920   100.0%     1202   100.0%     6122   100.0%
    Observation • Far more males gender swap than females • Why? • Men are more creative? • Women have less identity confusion? • Women get their 'fill of gender swapping in real life'
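    The swap rates behind the observation can be recomputed from the table counts. This is a small illustrative sketch; the dictionary layout and the function name swap_rate are assumptions made here, not part of the original study.

```python
# Counts copied from the table: counts[character_gender][real_gender]
counts = {
    "male":   {"male": 4065, "female": 98},
    "female": {"male": 855,  "female": 1104},
}

def swap_rate(real_gender: str) -> float:
    """Share of players of `real_gender` whose character has the other gender."""
    other = "female" if real_gender == "male" else "male"
    total = counts["male"][real_gender] + counts["female"][real_gender]
    return counts[other][real_gender] / total

male_swap = swap_rate("male")      # 855 / 4920
female_swap = swap_rate("female")  # 98 / 1202
print(f"men: {male_swap:.1%}, women: {female_swap:.1%}")
```

    The two rates correspond to the 17.4% and 8.2% cells of the table.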
  • 28.
    Economics: A test of RW-VW mapping • Do players behave in virtual worlds as we expect them to in the actual world? • Economics is an obvious dimension to test • In the real world, perfect aggregate data are hard to get
  • 29.
    GDP and Price Level • GDP and price levels are robust but comparatively unstable [Figure: GDP and Prices on Antonia Bayle, January to May; series: Nominal GDP (Gold) and Price Level (January = 100)]
  • 30.
    Money Supply and Price • The instability is explicable through the Quantity Theory of Money • a rapid influx of money . . . • . . . dramatically boosted prices • More evidence that this behaves like a real economy [Figure: Change in Money Supply and Population on Antonia Bayle, February to May; series: Change in Money Supply (000 Gold) and Change in Active Accounts] [Figure: Price Level on Antonia Bayle, February to June; Percent Change in Price Level]
  • 31.
    Networks in Virtual Worlds. SONIC: Advancing the Science of Networks in Communities
  • 32.
    Why do we create and sustain networks? • Theories of self-interest • Theories of social and resource exchange • Theories of mutual interest and collective action • Theories of contagion • Theories of balance • Theories of homophily • Theories of proximity • Theories of co-evolution Sources: Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review. Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford University Press.
  • 33.
    “Structural signatures” of Social Theories [Figure: six small example networks on nodes A to F with signed edges, one per theory; panel titles: Self interest, Exchange, Balance, Collective Action, Homophily, Contagion; node labels Novice and Expert appear in the lower panels]
  • 34.
    [Figure: four network plots; panel titles: Partnership, Instant messaging, Trade, Mail; node color: black = male, red = female]
  • 35.
    Social theory + IR based network analysis [Figure: pipeline] • Social networks (1), (2), …, (n) as network structure frequency vectors in a bag-of-words model • Cluster structure vectors using text clustering methods • Cluster means provide modes of network structure configurations making up all the social networks • Attribute values for each cluster can be used to discover trends between network structures and attributes
  • 36.
    Results – normalized network structure vector means for all clusters
  • 37.
    Results  Clusters 1 and 4 are similar  Groups kill fewer monsters  Group members in cluster 4 do not communicate much  Group members in cluster 1 generally limit their communication to just one other person in the group  Most people belong to these two clusters  Consistent with previous research - users in virtual environments are less likely to interact with strangers [N. Ducheneaut, N. Yee, E. Nickell and R. Moore, “Alone Together?” Exploring the social dynamics of massively multiplayer online games, Proceedings CHI06, ACM Press, New York, 407-416.]  Cluster 5 groups have many 1-edge and 2-out stars  Most of the communication is one way possibly indicating presence of central people  Maximum number of monsters killed out of all clusters  Performance of the groups is very good  Minimal communication  It is possible that cluster 5 consists of groups more focused on playing and performing well in the game and less on socializing
  • 38.
    Some Open Questions  How similar/dissimilar are online social networks from real- world social networks?  Is online socialization  Only a sustainability activity of real-world networks?  Causing new social networks to be formed?  Is fundamentally a different type of networking activity?  A fundamental tenet of socialization has been “geography/proximity drives socialization”  How is this being impacted by online socialization?
  • 39.
    TeamSkill: Modeling Team Chemistry in Online Multi-Player Games
  • 40.
    Description and motivation • The goal of this work is to improve skill assessment approaches used in multiplayer games, especially team-based games • Xbox Live, PSN, Steam, and Battle.net have between 149-167 million total users, collectively (and that number is growing) • Many of the most popular games are team-based: Halo, Call of Duty, TF2, CS, etc. • Why? • Better skill assessment = fairer games = less player attrition • More accurate rankings of players/teams • Most previous work has focused on individuals, not teams • It's a hard problem • Online setting: Updates to a player's skill distribution must be done after each game is played • Applicability: Any generalized assessment technique cannot include game-specific data • Our datasets are from the games of professional Halo 3 players
  • 41.
    What is Halo?  Halo is one of the most well-known “first-person shooter” video game series in the world  Online/LAN-based multiplayer is its most significant component  650,000 to 850,000 unique users per day at its peak 41
  • 42.
    Major League Gaming • Our work focuses on professional Halo 3 players • Online scrimmage data, as well as complete tournament data, is readily available from bungie.net and mlgpro.com • Players are highly-skilled individually – allows us to better focus on group-level performance characteristics • Players change teams regularly, helps isolate impact of particular players on overall team performance • Unexplored boundary case for skill assessment What is Major League Gaming (MLG)? • A professional league for competitive gaming, including Halo 3 from '08-'10 • Best players in the world compete in this league • Last tournament watched by over 1,000,000 people online • Our datasets are comprised of games played by these players
  • 43.
    Skill assessment • An old problem (with different applications in different contexts) • Paired comparison estimation • Foundational work by Thurstone (1927) and Bradley-Terry (1952) • Elo (1959), popularized in chess ranking (FIDE, USCF) • Glicko (1993) – player-level ratings volatility incorporated (σ²), addition of rating periods • TrueSkill (2006) – factor-graph based approach used in Microsoft's Xbox Live gaming service • Used to match players/teams up with each other online • Our work focuses on Elo, Glicko, and TrueSkill
  • 44.
    Elo (1959) • Arpad Elo, Dept of Physics, Marquette University • Was a master chess player in USCF • Proposed the Elo Rating System • Replaced earlier systems of competitive rewards (i.e., tournaments) in USCF/FIDE • Simple to implement: assumes player skill is normally distributed with constant variance β² • Still widely used today
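    To make "simple to implement" concrete, here is a minimal sketch of an Elo-style update. It uses the logistic expected-score curve with a 400-point scale and K = 32, which are common chess conventions and are assumptions here; the slide describes a normal skill model with constant variance β², and the deck does not say which variant or constants the experiments used.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo curve (assumed form)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b); score_a is 1 for a win, 0.5 draw, 0 loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: what A gains, B loses.
    return rating_a + delta, rating_b - delta

new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

    With two equally rated players the expected score is 0.5, so the winner gains half of K.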
  • 45.
    Glicko (1993) • Mark E. Glickman, Department of Health Policy and Management, Boston University • Proposed the Glicko Rating System • Addresses rating reliability issue through the use of rating periods and player-specific variances • Elo is a special case of Glicko • Iterative approach for approximating the marginal posterior distribution of a player's skill conditional on other players' priors • More computationally tractable for large datasets
  • 46.
    TrueSkill (2006)  Ralf Herbrich and Thore Graepel at Microsoft Research, Cambridge, UK  Used for automated ranking/matchmaking on Xbox Live  Uses factor graphs to model multi-player, multi-team environments  Converges quickly (~5 games for a player)  Large-scale deployment  30 million Xbox Live members  150+ games use TrueSkill for ranking & matchmaking 46
  • 47.
    Issues with current approaches [Figure: players 1, 2, 3, 4 summed into team 1234] • Basic idea • Given: skill ratings of each team member, i.e., s_i ~ N(μ_i, σ_i²) • Sum across all team members • Not intuitive: Team chemistry is a well-known concept in team-based competition [Martens 1987, Yukelson 1997] and it is not captured in any of these models • Can think of it as the overall dynamics of a team resulting from leadership, confidence, relationships, and mutual trust • Independence assumption not realistic in teams, especially at high levels of play
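    The "sum across all team members" baseline can be sketched as follows. Under the independence assumption the slide criticizes, the sum of normal skills is again normal, with the means added and the variances added. The function name team_skill and the example numbers are hypothetical.

```python
import math

def team_skill(members):
    """members: list of (mu, sigma) per player.

    Returns (mu_team, sigma_team) for the sum of independent normal skills.
    """
    mu_team = sum(mu for mu, _ in members)
    var_team = sum(sigma * sigma for _, sigma in members)  # variances add
    return mu_team, math.sqrt(var_team)

# Two hypothetical players with skills N(25, 3^2) and N(27, 4^2)
mu_team, sigma_team = team_skill([(25.0, 3.0), (27.0, 4.0)])
```

    Nothing in this sum depends on which players are paired together, which is why team chemistry cannot show up in it.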
  • 48.
    More information is available, however… [Figure: subgroup lattice: 1, 2, 3, 4; 12, 13, 14, 23, 24, 34; 123, 124, 134, 234; 1234] • Observation: We have more information than just the history of individual players – we also know the histories of groups of players • Player 1's history ∩ player 2's history → history of {1, 2} • Existing approaches only make use of the top row • Idea: Estimate the skills of subgroups of players on a team, combine in some way, and use to produce a better estimate of a team's skill
  • 49.
    Reframing the assessment problem [Figure: subgroup lattice for a 4-player team; k=1: 1, 2, 3, 4; k=2: 12, 13, 14, 23, 24, 34; k=3: 123, 124, 134, 234; k=4: 1234] • Alterations to existing approaches necessary (i.e., Elo/Glicko/TrueSkill) • Player-level skill representation → generalized subgroup skill representation • For each game and for each k <= size of the team (K), treat each group as you would an individual player and update skill accordingly • Hashing of skill variable matrix according to unordered subgroup membership • Treat Elo/Glicko/TrueSkill as 'base learners', a la boosting • Each rating says something about the skill of that particular subgroup • …but how do we aggregate these ratings to estimate the skill of a team?
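    The subgroup bookkeeping described above can be sketched with the standard library. frozenset keys give the "unordered subgroup membership" hashing; the function name subgroups and the placeholder rating value 0.0 are assumptions for illustration, and the base learner's actual update rule is not shown.

```python
from itertools import combinations

def subgroups(team, max_k=None):
    """Map k -> list of all size-k subgroups of `team`, as hashable frozensets."""
    members = sorted(team)
    max_k = len(members) if max_k is None else max_k
    return {
        k: [frozenset(c) for c in combinations(members, k)]
        for k in range(1, max_k + 1)
    }

groups = subgroups([1, 2, 3, 4])

# One rating slot per subgroup; a base learner (Elo, Glicko, TrueSkill)
# would update each slot as if the subgroup were a single player.
ratings = {g: 0.0 for level in groups.values() for g in level}
```

    For a 4-player team this yields 4 + 6 + 4 + 1 = 15 rated entities, matching the lattice in the slide.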
  • 50.
    TeamSkill  Four different aggregation approaches:  TeamSkill-K  TeamSkill-AllK  TeamSkill-AllK-EV  TeamSkill-AllK-LS 50
  • 51.
    Aggregation issues [Figure: subgroup lattices at three times; t = 1: after 1 game; t = 100: new player 5, who's never played with 1, 2, or 3; t = 200: new player 6, who's played with 3 or 5 (not both); black = history available, red = no history available] • Real world case: assume that players can leave/join the 4-player team and look at the timeline • The question at each point in time, t: how best to combine the available group ratings to produce a team rating? • The problem: the feature space is expanding and contracting over time.
  • 52.
    Data set overview • Collected over the course of 2009 • 7,590 games (2,076 from tournaments and 5,514 from Xbox Live scrimmages) • 448 players on 140 different professional and semi-professional teams • Games took place during January 2008 through January 2010 • Websites pertaining to this data • http://stats.halofit.org - Player/team statistics • http://halofit.org – Datasets and related information [Figure: “Friend or foe” social network for all tournaments in 2008 and 2009]
  • 53.
    Evaluation  The TeamSkill approaches were evaluated by predicting the outcomes of games occurring prior to 10 MLG tournaments and comparing their accuracy to unaltered versions (k = 1) of their base learner rating systems - Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible choices of k for teams of 4, 1 ≤ k ≤ 4, were used)  For each tournament, we evaluated each rating approach using:  3 types of training data sets - games consisting only of previous tournament data, games from online scrimmages only, and games of both types.  3 periods of game history - all data except for the data between the test tournament and the one preceding it (“long”), all data between the test tournament and the one preceding it (“recent”), and all data before the test tournament (“complete”).  We will only show “complete” as results for “long” and “recent” mirrored those for “complete”  2 types of games - the full dataset and those games considered “close” (i.e., prior probability of one team winning close to 50%). 53
  • 54.
    Results – all games [Figures: prediction accuracy using complete history for (1) both tournament and scrimmage/custom games, (2) tournament games, (3) scrimmage/custom games]
  • 55.
    Results – “close” games [Figures: prediction accuracy using complete history for (1) both tournament and scrimmage/custom games, (2) tournament games, (3) scrimmage/custom games]
  • 56.
    Conclusions  Modified several existing assessment approaches to operate on generalized player subgroup entities (instead of just individual players)  Introduced four aggregation methods for combining player subgroup information to produce final forecast  Shown evidence that close games are decided on the basis of team chemistry, consistent with sports psychology research 56
  • 57.
    Part III: Impact on Business
  • 58.
  • 59.
    Time Equals Stickiness [Figure: Ratio of Quitters to Stayers (0 to 6) by Character Level (1 to 69)] • There are fewer quitters as the levels go up, so the focus should be on the first 20 levels.
  • 60.
    Solo vs. Social Players [Figure: Ratio of Quitters to Stayers (0 to 6) by Character Level (1 to 69); series: Solo, Social] • Isolated players are 3.5x more likely to quit (B = 1.26, p<.001). Focus design on facilitating social interaction.
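    The 3.5x figure is consistent with reading B as a logistic regression coefficient, in which case exp(B) is the odds ratio of quitting for isolated versus social players. That reading is an assumption; the slide does not name the model.

```python
import math

B = 1.26                    # coefficient reported on the slide
odds_ratio = math.exp(B)    # odds of quitting, isolated vs. social (assumed logistic model)
print(f"odds ratio = {odds_ratio:.2f}")
```

    Strictly, this is a ratio of odds rather than of probabilities, so "3.5x more likely" is shorthand.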
  • 61.
    Problem Statement & Approach • Objective: At time t, for player p, given a window w, compute the probability, P(p,t,w), of p churning within the interval (t,t+w), and the confidence in this probability. • Approach: Estimate P(p,t,w) and its confidence using • Data about p's activity and socialization behavior • Statistics and machine learning (linear, non-linear, ensemble, etc.) • Use of socio-psychological theories of player motivation • Novel synthesis of data driven (DD) and theory driven (TD) approaches
  • 62.
    Player Motivation Theories Richard Bartle, 1996 Nick Yee, 2005 62
  • 63.
    Mapping MMO Behaviors to Theory • Example MMORPG • Persistent virtual world • Players have avatars • Quest-driven • Participation in groups, guilds • Dataset (game logs) • Time period: FEB-JUN, 2006 • No. of accounts: ~16000 • Churner definition: 2 months of inactivity
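    One way to turn the churner definition into a training label is sketched below. The function name is_churner, the 60-day approximation of "2 months", and the example dates are all assumptions; the original labeling code is not shown in the slides.

```python
from datetime import date, timedelta

INACTIVITY_WINDOW = timedelta(days=60)  # assumption: "2 months" taken as 60 days

def is_churner(activity_dates, t, window=INACTIVITY_WINDOW):
    """True if the player has no logged activity in the interval (t, t + window]."""
    return not any(t < d <= t + window for d in activity_dates)

# Hypothetical players observed at t = 10 Feb 2006
t = date(2006, 2, 10)
quit_player = [date(2006, 2, 1), date(2006, 2, 9)]
active_player = [date(2006, 2, 1), date(2006, 3, 1)]
labels = (is_churner(quit_player, t), is_churner(active_player, t))
```

    A classifier for P(p,t,w) would then be trained on features computed from activity before t, with these labels as targets.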
  • 64.
    Synthesis of TD and DD approach
  • 65.
    Model Evaluation (Confusion Matrix) • Model performance for different features (10 fold cross-validation)
        Features                  Tree size (nodes)   Precision (%)   Recall (%)   F-measure (%)
        Pure DD                   485                 69.3            84           76
        TD: Achievement+Social    59                  67.1            76.2         71.3
        TD: Achievement           37                  67.2            73.2         70.1
        TD: Socialization         7                   64.3            55.4         59.5
    Observations • Decision tree learning works quite well • DD model is the most accurate (F-measure 76%) but difficult to interpret (485 nodes) • TD models are interpretable (7 – 59 nodes), but less accurate • Achievement-orientation more important than socialization
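    The F-measure column follows from precision and recall as their harmonic mean, F = 2PR / (P + R). The short check below recomputes it from the table values; the variable names are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %) copied from the table
rows = {
    "Pure DD": (69.3, 84.0),
    "TD: Achievement+Social": (67.1, 76.2),
    "TD: Achievement": (67.2, 73.2),
    "TD: Socialization": (64.3, 55.4),
}
for name, (p, r) in rows.items():
    print(f"{name}: {f_measure(p, r):.1f}")
```

    The recomputed values agree with the reported column to within about 0.1, which is what rounding of the inputs would produce.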
  • 66.
    Model Evaluation (Lift Chart) • Observations • TD model does better in predicting top quintile of churners • DD model performs better in the 40%-70% range
  • 67.
    Ensemble Approach – Model Evaluation • Ensemble approach • A 'committee of classifiers' votes on the result • Heterogeneity helps
        No. of clusters   Precision   Recall   F-measure
        5                 66.23       91.94    76.99
        10                66.49       91.08    76.87
        15                67.58       89.80    77.12
        20                67.6        90.13    77.25
    Improvements over single model • Best recall value shows 7.94% improvement (i.e. the model can identify a larger proportion of potential churners) • Best F-measure shows 1.25% improvement
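    A 'committee of classifiers' vote can be sketched as a plain majority vote over per-model predictions. This is a generic illustration only: the function name committee_vote and the labels are hypothetical, and the slides do not describe how the cluster-based committee members were trained or weighted.

```python
from collections import Counter

def committee_vote(predictions):
    """predictions: one label per classifier; returns the most common label."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical committee members vote on one player
decision = committee_vote(["churn", "stay", "churn"])
```

    Using an odd number of members avoids ties in the two-class case.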
  • 68.
    Conclusions  Comparison of theory-driven and data-driven model in terms of prediction accuracy and model interpretability  Achievement-orientation is more important than socialization-orientation in identifying potential churners  Ensemble model can identify larger proportion of potential churners as compared to single global model 68
  • 69.
    Course Outline • Module 1 • Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. FaceBook, MMO games, etc. • Module 2 • Computational online trust • Identifying key influencers • Module 3 • Analysis of clandestine networks • Katana - game analytics engine 4/9/2013 University of Minnesota 69
  • 70.
    Computational Trust in Multiplayer Online Games
  • 71.
    “If you want to go fast, walk alone; if you want to go far, then walk with a group.” - Proverb from Ghana
  • 72.
    Virtual Worlds & Massive Online Games • Massively Multiplayer Online Role Playing Games (MMORPGs/MMOs) • Simulated Environments like SecondLife • Millions of people can interact with one another in a shared virtual environment • People can engage in a large number of activities with one another and with the environment • Many of the observed behaviors have offline analogs
  • 73.
    Big Picture Questions  How is trust expressed differently in different social contexts?  Cooperative (PvE), Adversarial (PvP), …  How is trust expressed in different types of social networks?  Housing, Mentoring, Trade, Group, …  What are the characteristics of trust and related networks in MMOs?  Similarities and differences with social networks in other domains e.g., citation networks, co-authorship networks  What role can features derived from the trust network play in prediction tasks e.g., link prediction (formation, breakage, change), trust propensity, success prediction 73
  • 74.
    Computational Trust in Multiplayer Online Games. Department of Computer Science, University of Minnesota. Muhammad Aurangzeb Ahmad. Trust as a Multi-Level Network Phenomenon: • Trust as a multi-modal multi-level network phenomenon • Network formulations of traditional trust-related concepts • Incorporating social science theories in trust-related prediction tasks • Generative network models for trust-based social interactions
  • 75.
    Computational Trust in Multiplayer Online Games. Department of Computer Science, University of Minnesota. Muhammad Aurangzeb Ahmad. Trust as a Multi-Level Network Phenomenon: • Trust as a multi-modal multi-level network phenomenon • Network formulations of traditional trust-related concepts • Incorporating social science theories in trust-related prediction tasks • Generative network models for trust-based social interactions. Social Characteristics of Trust; Effect of Social Environments on Trust: • Trust and Homophily: most types of homophily do not carry over to the MMO domain • Trust and Clandestine Behavior: No Honor Amongst Thieves • Trust and Mentoring; Trust and Trade: different social environments result in differences in network signatures • Trust-based and other social networks in MMOs exhibit anomalous network characteristics
  • 76.
    Computational Trust in Multiplayer Online Games. Department of Computer Science, University of Minnesota. Muhammad Aurangzeb Ahmad. Trust as a Multi-Level Network Phenomenon: • Trust as a multi-modal multi-level network phenomenon • Network formulations of traditional trust-related concepts • Incorporating social science theories in trust-related prediction tasks • Generative network models for trust-based social interactions. Social Characteristics of Trust; Effect of Social Environments on Trust: • Trust and Homophily: most types of homophily do not carry over to the MMO domain • Trust and Clandestine Behavior: No Honor Amongst Thieves • Trust and Mentoring; Trust and Trade: different social environments result in differences in network signatures • Trust-based and other social networks in MMOs exhibit anomalous network characteristics. Trust and Prediction: • Trust Prediction Family of Problems: predict formation, change, breakage of trust within the same network and across social networks; predict trust propensity • Item Recommendation • Success Prediction (Social Capital) • Effects of social environments on trust • Effect of network structure on trust
  • 78.
    Computational Trust in Multiplayer Online Games. Department of Computer Science, University of Minnesota. Muhammad Aurangzeb Ahmad. Computational Social Science: • Semantics: Trust and Homophily: most types of homophily do not carry over to the MMO domain • Structure: Characteristics of Trust Networks: trust-based and other social networks in MMOs exhibit anomalous network characteristics. Algorithmic: • Trust Prediction Family of Problems: predict formation, change, breakage of trust within the same network and across social networks; predict trust propensity. Applications: • Detection of Clandestine Actors: No Honor Amongst Thieves • Gold Farmer Detection
  • 80.
    Housing-Trust in EQ2 • Access permissions to in-game house as trust relationships • None: Cannot enter house. • Visitor: Can enter the house and can interact with objects in the house. • Friend: Visitor + move items • Trustee: Friend + remove items • Houses can also contain items which allow sales to other characters without exchanging on the market
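    The four access levels form an ordered scale in which each level adds to the one below, so they can be modeled as an ordered enumeration. The class name, action strings, and helper function below are hypothetical names chosen for this sketch, not EQ2 identifiers.

```python
from enum import IntEnum

class HouseAccess(IntEnum):
    """Housing permission levels, ordered from weakest to strongest trust."""
    NONE = 0
    VISITOR = 1
    FRIEND = 2
    TRUSTEE = 3

# Each level inherits the actions of the level below it.
PERMISSIONS = {HouseAccess.NONE: frozenset()}
PERMISSIONS[HouseAccess.VISITOR] = frozenset({"enter", "interact"})
PERMISSIONS[HouseAccess.FRIEND] = PERMISSIONS[HouseAccess.VISITOR] | {"move_items"}
PERMISSIONS[HouseAccess.TRUSTEE] = PERMISSIONS[HouseAccess.FRIEND] | {"remove_items"}

def can(level: HouseAccess, action: str) -> bool:
    """True if a character granted `level` may perform `action`."""
    return action in PERMISSIONS[level]
```

    Because the levels are ordered, a granted permission can be used directly as an edge weight in a trust network (for example, TRUSTEE > FRIEND > VISITOR).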
  • 81.
    Homophily and Trust • Homophily: Birds of a feather flock together • There is no one form of homophily, and homophily in general is described in multiple ways: Status vs. Value Homophily • Each of these types of homophily is in turn defined in multiple ways • Previous literature instantiates homophily in MMOs in terms of player characteristics and behavior in the game
  • 82.
    Network Models, Homophily and Trust Networks in MMOs • RQ1: Does homophily in MMOs operate in ways similar to homophily in the offline world? • RQ2: How do we map characteristics that define homophily in the offline world to online settings?
  • 83.
    Mapping Homophily in MMOs • In general, studies of homophily in MMOs assume only one type of homophily and generalize based on that type • Even in the offline world homophily is of different types • Hence the necessity of Mapping Homophily which we address here • Mapping and Proteus Effect
  • 84.
    Trust and Homophily in MMOs
             Homophily Type        Hypothesis                                                          Observation
        H1   Gender Homophily      Players trust other players who are of the same gender             ?
        H2   Age Homophily         Players trust other players who are of the same age cohorts        ?
        H3   Class Homophily       Players trust other players who are of the same class              ?
        H4   Race Homophily        Players trust other players who are of the same race               ?
        H5   Guild Homophily       Players trust other players who belong to the same guild           ?
        H6   Level Homophily       Players trust other players who are at a similar level             ?
        H7   Challenge Homophily   Players trust other players who like similar types of challenges   ?
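    A simple first check of hypotheses of this form is the share of trust edges that connect players with the same attribute value. This is a rough illustration under stated assumptions: the function name, player IDs, and toy data are hypothetical, and the deck does not specify the statistic actually used. A real test would also compare the share against a baseline such as the rate expected under random pairing.

```python
def same_attribute_share(trust_edges, attribute):
    """Fraction of (truster, trustee) edges whose endpoints share `attribute`."""
    same = sum(1 for a, b in trust_edges if attribute[a] == attribute[b])
    return same / len(trust_edges)

# Toy data: three trust edges among four hypothetical players
gender = {"p1": "f", "p2": "f", "p3": "m", "p4": "m"}
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p4")]
share = same_attribute_share(edges, gender)  # only p1 -> p2 is same-gender
```

    The same function works for class, race, or guild by swapping in a different attribute dictionary.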
  • 85.
    Key Observations
    • H1: Players trust other players who are of the same gender?
      In general, players trust other players who are of the same gender
    • H2: Players trust other players who are of similar age?
      The stronger the type of trust, the smaller the age difference between the people specifying trust
    • H3: Players trust other players who are of the same class?
      Class does not seem to affect the choice of whom to trust
  • 86.
    Key Observations
    • H4: Players trust other players who are of the same race?
      Race does not seem to affect the choice of whom to trust
    • H5: Players trust other players who are of the same guild?
      In general, the stronger the type of trust, the greater the percentage of people who trust people in their own guild
    • H6: Players trust other players who level at a similar rate?
      Leveling at the same rate does not seem to greatly affect trust amongst players
  • 87.
    Key Observations
    • H7: Players trust other players who are of the same level?
      • Level difference seems to have some effect on trust
      • For the Trustee (strongest) and the Visitor (weakest) forms of trust, lower-level players are more likely to trust players who are at a higher level
    • Summary:
      • Homophily is observed in MMOs for only a subset of the types observed in the offline world
      • The types of homophily that are not observed in MMOs are the ones that are greatly affected by game mechanics
  • 89.
    Social Networks: General Observations
    • There is an extensive literature on the characteristics of social networks (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008)
    • The network exhibits a monotonically shrinking diameter over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)
    • At a certain point in time, called the gelling point, many smaller connected components join together and become part of the largest connected component (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)
    • The largest connected component (LCC) comprises the majority of the nodes in the network (>= 80%) (McGlohon ICDM 2008, McGlohon KDD 2008)
  • 90.
    Social Networks: General Observations
    • The sizes of the second and third largest connected components remain roughly constant, even though the identity of these components changes over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008)
    • Network isolates are few in number (<5%) (Leskovec PAKDD 2005, Leskovec ICDM 2005)
    • The number of connected components decreases over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)
    • The LCC grows relatively fast close to the gelling point (Leskovec PAKDD 2005, Leskovec ICDM 2005)
  • 91.
    Trust Networks in MMOs
    • Data from 4 servers is available. Results from one server (Player vs. Environment, 'guk') are shown
    • The network consists of 15,237 nodes, 30,686 edges and 1,476 connected components
    • The dataset spans January 2006 to August 2006
    • The average node degree is 4.03. The sizes of the three largest connected components are 9039, 51 and 49. The largest connected component accounts for 59% of all the nodes in the network
    [Figure: the trust network on 'guk' on August 31, 2006]
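    The summary figures on this slide follow directly from the node, edge and component counts. The sketch below recomputes them; the helper names are assumptions made for illustration, and `component_sizes` is a generic routine rather than the code used in the study.

```python
def average_degree(n_nodes, n_edges):
    # Every undirected edge contributes to the degree of two nodes.
    return 2 * n_edges / n_nodes


def lcc_fraction(sizes, n_nodes):
    # Share of all nodes that sit in the largest connected component.
    return max(sizes) / n_nodes


def component_sizes(nodes, edges):
    """Sizes of connected components, largest first (iterative DFS)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Figures reported on the slide for the 'guk' server.
avg = average_degree(15237, 30686)
frac = lcc_fraction([9039, 51, 49], 15237)
print(round(avg, 2), round(frac, 2))
```

    The 59% share is well below the 80% or more reported for other social networks on the preceding slides, which is the anomaly the following observations discuss.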
  • 92.
    Key Observations
    • Observation 1: Preferential attachment: the rich get richer, but not too rich
      Explanation: social bandwidth is limited (Dunbar number)
    • Observation 2: The growth of the LCC slows after the gelling point
      Explanation: the trust network has a relatively low growth rate compared to the other networks
    • Observation 3: Non-monotonic change in the diameter of the largest connected component
      Explanation: players have different levels of activity at various points in time and can also "drop out" of the network if they churn from the game
  • 93.
    Key Observations
    • Observation 4: A large number of isolate components is observed (> 1000)
      Explanation: people join in groups and spend all their time playing with one another instead of interacting with people from the outside
    • Observation 5: The number of isolate components increases monotonically over time
      Explanation: same as Observation 4
    • Observation 6: Nodes outside the LCC constitute a significant portion of the network (41%, 8 months after the gelling point)
      Explanation: same as Observation 4
  • 94.
    Generative Models of Trust Networks
    • Time-bound preferential attachment: the rich get richer, but much less so after a certain point in time. Edge formation is bounded by time
    • Presence of auxiliary components: isolate components are added to the network at an almost constant rate over time
  • 95.
    Generative Models of Trust Networks (ii)
    • Non-monotonic decrease in the diameter: nodes become inert after a certain point in time. Sample the lifetime of nodes from a normal distribution
    • Homophily in edge formation: the probability of edge formation depends on node degree as well as agreement (similarity) in node characteristics
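    The four ingredients listed on these two slides can be combined in a small simulation. The sketch below is only one possible reading, not the authors' model: the parameter names and values (`p_aux`, `mean_life`, `sd_life`, `homophily`), the two-group node attribute, and the choice to add auxiliary components as disconnected pairs are all assumptions made for illustration.

```python
import random


def grow_trust_network(steps, seed=0, p_aux=0.2, mean_life=30.0,
                       sd_life=10.0, homophily=3.0):
    """Toy generative sketch: one arrival per step, returns (degree, edges)."""
    rng = random.Random(seed)
    degree, expires, group = {}, {}, {}
    edges = []

    def add_node(t):
        node = len(degree)
        degree[node] = 0
        # Node lifetime sampled from a normal distribution; inert afterwards.
        expires[node] = t + max(1.0, rng.gauss(mean_life, sd_life))
        group[node] = rng.randrange(2)  # hypothetical binary characteristic
        return node

    def connect(u, v):
        edges.append((u, v))
        degree[u] += 1
        degree[v] += 1

    for t in range(steps):
        new = add_node(t)
        if rng.random() < p_aux:
            # Auxiliary component: a pair that joins together, apart from the rest.
            connect(new, add_node(t))
            continue
        # Time-bound attachment: only nodes that are still active can gain edges.
        active = [v for v in degree if v != new and expires[v] > t]
        if not active:
            continue
        # Preferential attachment weighted by degree and by attribute agreement.
        weights = [(degree[v] + 1) * (homophily if group[v] == group[new] else 1.0)
                   for v in active]
        connect(rng.choices(active, weights=weights, k=1)[0], new)
    return degree, edges
```

    Running the function with different `p_aux` values gives a quick feel for how a constant inflow of auxiliary components keeps the share of nodes in the largest component low.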
  • 96.
    Results
    [Figures: diameter over time (non-monotonically changing); % of nodes in the LCC (relatively small)]
  • 97.
    Results
    [Figures: number of connected components over time; network growth and the gelling point]
  • 98.
    Conclusion: Trust Networks
    • Trust networks in MMOs exhibit many properties that social networks in most other domains do not
    • Proposed a model based on observations and domain knowledge
    • Models of social networks should incorporate the peculiarities observed in MMOs in general
    • Generalization? Similar observations have been made for mentoring networks, but not for PvP, trade and chat networks
  • 100.
    Trust Prediction Family of Problems
    • Trust prediction: given a trust network G, predict which nodes are going to trust one another in the future
    • Prediction across networks: given a set of actors who participate in multiple types of interactions, use features from one network to predict the existence of links in another network
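    Prediction across networks can be framed as building one labeled example per pair of actors: features come from a source network and the label from a target network. The sketch below illustrates that framing only; the feature set (common neighbors, Jaccard overlap, presence of a source link) and all function names are assumptions, not the features used in the study.

```python
from itertools import combinations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Simple structural features of the pair (u, v) in one network."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = len(nu & nv)
    union = len(nu | nv)
    return {
        "common_neighbors": common,
        "jaccard": common / union if union else 0.0,
        "linked_in_source": int(v in nu),
    }


def build_examples(source_edges, target_edges, actors):
    """One (pair, features-from-source, label-from-target) row per actor pair."""
    src, tgt = adjacency(source_edges), adjacency(target_edges)
    rows = []
    for u, v in combinations(sorted(actors), 2):
        label = int(v in tgt.get(u, set()))
        rows.append(((u, v), pair_features(src, u, v), label))
    return rows


# Hypothetical toy data: a mentoring network used to predict trust links.
mentoring = [("a", "b"), ("b", "c")]
trust = [("a", "c")]
rows = build_examples(mentoring, trust, {"a", "b", "c"})
```

    The resulting rows can be handed to any standard classifier; for within-network prediction the same construction applies with source and target taken from different time periods of one network.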
  • 101.
    Trust Prediction Family of Problems
    • Machine learning/classification approach to the problem of trust prediction
    • Introduced a set of new problems (inter-network link prediction, trust propensity prediction)
    • Proposed an algorithm for link prediction that uses domain knowledge from social science theories
    • New contributions:
      • Does the social context (adversarial vs. cooperative) affect prediction results?
      • If so, what is the implication for generalization?
  • 102.
    Prediction Task
    • Trust prediction as a classification problem
    • 60,000 examples for each prediction task
    • 10-fold cross-validation
    • Data from the Guk (cooperative) and Nagafen (adversarial) servers
    • Six standard classifiers for comparison: J48, JRip, AdaBoost, Bayes Network, Naive Bayes and k-nearest neighbor
    [Figure: positive and negative examples, each defined over a training period and a test period]
  • 103.
    Prediction: Group Network
    • In general, good prediction results are obtained for the combat network (in either direction), except in one case
    • The grouping network is a union of all grouping instances and is thus extremely dense (1,796,438 edges, 31,900 nodes)
  • 104.
    Prediction: Group Network
    • Grouping is not a good predictor of mentoring, but the converse holds
    • Mentoring is more often than not accompanied by grouping, whereas grouping can occur in a variety of contexts, e.g. raids, quests, dungeon instances
  • 105.
    Prediction: Combat Network
    • In general, good prediction results are obtained for the combat network (in either direction)
    • In general, if players have played against another person, then they have friends in other networks who have done the same or something similar
  • 106.
    Comparison across Social Environments
    • T: Trust; M: Mentoring; B: Trade
    • In general, the results are similar for both environments, with some exceptions
    • Mentoring-Mentoring: in the adversarial environment, more cliquish behavior is observed, i.e. friends of friends are likely to mentor one another in the future, which is not the case in the cooperative environment
  • 107.
    Comparison across Social Environments
    • Mentoring-related tasks: in general, mentoring is much more prevalent in the adversarial environment than in the cooperative environment (~3 times) and is much more intense. Overlap between mentoring and trade is thus more likely
  • 109.
    Trust Amongst Clandestine Actors
    "A plague upon't when thieves cannot be true one to another!" (Sir Falstaff, Henry IV, Part 1, II.ii)
    Do gold farmers trust each other?
  • 110.
    Hypergraphs to Represent Tripartite Graphs
    • Accounts can have several characters
    • Houses can be accessed by several characters
    • Projecting to one- or two-mode data obscures crucial information about embeddedness and paths
    • Figure 2a: Can ca31 access the same house as ca11?
    • Figure 2b: Are the characters all owned by the same account?
  • 111.
    Hypergraphs: Key Concepts
    • Hyperedge: an edge between three or more nodes in a graph. We use three types of nodes: character, account and house
    • Node degree: the number of hyperedges that are connected to a node, e.g. ND_h1 = 3
    • Edge degree: the number of hyperedges that an edge participates in, e.g. ED_a1-h1 = 2
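    Both degree notions can be computed by simple counting over (character, account, house) hyperedges. The sketch below uses hypothetical hyperedges chosen so that house h1 has node degree 3 and the pair a1-h1 has edge degree 2, matching the values quoted on the slide; the slide's actual figure is not available, so the data itself is an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hyperedges: (character, account, house).
hyperedges = [
    ("c1", "a1", "h1"),
    ("c2", "a1", "h1"),
    ("c3", "a2", "h1"),
    ("c3", "a2", "h2"),
]


def node_degrees(hyperedges):
    """Number of hyperedges each node belongs to."""
    deg = Counter()
    for he in hyperedges:
        deg.update(set(he))
    return deg


def edge_degrees(hyperedges):
    """Number of hyperedges each pair of nodes participates in together."""
    deg = Counter()
    for he in hyperedges:
        for pair in combinations(sorted(set(he)), 2):
            deg[pair] += 1
    return deg


nd = node_degrees(hyperedges)
ed = edge_degrees(hyperedges)
print(nd["h1"], ed[("a1", "h1")])
```

    Pairs are stored in sorted order, so a lookup such as `ed[("a1", "h1")]` must use the same ordering.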
  • 112.
    Network Characteristics
    • Long-tail distributions are observed for the various degree distributions
    • The mapping from a character-house pair to an account is always unique
    • Players who are connected to a large number of houses are also highly active players in other respects
  • 113.
    Characteristics of Hypergraph Projection Networks
    • Account projection: the majority of the gold farmer nodes are isolates (79%). Affiliates are well connected (8.89) vs. non-affiliates (3.47)
    • Character projection: the majority of the gold farmer nodes are isolates (84%). Affiliates are well connected (10.42) vs. non-affiliates (3.23)
    • House projection: 521 gold farmer houses. Most are isolates (not shown), but others are part of complex structures. Densely connected network with gold farmers (7.56) and affiliates (84.02)
    [Figure: the housing projection network]
  • 114.
    Key Observations
    • Gold farmers grant trust ties less frequently than either affiliates or general players
    • Gold farmers grant and receive fewer housing permissions (1.82) than their affiliates (4.03) or the general player population (2.73)
                     Total degree            In degree               Out degree
                     <n>    <nGF>  <nAff>    <n>    <nGF>  <nAff>    <n>    <nGF>  <nAff>
    Farmers          1.82   0.29   1.82      0.89   0.29   0.89      1.07   0.29   1.07
    Affiliates       4.03   1.28   0.70      1.55   0.75   0.70      2.88   0.63   0.70
    Non-Affiliates   2.73   -      7.77      1.57   -      5.98      1.56   -      2.34
  • 115.
    Key Observations
    • No honor among thieves
    • Gold farmers also have a very low tendency to grant other gold farmers permission (0.29)
    • Affiliates are also unlikely to trust other affiliates (0.70)
  • 116.
    Key Observations
    • Affiliates are brokers:
      • Farmers trust affiliates more (1.82) than other farmers (0.29)
      • Affiliates trust farmers more (1.28) than other affiliates (0.70)
      • Non-affiliates have a greater tendency to grant permissions to affiliates (7.77) than in general (2.73)
  • 117.
    Frequent Itemset Mining for Frequent Hyper-subgraphs
    • Support of a hyper-subgraph: given a sub-hypergraph of size k, let subP be the pattern of interest containing the label P, and let shP be a pattern of the same size as subP that also contains the label P. The support is defined as follows: [equation figure]
    • Example: support of the pattern also containing a gold farmer (red) = 5/8
  • 118.
    Frequent Itemset Mining for Frequent Hyper-subgraphs
    • Confidence of a hyper-subgraph: given a sub-hypergraph of size k, let subP be the pattern of interest containing the label P, and let subG be a pattern that is structurally equivalent but does not contain the label P. The confidence is defined as follows: [equation figure]
    • Example: confidence of the pattern containing a gold farmer = 5/7
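    The defining equations did not survive in this transcript, so the sketch below encodes only one plausible reading that is consistent with the two worked values (5/8 and 5/7): support as the share of size-k patterns carrying label P that match the pattern of interest, and confidence as the share of structurally equivalent patterns that carry label P. Both formulas, the function names, and the count of 2 unlabeled occurrences (inferred as 7 minus 5) are assumptions.

```python
from fractions import Fraction


def support(n_subP, n_shP):
    """Assumed reading: occurrences of subP over all size-k patterns containing P."""
    return Fraction(n_subP, n_shP)


def confidence(n_subP, n_subG):
    """Assumed reading: labeled occurrences over all structurally equivalent ones."""
    return Fraction(n_subP, n_subP + n_subG)


# Counts inferred from the slide's worked examples (hypothetical breakdown).
s = support(5, 8)
c = confidence(5, 2)
print(s, c)
```

    Under this reading, a pattern with high confidence is one that rarely appears without a gold farmer, which is what makes it usable for discrimination.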
  • 119.
    Frequent Patterns of GFs
    • Very low (s ≤ 0.1) support and confidence for almost all (except 8) frequent patterns with gold farmers
    • The remaining 8 patterns can be used to discriminate between gold farmers and non-gold farmers in a subset of the instances
    • Gold farmers & affiliates are more connected: a third of the more complex patterns are associated with affiliates (15/44)
  • 120.
    Application: Gold Farmer Detection (Behavioral)
    • Classification using in-game behavioral and demographic features
    • Each model corresponds to a grouping of different features
  • 121.
    Application: Gold Farmer Detection (Label Propagation)
    • Problem: not all people who are labeled as normal players are such. Some of them are gold farmers who have not been identified as such
    • Analogue: if someone socializes almost exclusively with criminals, then he may be a criminal
    Classifier            Metric      Initial dataset   Label prop   Change in performance
    Bayes Net             Precision   0.17              0.189         0.019
                          Recall      0.834             0.819        -0.015
                          F-Score     0.282             0.307         0.025
    J48                   Precision   0.494             0.62          0.126
                          Recall      0.189             0.337         0.148
                          F-Score     0.273             0.437         0.164
    JRip                  Precision   0.495             0.537         0.042
                          Recall      0.462             0.462         0
                          F-Score     0.478             0.497         0.019
    KNN                   Precision   0.436             0.46          0.024
                          Recall      0.396             0.428         0.032
                          F-Score     0.415             0.443         0.028
    Logistic Regression   Precision   0.455             0.534         0.079
                          Recall      0.189             0.271         0.082
                          F-Score     0.267             0.36          0.093
    Naive Bayes           Precision   0.146             0.142        -0.004
                          Recall      0.538             0.502        -0.036
                          F-Score     0.23              0.221        -0.009
    AdaBoost w/ DT        Precision   0.405             0.471         0.066
                          Recall      0.105             0.08         -0.025
                          F-Score     0.167             0.137        -0.03
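    The F-Score rows in the table are the harmonic mean of the precision and recall rows above them. The short check below recomputes three of them from the reported precision and recall; the function name is an assumption, and the formula is the standard F1 definition.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table.
bayes_net_initial = f_score(0.17, 0.834)
j48_initial = f_score(0.494, 0.189)
j48_label_prop = f_score(0.62, 0.337)
print(round(bayes_net_initial, 3), round(j48_initial, 3), round(j48_label_prop, 3))
```

    For J48 the gain comes from both precision and recall rising after label propagation, whereas for Bayes Net a small precision gain is partly offset by lower recall.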
  • 122.
    Gold Farmer Detection
    • Model 1 (Player Attribute Based Features): features based on the attributes of the player's character in the game, e.g. character race, character gender, distribution of gaming activities, etc.
    • Model 2 (Item Based Features): features derived from items bought and sold on the consignment network, based on the frequency of the frequent items sold or bought by gold farmers
    • Model 3 (Player Attribute & Item Based Features): all the attributes from the previous two models
    • Model 4 (Item Network Based Features): features derived from the item network in a manner analogous to Model 2
    • Model 5 (Player Attribute & Item-Network Based Features): a combination of features from Model 1 and Model 4
    • Model 6 (Item & Item-Network Based Features): a combination of features from Model 2 and Model 4
    • Model 7 (Player Attribute, Item & Item-Network Based Features): union of all the features described above
  • 123.
    Application: Structural Signatures Approach Applied to Trade Networks
    • There is insufficient data to use the structural signature approach to catch gold farmers
    • Alternative: apply the same approach to other networks, here the trade network
    • A set of standard machine learning models is used: Naive Bayes, Bayes Net, Logistic Regression, KNN, J48, JRip, AdaBoost and SMO
  • 124.
    Conclusion: Gold Farmer Behavior Analysis and Prediction
    • Representation-related issues addressed with respect to trust in MMOs
    • Application of frequent pattern mining to discover distinct trust patterns associated with gold farmers
    • No honor among thieves: gold farmers tend not to trust other gold farmers
  • 125.
    Computational Trust in Multiplayer Online Games
    Department of Computer Science, University of Minnesota
    Muhammad Aurangzeb Ahmad
    • Trust as a Multi-Level Network Phenomenon
      • Trust as a multi-modal, multi-level network phenomenon
      • Network formulations of traditional trust-related concepts
      • Incorporating social science theories in trust-related prediction tasks
      • Generative network models for trust-based social interactions
    • Social Characteristics of Trust
      • Trust and Homophily: most types of homophily do not carry over to the MMO domain
      • Trust and Clandestine Behavior: No Honor Amongst Thieves
      • Trust-based and other social networks in MMOs exhibit anomalous network characteristics
    • Effect of Social Environments on Trust
      • Trust and Mentoring; Trust and Trade
      • Different social environments result in differences in network signatures
    • Trust and Prediction
      • Trust Prediction Family of Problems: predict formation, change, and breakage of trust within the same network and across social networks. Predict trust propensity.
      • Item Recommendation
      • Success Prediction (Social Capital)
      • Effects of social environments on trust
      • Effect of network structure on trust
  • 126.
    Computational Trust in Multiplayer Online Games
    Department of Computer Science, University of Minnesota
    Muhammad Aurangzeb Ahmad
    Computational Social Science
    • Semantics: Trust and Homophily. Most types of homophily do not carry over to the MMO domain
    • Structure: Characteristics of Trust Networks. Trust-based and other social networks in MMOs exhibit anomalous network characteristics
    • Predictive Analysis: Trust Prediction Family of Problems. Predict formation, change, and breakage of trust within the same network and across social networks. Predict trust propensity.
    • Applications: Detection of Clandestine Actors. No Honor Amongst Thieves; Gold Farmer Detection
  • 127.
    Conclusion
    • Availability of data allows one to analyze phenomena that could not be analyzed in the past (trust in MMOs in the current case)
    • Explored a series of big-picture questions pertaining to trust in MMOs
    • Similar affordances and contexts (with respect to the offline world) lead to similar outcomes in the online world
    • Explored and expanded the scope of trust-related prediction tasks
    • Analysis of more datasets is required for generalizability
  • 128.
    Acknowledgement
    • DMR Lab
      • Professor Jaideep Srivastava and all DMR lab members, especially Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak, Kyong Jin Shim and Nisheeth Srivastava
    • Virtual Worlds Observatory
      • Professor Noshir Contractor (Northwestern), Professor Scott Poole (UIUC), Professor Dmitri Williams (USC)
      • External collaborators at Northwestern, UIUC, USC, U Toronto, especially Brian Keegan
    • Funding agencies: NSF, AFRL, NSCTA, IARPA
  • 129.
    Publication Summary
    Publication type     Total   Thesis related   Comment
    Book                 1       1                Book on Clandestine Behaviors and Networks (Springer 2012)
    Conference Papers    14      5
    Book Chapters        1       -
    Journal Papers       2       -
    Workshop Papers      10      3                2 Best Paper Awards
    Short/Poster Papers  8       1
    Technical Reports    4       1
    Tutorials            4       1                27 Co-authors
    Patents              2       1
  • 130.
    A Computational Model for Social Influence
  • 131.
    Objective: Model social influence in a dynamic network setting as a spread of cascades to identify key influential nodes
  • 132.
    Outline
    • Team Overview
    • Optimal Target Selection
      • Current approaches
      • Proposed approach
      • Results
    • Update summary
    • Q&A
  • 133.
    Motivation
    [Figure: global retweets of Tweets coming from Japan for one hour after the earthquake; legend: Original, Retweet]
    Courtesy: http://blog.twitter.com/2011/06/global-pulse.html
  • 134.
    Optimal Target People Selection
    • Goal: find the optimal targets to maximize influence spread in the network
    • Why is it important?
      • Public polls: sentiment spread
      • Sales and marketing: word-of-mouth spread
      • Public health: disease spread
    • Influence: a force that attempts to change the opinion or behavior of the individual
    • Influence is causal
      • My friend buys a product -> I buy a product
  • 135.
    Current Approaches (1/2)
    • Assume a certain model for the propagation of influence [1]
    • Independent Cascade Model
      • Each node independently influences its neighbors using a biased coin toss
    • Linear Threshold Model
      • Each node picks a random threshold
      • Each node infects its neighbor by a specified amount
    • All of them need to know the propagation probability
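    As an illustration of the Independent Cascade Model, the sketch below simulates one cascade: every newly activated node gets a single chance to activate each inactive out-neighbor, with a biased coin of probability `p`. Using one uniform probability for all edges is a simplifying assumption, and the function and graph names are illustrative.

```python
import random


def independent_cascade(graph, seeds, p, rng):
    """Simulate one cascade; returns the set of nodes activated at the end.

    graph: dict mapping a node to an iterable of its out-neighbors.
    p: activation probability, assumed identical for every edge.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node tries each inactive neighbor exactly once.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical toy graph.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
reached = independent_cascade(toy, ["a"], 0.5, random.Random(42))
```

    Averaging the size of the returned set over many runs estimates the expected spread of a seed set, which is the quantity the selection step tries to maximize.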
  • 136.
    Current Approaches (2/2)
    • Use a two-step approach
      • Step 1: choose an influence model
      • Step 2: find the subset of top-k nodes that maximizes the influence
    • Bad news: Step 2 is NP-hard [2]
    • Popular method for Step 2: greedy heuristics
      • The greedy heuristic gives a (1-1/e) approximation
    • Address scalability issues in the second step [3][4][5][6]
      • CELF [3]: the influence maximization function follows the law of diminishing returns (sub-modular)
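    The greedy heuristic for Step 2 repeatedly adds the node with the largest marginal gain in spread. The sketch below shows the idea with a deliberately simple spread function, the size of the union of fixed per-node influence sets, which is monotone and sub-modular; real influence models estimate spread differently, and the node labels and sets are hypothetical.

```python
def greedy_seeds(influence_sets, k):
    """Pick up to k nodes, each time maximizing the number of newly covered nodes."""
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for node in sorted(influence_sets):  # sorted for deterministic tie-breaking
            if node in chosen:
                continue
            gain = len(influence_sets[node] - covered)  # marginal gain
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # no remaining node adds anything
            break
        chosen.append(best)
        covered |= influence_sets[best]
    return chosen, covered


# Hypothetical influence sets.
sets = {"P": {"C", "K", "V"}, "C": {"N", "K"}, "K": {"V"}, "N": set()}
chosen, covered = greedy_seeds(sets, 2)
print(chosen, sorted(covered))
```

    CELF speeds this loop up by exploiting diminishing returns: a node's marginal gain can only shrink as the chosen set grows, so stale gains act as upper bounds and most re-evaluations can be skipped.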
  • 137.
    Limitations of current approaches
    • Estimation/learning of propagation probabilities
      • Remedy: data-driven and model-free approach
    • Propagation models are not content aware
      • Remedy: content-aware influencer mining
    • No causality effect in the influence model
      • Remedy: add a causal effect in influence propagation
  • 138.
    Proposed Approach
    Communication logs + network structure -> Frequent Sequence Mining -> Network Discovery of Influencers -> Ordered list of influencers
  • 139.
    Sequence Cascade Mining: Example
    • Actors: Peter, Charles, Kim, Vicky, Nancy; let support = 2
    • i=2: append neighbors from the network, then check downward closure and support count
    [Figure: communication sequences in temporal order, and the network structure]
  • 140.
    Sequence Cascade Mining: Example
    • Actors: Peter, Charles, Kim, Vicky, Nancy; let support = 2
    • i=3: append neighbors from the network, then check downward closure and support count
    [Figure: network structure]
  • 141.
    Sequence Cascade Mining: Example
    • Actors: Peter, Charles, Kim, Vicky, Nancy; let support = 2
    • i=4: append neighbors from the network; break the loop, as there is no more work to do
    • Return the frequent sequences
    [Figure: network structure]
  • 142.
    Sequence Cascade Mining
    • L1 frequent sequence mining
    • Candidate generation for k+1 by appending neighbors
    • Downward closure pruning
    • Support count and reorganize
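    The level-wise procedure above can be sketched as follows. This is an illustrative reading, not the authors' implementation: support is counted as the number of logged sequences containing the candidate as an ordered subsequence, downward closure is checked on the candidate's length-k suffix, and the "reorganize" step is reduced to sorting. The actor names come from the example slides; the logs and network edges are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(x in it for x in pattern)


def support(pattern, logs):
    return sum(is_subsequence(pattern, log) for log in logs)


def mine_cascades(logs, network, min_support):
    """Level-wise mining: extend frequent sequences only along network edges."""
    nodes = sorted({x for log in logs for x in log})
    frequent = [(n,) for n in nodes if support((n,), logs) >= min_support]  # L1
    result = list(frequent)
    while frequent:
        known = set(frequent)
        candidates = set()
        for seq in frequent:
            for nb in sorted(network.get(seq[-1], ())):
                if nb in seq:
                    continue
                cand = seq + (nb,)
                # Downward closure: the length-k suffix must itself be frequent.
                if cand[1:] in known:
                    candidates.add(cand)
        frequent = sorted(c for c in candidates if support(c, logs) >= min_support)
        result.extend(frequent)
    return result


# Hypothetical logs (temporal order) and network edges.
logs = [["Peter", "Charles", "Kim"],
        ["Peter", "Charles", "Vicky"],
        ["Charles", "Kim", "Nancy"]]
network = {"Peter": ["Charles"], "Charles": ["Kim", "Vicky"], "Kim": ["Nancy"]}
patterns = mine_cascades(logs, network, min_support=2)
```

    Restricting candidate generation to network neighbors is what distinguishes this from plain sequential pattern mining: a sequence is only extended along an edge over which influence could actually have travelled.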
  • 143.
    Influence Function
    • Node pattern influence set Q(i,p) = [equation figure]
    • Influence set of a node: [equation figure]
    • Influence set of a node set: [equation figure]
    • It can be shown that the influence function I(V,s) is monotone and sub-modular
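    For reference, the two properties claimed for the influence function have standard definitions, stated here for a generic set function I over subsets of the node set V. The slide's own I(V,s) notation is not expanded in this transcript, so the symbols S, T and v are assumptions.

```latex
\begin{align*}
\text{Monotone:}\quad & I(S) \le I(T)
  && \text{for all } S \subseteq T \subseteq V,\\
\text{Sub-modular:}\quad & I(S \cup \{v\}) - I(S) \;\ge\; I(T \cup \{v\}) - I(T)
  && \text{for all } S \subseteq T \subseteq V,\ v \in V \setminus T.
\end{align*}
```

    Together these two properties are what give a greedy selection procedure its (1-1/e) approximation guarantee.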
  • 144.
    Network Discovery of Influencers
    • Greedy heuristic with (1-1/e) approximation guarantee
  • 145.
    Network Discovery of Influencers: Example
    • Task: find the top-3 influencers from the frequent sequences
    • i=1: choose the node with max out-degree (randomly break the tie between P and C); C chosen
    [Figure: frequent sequences over nodes P, C, K, V, N, and the current set of influenced people]
  • 146.
    Network Discovery of Influencers: Example
    • i=2: choose the node with max out-degree; P chosen
    [Figure: frequent sequences over nodes P, C, K, V, N, and the current set of influenced people]
  • 147.
    Network Discovery of Influencers: Example
    • i=3: choose the node with max out-degree (randomly break the tie between K, V, N); K chosen
    [Figure: frequent sequences over nodes P, C, K, V, N, and the current set of influenced people]
  • 148.
    Context-Aware Influencers
    Bag of words representing the context -> Frequent Sequence Mining -> Network Discovery of Influencers -> Context-specific influencers
  • 149.
    Initial Results (1/2)
    • Significant performance gain over the established baseline, PrefixSpan
    [Figures: results on the DBLP dataset and on the USPTO dataset]
  • 150.
  • 151.
    Top Influencers
    • DBLP dataset
      • Thomas Huang: Professor, UIUC
      • Philip Yu: Professor, UIC
      • Elisa Bertino: Professor, Purdue
    • USPTO dataset *
      • Dieter Freitag: CTO, FRX Polymers; American Plastics Hall of Fame
      • Akira Suzuki: 2010 Nobel Prize winner in Chemistry *
      • Wilhelm Brandes: Scientist, Bayer Crop Science
    * Prolific inventors 1988-1997 as per USPTO announcement [7]
  • 152.
    What's next?
    • Short term
      o Implement the NDISC algorithm
      o Verify the influence spread of NDISC compared to popular baselines (PMIA, DegreeDiscountIC, SPM, and SP1M)
      o Qualitatively analyze the influencer results compared to baseline algorithms
    • Long term
      o Extend the approach to dynamic, time-evolving content and network structures
      o Implement influence analysis for streaming algorithms
      o Model influence propagation in collaborative networks using hypergraphs
      o Do the same for multi-relational graphs
  • 153.
    Overall Summary
    Optimal Target People Selection (Task 3.3)
    UMN Team: Karthik Subbian (Researcher), Jaideep Srivastava (Faculty)
    • Goals
      • Optimal target people selection using model-free influence models
      • Develop a content-centric approach for sequential cascade mining
      • Develop greedy heuristics for influencer mining from cascades
      • Track the influence of target nodes using online algorithms
    • Novel Ideas
      • Optimal target selection
        • Content-centric
        • Mining sequential cascades in the network
        • Finding influencers from cascades
        • Discovering context-specific influencers
    • Future Plans
      • Short term: implement NDISC and compare with baseline algorithms
      • Long term: finding influencers in dynamic, time-evolving network structures; modeling of influence propagation in hyper-graphs and multi-relational graphs
  • 154.
    Course Outline
    • Module 1
      • Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. FaceBook, MMO games, etc.
    • Module 2
      • Computational online trust
      • Identifying key influencers
      • Information flow in networks
    • Module 3
      • Analysis of clandestine networks
      • Katana - game analytics engine
  • 155.
    Untangling Dark Webs
    Theories, Methods, and Models for a Computational Social Science of Clandestine Networks
  • 156.
    Defining Clandestine Behavior
    • Clandestine behavior is socially and culturally constructed. There is no computational definition of what constitutes clandestine behavior
    • "Kept secret or done secretively, esp. because illicit" (Webster's definition)
    • Clandestine networks thus involve actors and/or activities that are illicit in nature
  • 157.
    Kevin Bacon Linked to Al-Qaeda!
    Just because two things are related does not mean that they have a concrete relationship. Unless...
  • 158.
    Clandestine organizations as networks
    • Networks are more flexible organizational forms than markets or hierarchies [Powell 1990; Podolny 1998; Brass, Galaskiewicz, et al. 2004; Robins 2009]
    • Criminals are embedded within organizations supporting division of labor and specialization [Cressy 1972; Canter & Alison 2000; Waring 2002]
    • Trust relations mediate functional relations like grouping, exchange, & communication [McIntosh 1974; von Lampe 2004]
    • Balancing security vs. efficiency; time-to-task; resilience vs. flexibility [Milward & Raab 2003, 2006; Morselli, Giguere, & Petit 2007]
  • 159.
    Contrasting network assumptions
    • Stohl & Stohl (2005)
        Current assumptions                                   Network theory assumptions
    1   Networks are information systems                      Networks are multi-functional communication systems
    2   Networks have uniplex ahistoric relations             Networks have multiplex and dynamic relations
    3   Networks are hierarchically organized, C2 structures  Networks are temporary, dynamic, emergent, adaptive, flexible
    4   Boundary specification is a political tool            Boundary specification is an analytic tool
    5   Networks are globalized and homophilous               Networks are local, glocal, global, and heterogeneous
  • 160.
    Information systems?
    • Network ties aren't just for sharing information and resources, but also represent latent relationships
    • Networks are not a machine to be broken, but a social organism that can adapt and reproduce
    • What are the underlying processes that govern how networks evolve and maintain their structure?
    • Networks are structures for sensemaking & socialization
    • Bomb-making websites are easy to disrupt, but message boards that foster communication and solidarity among people with similar sympathies and values are much less so
  • 161.
    Uniplex, ahistoric relationships?
    • Essential covert relationships like trust & knowledge exchange are based on multiplex ties and attributes
      • Shared background, common identity, family ties, reciprocity
      • Jihadi cells are prevalent in Montreal, London, Madrid, Hamburg because of poor social integration & high disaffection
    • Longitudinal data is necessary to measure changes over time
      • Repeated measurements of the same network actors, ties, attributes
    • Collection problems:
      • Left censoring: important changes happened before data collection
      • Right censoring: data collection does not last long enough to capture important changes
      • Boundary specification: omitting important actors, ties, attributes
      • Cognitive biases: over- or under-recall
      • Instrumentation: measurement inconsistencies, panel conditioning & attrition, missing data
  • 162.
    Hierarchies?
    • Licit and illicit organizations are no longer hierarchical pyramids with clear chains of command
    • Network identities and roles are not fixed and constant
    • Networks do not operate according to formal & unambiguous rules
      • Cells: operations can be performed spontaneously & independently
      • Individuals may identify with an organization, but are not part of it
    • Networks operate at multiple levels
      • "Organizations created out of complex webs of exchange, dependency, reciprocity among multiple organizations" (Monge & Contractor 2003)
      • Political parties, bureaucrats, businesses, charitable organizations, clinics, schools, & houses of worship are legitimate peripheral actors which enable covert organizations
  • 163.
    Easily specified?
    • Specifying rules for including/excluding actors in network membership is an essential part of research design
      • Family members, politicians, businesses, charitable organizations, …
      • Unit of analysis: individuals, teams, organizations?
    • Example: snowball sampling
      • Identify an initial set, then their links to second degree, third degree, etc.
      • Problems: missing crucial seed actors, time intensive, super-abundance of unreliable data, "everyone" connected by 6 degrees of separation
    • Labels are fuzzy & prone to political framing
      • "Terrorist vs. freedom-fighter": UK vs. IRA? China vs. ETIM? Russia vs. Chechnya? Afghanistan/Pakistan vs. Taliban? Iraq/Turkey vs. Kurds?
    • Distinguish between being in contact vs. being operational
      • Ability to mobilize, control, coordinate members
      • Look for how organizations cleave when they are forming or being changed
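    Snowball sampling as described above amounts to a breadth-first expansion from a seed set, stopped after a chosen number of degrees. The sketch below is a generic illustration; the function name, the `max_degree` parameter and the toy contact network are assumptions.

```python
from collections import deque


def snowball_sample(adj, seeds, max_degree):
    """Return {actor: degree of separation from the nearest seed}, up to max_degree."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if dist[u] == max_degree:
            continue  # do not expand beyond the chosen boundary
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical contact network.
contacts = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": ["e"]}
sample = snowball_sample(contacts, ["s"], max_degree=2)
print(sample)
```

    The result depends entirely on the seeds and the cutoff: an actor who is not reachable from the initial set within `max_degree` steps is never sampled, which is the "missing crucial seed actors" problem noted on the slide.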
  • 164.
    Global & homophilous?
    • Terrorist networks encompass very different ideologies and goals
      • Religious (salafism), ethnic (Kurds), and nationalist (ETA) movements
      • Coercive bargaining (IRA, PLO, ETA) vs. war-inducing (AQ)
    • Different motivations lead to different formation processes & structures
      • Hamas, Hezbollah, ETA emphasize shared background/ethnicity vs. "movements" like al Qaeda or militias committed to a shared ideology
      • Relational homophily drives the creation of self-similar strong ties which remain dormant, are difficult to identify & enter, and are easy to disassemble
      • Ideological homophily drives the creation of single-issue ties which remain salient; more permeable boundaries mean they are easier to enter, but re-formation is harder to prevent
  • 165.
    Criminal networks literature
      • Balancing security vs. efficiency; time-to-task; resilience vs. flexibility [Milward & Raab 2003; Morselli, Giguere, Petit 2007]
      • Secrecy-oriented networks emphasize sparse, decentralized networks to avoid detection [Erickson 1981]
      • Decentralization a common response to authorities' targeting and seizure [Morselli & Petit 2007]
      • Leaders on periphery (low closeness) to avoid detection [Baker & Faulkner 1993]
      • Drug traffickers exhibit higher centralization in core of participants with stable roles, but insulate by adding participants to extend periphery of core [Morselli, Giguere, Petit 2007; Dorn, Oette, White 1998]
  • 166.
    Covert networks are dynamic networks
      • Multiple types of relationships
      • Familial ties, communication, exchange, authority, & other latent relationships
      • Actor-actor, actor-attribute, actor-event networks
      • Individuals central in one network are peripheral in other networks
      • Positions and relationships are not stable
      • Removing links & nodes can alter pattern but does not address underlying processes that govern how networks evolve, maintain, dissolve
      • Outwardly different networks share common structural properties
      • Faust & Skvoretz (2002): Senate co-sponsorship most similar to cow-licking
      • Outwardly similar networks generated by different processes
  • 167.
    Cutpoints & bridges
      • Brokers are excellent targets to disassemble networks
      • Cutpoint: Removing a node creates a new component
      • Bridge: Removing a link creates a new component
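    The two definitions above can be checked directly by removing each node or link and recounting components. The sketch below is a brute-force illustration on a hypothetical toy network (two triangles joined by one link); the function names and the data are assumptions made for the example, not part of the original study.

```python
from collections import deque

def components(adj):
    """Count connected components of an undirected graph {node: set(neighbors)}."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return count

def without_node(adj, x):
    """Copy of the graph with node x and its links removed."""
    return {u: nbrs - {x} for u, nbrs in adj.items() if u != x}

def without_edge(adj, a, b):
    """Copy of the graph with the single link a-b removed."""
    return {u: nbrs - ({b} if u == a else {a} if u == b else set())
            for u, nbrs in adj.items()}

def cutpoints(adj):
    """Nodes whose removal creates a new component."""
    base = components(adj)
    return sorted(x for x in adj if components(without_node(adj, x)) > base)

def bridges(adj):
    """Links whose removal creates a new component."""
    base = components(adj)
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    return sorted(e for e in edges if components(without_edge(adj, *e)) > base)

# Hypothetical toy network: triangles A-B-C and D-E-F joined by the link C-D.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}

print(cutpoints(toy))  # ['C', 'D']
print(bridges(toy))    # [('C', 'D')]
```

    The brute-force recount is quadratic and only meant to mirror the definitions; linear-time depth-first-search methods exist for large networks.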
  • 169.
    Clandestine networks are complex  Multiple types of relationships [Stohl & Stohl 2007]  Networks have multiplex and dynamic relations – trust, exchange, communication, authority, and other relationships  Networks are temporary, dynamic, emergent, adaptive, flexible  Networks are local, glocal, global, and heterogeneous – different ideologies, motivations, goals lead to different structures & processes  Descriptive network analysis does not address underlying processes of how networks emerge, stabilize, dissolve  Goal: To disrupt network, understand and attack the processes which create and stabilize
  • 170.
    Modeling Clandestine Organizations and Behaviors as Networks
  • 171.
    Introduction to network analysis
      • Networks are sets of nodes connected by links
      • Nodes can be people, groups, webpages, etc.
      • Links can be friendships, exchanges, affiliations, etc.
      [Figure: two nodes A and B joined by a link]
  • 172.
    Networks - Directed & undirected
      • Communication vs. friendship networks
      [Figure: two example networks on nodes A-G]
  • 173.
    Degree distributions
      • What's the probability P(k) of randomly selecting a node with degree k in this network?
      [Figure: plot of P(k) against k; y-axis values 24/31, 6/31, 1/31; x-axis values 1, 4, 12]
  • 174.
    Power laws
      • Large networks can have degree distributions that span several orders of magnitude
      • Many real world networks follow a power law degree distribution: $P(k) \sim k^{-\gamma}$
      • Scale free networks, 80/20 rule, Pareto principle, Zipf's Law, long tail, etc.
      [Figure panels: Internet routers, Movie actors, Physicists, Neuroscientists]
  • 176.
    Path length
      • Path length: number of links between two nodes (degrees of separation)
        • BACDE = 4
      • Geodesic: Shortest path length between two nodes
        • BAE = 2
      • Diameter: Network's largest geodesic (eccentricity)
        • BAEFGH, BACJIH
      [Figure: example network on nodes A-K]
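    Geodesics and the diameter can be computed with breadth-first search. The sketch below uses a small hypothetical network (not the one drawn on the slide) in which the route B-A-C-D-E has 4 links but the geodesic B-A-E has only 2.

```python
from collections import deque

def geodesics_from(adj, source):
    """Shortest path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Largest geodesic over all pairs of nodes (assumes a connected network)."""
    return max(max(geodesics_from(adj, s).values()) for s in adj)

# Hypothetical edge list, chosen only for illustration.
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E"), ("A", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(geodesics_from(adj, "B")["E"])  # 2
print(diameter(adj))                  # 3
```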
  • 177.
    Density, clustering, centralization
      • Density
        • Observed edges in network / maximum possible edges
      • Clustering
        • Count ties among alters, removing ego and ties to ego
        • Observed ties in actor's ego network / maximum possible ties in ego network
      • Network centralization
        • Variation in individual actors' centralities
        • High centralization when few actors possess higher centrality than average
        • Low centralization when actors all have similar centralities
      [Figure: example network on nodes A-I]
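    A minimal sketch of the density and clustering ratios described above, for an undirected network stored as an adjacency dictionary. The toy network and function names are assumptions for illustration only.

```python
from itertools import combinations

def density(adj):
    """Observed edges / maximum possible edges in an undirected network."""
    n = len(adj)
    observed = sum(len(nbrs) for nbrs in adj.values()) / 2
    return observed / (n * (n - 1) / 2)

def local_clustering(adj, ego):
    """Observed ties among ego's alters / maximum possible ties among them.
    Ego and the ties to ego are excluded from the count."""
    alters = adj[ego]
    k = len(alters)
    if k < 2:
        return 0.0
    ties = sum(1 for a, b in combinations(alters, 2) if b in adj[a])
    return ties / (k * (k - 1) / 2)

# Hypothetical toy network: triangle A-B-C plus a pendant node D attached to A.
toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}

print(round(density(toy), 3))                # 0.667
print(round(local_clustering(toy, "A"), 3))  # 0.333
print(local_clustering(toy, "B"))            # 1.0
```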
  • 178.
    Paths & clustering across networks
  • 179.
    Small worlds
      • Paradox: Individuals within the network are highly clustered but also have small average geodesics to other members
      • Randomly rewiring a fraction of links on a regularly clustered network drastically shortens average eccentricity
      • Random rewiring, however, still maintains high clustering over several orders of magnitude
  • 180.
    Degree centrality
      • Degree: total number of links with other actors
      • In-degree: Directional links to actor from other actors
      • Out-degree: Directional links from actor to other actors
      • “Popularity”
      [Figure: example network on nodes A-J]
  • 181.
    Closeness centrality
      • How easily one actor can reach rest of network
      • Actor with shortest average path length
      • “Pulse-taker”
      [Figure: example network on nodes A-J]
  • 182.
    Betweenness centrality
      • How much an actor lies between distinct groups
      • Number of geodesics passing through actor
      • “Broker”
      [Figure: example network on nodes A-J]
  • 183.
    Multi-Dimensional Networks - Attributes
      • Selection: Immutable characteristics
        • Race, ethnicity, gender, etc.
        • Simple process: Attributes remain fixed and influence how connections are formed
        • Homophily
      • Influence: Alterable characteristics
        • Interests, activities, infectiousness, etc.
        • Complex process: Feedback and interactions between ego's attributes, network structure, alters' attributes
        • Diffusion and coevolution
  • 184.
    Selection and homophily
      • Existing attributes drive creation & destruction of connections
      • “Birds of a feather flock together”
      [Figure: network on nodes A-G shown at two points in time]
  • 185.
    Influence and diffusion
      • Existing connections drive creation & destruction of attributes
      [Figure: network on nodes A-G shown at two points in time]
  • 186.
    Multi-relational
      • Organizations: authority, trust, & friendship
      [Figure: network on nodes A-G]
  • 187.
    Triad census & network motifs
      • 16 possible triad types (motifs, isomorphisms) in a directed network
      • (Reciprocated ties, unreciprocated ties, null ties)
      • Triad census: frequency of each structure in a network
      • Compare frequencies in observed network, measuring deviations against frequencies in random networks
      • Simmelian ties: Transitive-reciprocated triads (3-0-0) appear more frequently than any other motif in social networks
      • Triad types: 003, 012, 102, 021D, 021U, 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300
  • 189.
    Growth & preferential attachment
      • How do you generate scale-free networks?
      • Rich get richer (Yule 1925, Simon 1955, de Solla Price 1976, Albert & Barabasi 1998)
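    One way to picture the "rich get richer" process is a growth loop in which each new node links to existing nodes with probability proportional to their current degree. The sketch below is a simplified illustration under that assumption; the function name, parameters, and seed are hypothetical, and it is not a reproduction of any specific model cited above.

```python
import random
from collections import Counter

def preferential_attachment(n, m=1, seed=0):
    """Grow a network to n nodes. Each newcomer links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    stubs = [0, 1]  # each node appears once per link it holds
    for new in range(2, n):
        targets = set()
        while len(targets) < min(m, new):
            # Picking uniformly from the stub list is degree-proportional sampling.
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs += [new, t]
    return edges

edges = preferential_attachment(2000, m=1, seed=42)
degree = Counter(node for edge in edges for node in edge)

# Most nodes keep very few links while a handful of early nodes become hubs.
print("largest degree:", max(degree.values()))
print("nodes with a single link:", sum(1 for d in degree.values() if d == 1))
```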
  • 190.
    Data collection issues  Data on clandestine organizations are doubly hard to obtain  Clandestine networks by definition seek to avoid detection or identification  Law enforcement prohibited from collecting some types of data, reluctant to disclose extant data to prevent adaptation  Criminal network studies to date generally rely on:  Evidence entered into court proceedings [Sparrow 1991; Baker & Faulkner 1993]  Imputation from secondary or tertiary sources [Krebs 2002; Sageman 2004]
  • 191.
    Problems with approaches
      • Collection of incomplete network data seriously compromises reliability of findings [Wasserman & Faust 1994]
      • Boundary specification of nodes – peripheral & legitimate actors often omitted despite playing crucial roles
      • Censored data – only one type of interaction recorded; earlier & later ties may be impossible to capture
      • Lack of attributes – occupation, gender, affiliation, personality, psychological states strongly influence tie-formation behavior over time [Robins 2009]
      [Figure credit: Krebs 2002]
  • 192.
    Introduction to MMOGs, Gold Farming, and Everquest II
  • 193.
    MMOGs  Massively Multiplayer Online Games (MMOGs)  Shared online persistent virtual environments where millions of people can interact with one another  Players can complete quests, interact with the game environment, interact with other players  Many of the behaviors which are observed in the real world are also observed in MMOGs e.g., friendship, economic behavior, backstabbing, illicit behaviors etc
  • 194.
    Gold farming
      • Gold farming and real money trade involve the exchange of virtual in-game resources for “real world” money
      • Laborers in China and S.E. Asia paid to perform repetitive in-game practices (“farming”) to accumulate virtual wealth (“gold”)
      • Western players purchase farmed gold to obtain more powerful items/abilities and open new areas within the game
      • Market for real money trade exceeds $3 billion annually [Lehdonvirta & Ernkvist 2011]
  • 195.
    Deviance
      • Game companies actively ban accounts involved in gold farming operations
      • Why?
        • Upsets game economy equilibrium by inflating prices
        • Excludes other players from shared game environments
        • Automated bots/scripts ruin social interactions
        • Pay-to-play upsets meritocratic expectations
        • Theft of billing or account information
        • Legal ramifications of virtual items as “property”
  • 196.
    Identification  Most gold farmers are caught by:  Other players reporting behavior  Farmers’ solicitations and spamming  Organized “sting operations”  Administrator heuristics  Farming operations employ highly-specialized operations that have to balance practices to efficiently accumulate gold with practices to avoid detection
  • 197.
    Changes in ban reasons, all
  • 198.
    Dynamics, bursts, lifecycle Ninja Metrics confidential information. Copyright 2012
  • 199.
    Mapping
      • Gold farmers potentially operate under similar motivations and constraints as other clandestine or criminal organizations
      • Profit motive – illicit goods with minimal input costs provide arbitrage opportunities
      • Distribution challenges – suboptimal processes and structures for generating and distributing goods so as to avoid risk of detection
      • Selection pressures – authorities confiscate goods and detain participants when detected
  • 200.
    EverQuest II  Massively Multiplayer Online Role Playing Game (MMORPG, MMO)  Data spanning from January 2006 to Mid-September 2006  2.1 million players across multiple servers  Our analysis focuses on a subset of the server  In-game action data available as well as social interaction data (trust, mentoring, questing, grouping, trade) etc
  • 201.
    EverQuest II – Gold Farmers
      • Gold farmers are explicitly labeled in the dataset
      • Gold farmers constitute only a small percentage of all of the players (1-3%)
      • Gold farmer bots are not as common in EverQuest II as is generally assumed to be the case in MMOs
  • 202.
    Types of Gold Farmers
      • There are multiple types of gold farmers
        • Gatherers: Accumulate gold or other resources
        • Bankers: Low-activity reserve accounts
        • Mules and dealers: Single user accounts to transmit money and interact with the customer
        • Barkers: Spammers marketing services in the game
      • Issues: The dataset, however, does not have labels for the gold farmer sub-types
      • Open Question: Are there other types of gold farmers?
  • 203.
    Descriptive properties and Statistical network models
  • 204.
    Properties of Clandestine Networks
      • Clandestine networks are embedded in larger networks and have been studied both in isolation and as part of those larger networks
      • Do clandestine networks exhibit properties that distinguish them from normal networks?
        • Behavioral Signatures
        • Structural Signatures
        • Spatio-Temporal Signatures
  • 205.
    Centrality comparisons  Compared to the population-at-large, do farmers or affiliates have higher or lower…  Incoming trade relationships? (In-degree)  Outgoing trade relationships? (Out-degree)  Incoming transactions? (In-weight)  Outgoing transactions? (Out-weight)  Proximity to the rest of the network? (Closeness)  Level of brokerage? (Betweenness)  Higher levels of “prestige”? (Eigenvector)  Tendency for counterparties to also trade? (Clustering)
  • 206.
    Multinomial logit
      • Compared to non-farmers, do farmers & affiliates have significantly different structures?

                                Farmers (z-statistic)     Affiliates (z-statistic)
      In-degree                 -0.0473 (-9.24)***        0.0626 (47.21)***
      Out-degree                -0.189 (-5.11)***         0.0529 (44.25)***
      In-weight                 0.00691 (23.04)***        0.00851 (32.64)***
      Out-weight                0.00796 (24.32)***        0.009556 (32.97)***
      Closeness                 -0.679 (-6.38)***         -2.438 (-13.24)***
      Betweenness               1.56x10^-7 (1.94)         1.36x10^-7 (35.81)***
      Eigenvector               -10300 (-8.60)***         5377 (35.73)***
      Clustering coefficient    0.905 (8.64)***           -0.0926 (-0.81)
  • 207.
    Network characterizations  Degree distribution  What fraction, P(k), of nodes in the network have k connections?  Weight distribution  What fraction, P(s), of links in the network have had s transactions?  Power law/scale free distributions  80/20 rule: minority of nodes have majority of links  Generated by growth and preferential attachment  Linear on a log-log plot… with some interesting exceptions
  • 208.
    Degree distribution & attenuation
      [Figure: log-log plot of P(k) (1.E-05 to 1.E+00) vs. k (1 to 100); series: Farmer-In, Farmer-Out, Affiliates-In, Affiliates-Out, Non-Affiliates-In, Non-Affiliates-Out]
  • 209.
    Growth constraints
      • Power law scaling followed by exponential cut-off
      • Aging: old nodes stop accepting new links
      • Cost: becomes more expensive to accept new links
      • Capacity: nodes stop accepting links above threshold
      • Copying: new nodes imitate connections of existing nodes
      [Figure: log-log plot of P(k) (1.E-05 to 1.E+00) vs. k (1 to 100); series: Farmer-In, Farmer-Out, Affiliates-In, Affiliates-Out, Non-Affiliates-In, Non-Affiliates-Out]
  • 210.
    Assortative vs. random mixing
      • Is the degree of a node correlated with its neighbors’ degrees?
        • No: random mixing
        • Yes, positive: assortative mixing
        • Yes, negative: dissortative mixing
      • Assortative mixing
        • Well-connected nodes are connected to other well-connected nodes
        • Poorly connected nodes connected to other poorly connected nodes
      • Dissortative mixing
        • Well-connected nodes are connected to poorly connected nodes
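    The question "is a node's degree correlated with its neighbors' degrees?" can be answered with a Pearson correlation over the degrees at the two ends of every link. The sketch below is one common way to do this for an undirected network; the two toy networks are hypothetical and chosen to give the extreme values.

```python
def degree_assortativity(edges):
    """Pearson correlation between the degrees at either end of each link.
    Positive: assortative; negative: dissortative; near zero: random mixing."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count each undirected link in both directions so the measure is symmetric.
    xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else 0.0

# Hypothetical star: one hub linked to four leaves (hub meets only low-degree nodes).
star = [("hub", leaf) for leaf in "abcd"]

# Hypothetical clique of four nodes plus one separate pair (like meets like).
clique_and_pair = [("p", "q"), ("p", "r"), ("p", "s"),
                   ("q", "r"), ("q", "s"), ("r", "s"), ("x", "y")]

print(degree_assortativity(star))                       # -1.0
print(round(degree_assortativity(clique_and_pair), 6))  # 1.0
```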
  • 211.
    Assortativity in context  Dissortativity found in biological, ecological, & technological networks that require failure tolerance  Assortativity found in social and collaboration networks
  • 212.
    Comparative network analysis
      • How does the gold farming network compare against a “real world” criminal network?
      • Criminal network data hard to get hold of in the first place
      • Problems defining boundaries, multiplex relations, etc.
      • Carlo Morselli’s CAVIAR drug trafficking network (N=110, E=295)
  • 213.
    Assortativity
      • Normal players and farmer affiliates both adopt collaborative interaction structures
      • Gold farmers and drug traffickers both adopt avoidance interaction structures
      [Figure: log-log plot of Knn(k) (0.1 to 10) vs. k (1 to 100); series: Affiliates-InIn, Affiliates-OutOut, Non-Affiliates-InIn, Non-Affiliates-OutOut, Farmer-InIn, Farmer-OutOut, Caviar-InIn, Caviar-OutOut]
  • 214.
    Conclusions – Assortativity  Non-affiliated players exhibit assortativity  Collaboration > Resilience  Affiliate network generally assortative  Collaboration > Resilience  High degree outliers likely unidentified farmers  Farmers’ network is dissortative  Collaboration < Resilience  Drug trafficking network similarly dissortative  Collaboration < Resilience
  • 215.
    Attack & failure tolerance
      • Failure: random removal of nodes
      • Attack strategies
        • Degree attack: removal of best-connected nodes
        • Edge attack: removal of high-transaction dyads
      • Outcomes
        • Fraction of remaining nodes in largest connected component
        • Fraction of remaining nodes as isolates
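    A minimal sketch of the degree-attack and random-failure experiments, measuring the fraction of remaining nodes in the largest connected component (LCC). The hub-and-spoke network, function names, and seed are assumptions for illustration; the edge attack is omitted because it needs transaction weights.

```python
import random
from collections import deque

def lcc_fraction(adj, removed):
    """Fraction of the remaining nodes that sit in the largest connected component."""
    alive = set(adj) - removed
    best, seen = 0, set()
    for start in alive:
        if start in seen:
            continue
        size = 0
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(alive) if alive else 0.0

def degree_attack(adj, k):
    """Attack: remove the k best-connected nodes (ranked by original degree)."""
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])

def random_failure(adj, k, seed=0):
    """Failure: remove k nodes chosen uniformly at random."""
    return set(random.Random(seed).sample(sorted(adj), k))

# Hypothetical hub-and-spoke network: node 0 is linked to nodes 1..20.
star = {0: set(range(1, 21)), **{i: {0} for i in range(1, 21)}}

print(lcc_fraction(star, degree_attack(star, 1)))   # 0.05
print(lcc_fraction(star, random_failure(star, 1)))
```

    In this toy case a single targeted removal leaves every remaining node isolated, whereas a random removal only does so if it happens to hit the hub.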
  • 216.
    Study 2 - Attack & failure analysis
      • Degree attack fractures farming network faster than random failure or edge attack
      [Figure: fraction (0 to 1) vs. node fraction removed (0.01% to 100%, log scale); series: Degree attack - LCC, Degree attack - Isolate fraction, Random failure - LCC, Random failure - Isolate fraction, Edge attack - LCC, Edge attack - Isolate fraction]
  • 217.
    Comparative attack & failure analysis – LCC fraction
      [Figure: fraction (0 to 1) vs. node fraction removed (0.01% to 100%, log scale); series: Farmer attack - LCC fraction, Farmer failure - LCC fraction, Caviar attack - LCC fraction, Caviar failure - LCC fraction]
  • 218.
    Comparative attack & failure analysis – Isolate fraction
      [Figure: fraction (0 to 1) vs. node fraction removed (0.01% to 100%, log scale); series: Farmer attack - Isolate, Farmer failure - Isolate, Caviar attack - Isolate, Caviar failure - Isolate]
  • 219.
    Conclusions – Attack tolerance
      • Edge attack strategy has poorer performance than even random failure
      • Gold farmers & drug traffickers respond similarly to degree attack
  • 228.
    Month 1 in Month 5
  • 229.
    Month 4 in Month 5
  • 230.
    Month 5 in Month 5
  • 231.
    Month 6 in Month 5
  • 232.
    Month 8 in Month 5
  • 233.
    Predictive Modeling and Machine Learning approaches
  • 234.
    Gold Farmer Detection  How does one catch gold farmers?  Are there characteristics which can be used to distinguish gold farmers?  What attributes can be used to detect GFs?  Player Character Attributes: Gender, Class etc of the player character  Player Activity Data: Player activities over the course of time e.g., number of quests, type of quests, NPCs killed?  Player Socialization Data: Grouping others, trust, mentoring  Player Demographics: Real world age, gender, location
  • 235.
    Machine Learning Approaches  Multiple ways to catch gold farmers:  Binary Classification Problem  Multi-class Classification  One class Classification problem  Cascading Classifiers Problem  Outlier Detection  Label Propagation Problem  Combination of these  Class Labeling Issues: Not all players who are labeled as normal players are normal players
  • 236.
    Machine Learning Binary Classification Approach
      • Two main classes: GFs and non-GFs
      • Highly skewed distribution
      • 9,178 gold farmer characters out of a total of 2.1 million characters
      • Standard set of combinations of classifiers and features, e.g., Naive Bayes, Bayes Net, Logistic Regression, KNN, J48, JRip, AdaBoost, and SMO
  • 237.
    Feature Set
      • Demographic Features*
      • Performance Features
      • Task distributions (set of tasks performed)
      • Sequence of activities performed by gold farmers
      • Examples: KKKdDKdEESSKD, SSSEKdKdDD – K = Killed Monster, d = damage points, D = Character Death, S = Completed a recipe, e.g., spell
      * All information is anonymized.
  • 240.
    Initial Classification Results Frequent Pattern Mining Association
  • 241.
    Label Propagation  Research Problem: Not all people who are labeled as normal players are such. Some of them are gold farmers but have not been identified as such  Analogue: Not all people who are free i.e., not in jails are innocent  Solution: Use label propagation to label people who may be gold farmers  Analogue: If someone is socializing almost exclusively with criminals then he may be a criminal
  • 242.
    Label Propagation
      [Figure: (a) Classification results; (b) Social networks within class boundaries]
  • 243.
    Label Propagation
      • Guilt by association
      • Propagate labels based on the social neighborhood and socialization patterns of players
      • If player A spends > 80% of time socializing (grouping, trusting, mentoring) with other gold farmers, then he is likely a gold farmer
      • Alternative methods: Propagation based on similarities
      [Figure: Propagate labels based on the social networks of the gold farmers]
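    The 80% rule above can be sketched as a single pass over each unlabeled player's social time. The data layout (hours of grouping, trusting, or mentoring per partner), the function name, and the player IDs below are hypothetical; the source does not specify how socializing time was measured.

```python
def propagate_labels(interaction_time, known_farmers, threshold=0.8):
    """Flag unlabeled players whose share of social time spent with known
    gold farmers exceeds `threshold`. Single pass; labels are not re-propagated."""
    suspects = set()
    for player, partners in interaction_time.items():
        if player in known_farmers:
            continue
        total = sum(partners.values())
        with_farmers = sum(t for p, t in partners.items() if p in known_farmers)
        if total > 0 and with_farmers / total > threshold:
            suspects.add(player)
    return suspects

# Hypothetical hours spent socializing with each partner.
interaction_time = {
    "gf1": {"p1": 9, "p2": 2, "p3": 4},
    "p1": {"gf1": 9, "p2": 1},   # 90% of time with a known farmer
    "p2": {"gf1": 2, "p1": 8},   # 20%
    "p3": {"gf1": 4, "p2": 6},   # 40%
}

print(propagate_labels(interaction_time, known_farmers={"gf1"}))  # {'p1'}
```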
  • 244.
    Results

      Classifier           Metric     Initial Dataset  Label Prop  Change in Performance  Lift
      Bayes Net            Precision  0.17             0.189        0.019                 1.11
                           Recall     0.834            0.819       -0.015                 0.98
                           F-Score    0.282            0.307        0.025                 1.09
      J48                  Precision  0.494            0.62         0.126                 1.26
                           Recall     0.189            0.337        0.148                 1.78
                           F-Score    0.273            0.437        0.164                 1.60
      JRip                 Precision  0.495            0.537        0.042                 1.08
                           Recall     0.462            0.462        0                     1.00
                           F-Score    0.478            0.497        0.019                 1.04
      KNN                  Precision  0.436            0.46         0.024                 1.06
                           Recall     0.396            0.428        0.032                 1.08
                           F-Score    0.415            0.443        0.028                 1.068
      Logistic Regression  Precision  0.455            0.534        0.079                 1.17
                           Recall     0.189            0.271        0.082                 1.43
                           F-Score    0.267            0.36         0.093                 1.35
      Naïve Bayes          Precision  0.146            0.142       -0.004                 0.97
                           Recall     0.538            0.502       -0.036                 0.93
                           F-Score    0.23             0.221       -0.009                 0.96
      AdaBoost w/ DT       Precision  0.405            0.471        0.066                 1.16
                           Recall     0.105            0.08        -0.025                 0.76
                           F-Score    0.167            0.137       -0.03                  0.82
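    The F-Score and Lift columns appear to be consistent with the usual harmonic-mean F-score and with lift taken as the ratio of the label-propagation value to the initial value. That reading is an inference from the numbers, not something the slide states; the sketch below checks it against the J48 row.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def lift(after, before):
    """Ratio of the metric after label propagation to the metric before (assumed definition)."""
    return after / before

# J48 row of the results: initial dataset vs. label propagation.
print(round(f_score(0.494, 0.189), 3))  # 0.273
print(round(f_score(0.62, 0.337), 3))   # 0.437
print(round(lift(0.62, 0.494), 2))      # 1.26
```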
  • 245.
    Results Comparison

      Metric     SocialComp’09 (F0)  New Features + Label Propagation  Lift (SocialComp’09 vs Label Propagation)
      Precision  0.493               0.537                             1.089
      Recall     0.304               0.462                             1.520
      F-Score    0.376               0.497                             1.322

      • The recall improves significantly over the previous results, which implies that we are catching more gold farmers
      • With label propagation the precision increases from 0.49 to 0.54, which implies that more of the users we identify as gold farmers are indeed gold farmers
  • 246.
    Gold Farmer Detection as a One-Class Classification Problem
      • The labels of one class are known for certain
      • The labels for the rest of the records are not known with certainty
      • Use the known class for training and classify the rest of the records for the known class
      • Issues: The known class consists of many subclasses with different feature sets
  • 247.
    Disambiguating Gold Farming sub-classes
      • Heuristics can be used to disambiguate the different sub-classes
      • Gatherers have intense in-game activity associated with them
      • Mules have high trade volume but low in-game activity
      • Bankers have low trade volume and low in-game activity
      • Spammers have little or no trade activity (in general) but denser chat networks
  • 248.
    Disambiguating Gold Farming sub-classes
      [Figure: activity panels, each spanning 0000 hours to 1200 hours; panels: High Intensity Gatherer, High Intensity Normal Player, Banker, Player with Periodic Behavior, Low Intensity Player]
  • 249.
    Research Question
      “How do gold farmers change their behaviors as a consequence of game admins' behaviors?”
      Can we anticipate changes in GF behavior in advance?
  • 250.
    Change Detection in Gold Farmer Behavior
      • Research Question: How do gold farmers respond to global enforcement of policies by the game admins?
      • The in-game activities of gold farmers can be represented as a time series
      • Applied clustering to these series; 3 clusters made the most sense (clearest separation)
      • Each cluster can be mapped to a gold farmer subtype
      • However, 20-30% of all players in each cluster are non-GFs
      • Change detection to determine when the time series changed
  • 251.
    Interpretation: Global Changes in GF Behaviors - Adaptation
      • Gold farmers and game admins change their activities based on how the other acts in the game
      • Previously there was anecdotal evidence for this change; we have established that it happens not just at the activity level but at the structural pattern level
      • Can we predict how the gold farmers will act if game admins adopt a certain banning policy?
  • 252.
    Research Question
      “A plague upon't when thieves cannot be true one to another!” – Sir Falstaff, Henry IV, Part 1, II.ii
      Do gold farmers trust each other?
  • 253.
    Housing-Trust in EQ2
      • Access permissions to in-game house as trust relationships
        • None: Cannot enter house
        • Visitor: Can enter the house and can interact with objects in the house
        • Friend: Visitor + move items
        • Trustee: Friend + remove items
      • Houses can also contain items which allow sales to other characters without exchanging on the market
  • 254.
    Hypergraphs to Represent Tripartite Graphs
      • Accounts can have several characters
      • Houses can be accessed by several characters
      • Projecting to one- or two-mode data obscures crucial information about embeddedness and paths
      • Figure 2a: Can ca31 access the same house as ca11?
      • Figure 2b: Are characters all owned by same account?
  • 255.
    Hypergraphs: Key Concepts
      • Hyperedge: An edge between three or more nodes in a graph. We use three types of nodes: character, account, and house
      • Node Degree: The number of hyperedges which are connected to a node
        • ND_{h1} = 3
      • Edge Degree: The number of hyperedges that an edge participates in
        • ED_{a1-h1} = 2
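    Both degree notions can be counted directly from a list of hyperedges. The sketch below uses hypothetical (character, account, house) triples, constructed so that house h1 and the pair a1-h1 give the same values as the example on the slide; the slide's actual figure is not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hyperedges of the form (character, account, house).
hyperedges = [
    ("c11", "a1", "h1"),
    ("c12", "a1", "h1"),
    ("c21", "a2", "h1"),
    ("c21", "a2", "h2"),
]

# Node degree: number of hyperedges a node is connected to.
node_degree = Counter(node for he in hyperedges for node in he)

# Edge degree: number of hyperedges in which a given pair of nodes co-occurs.
edge_degree = Counter(frozenset(pair)
                      for he in hyperedges
                      for pair in combinations(he, 2))

print(node_degree["h1"])                     # 3
print(edge_degree[frozenset(("a1", "h1"))])  # 2
```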
  • 256.
    Approach
      • Game administrators miss gold farmers and deviance is not a simple binary classification task
      • Guilt by association: Identify “affiliates” who have ever interacted with identified gold farmers, but have not been identified as gold farmers themselves
      [Figure: nodes A, B, C; legend: Farmer, Affiliate, Non-affiliate]
  • 257.
    Network Characteristics
      • Long-tail distributions are observed for the various degree distributions
      • The mapping from character-house to an account is always unique
  • 258.
    Characteristics of Hypergraph Projection Networks
      • Account Projection: Majority of the gold farmer nodes are isolates (79%). Affiliates well-connected (8.89) vs non-affiliates (3.47)
      • Character Projection: Majority of the gold farmer nodes are isolates (84%). Affiliates well-connected (10.42) vs non-affiliates (3.23)
      • House Projection: 521 gold farmer houses. Most are isolates (not shown) but others are part of complex structures. Densely connected network with gold farmers (7.56) and affiliates (84.02)
  • 259.
    Key Observations
      • Picky picky: Gold farmers grant trust ties less frequently than either affiliates or general players
      • Gold farmers grant and receive fewer housing permissions (1.82) than their affiliates (4.03) or general player population (2.73)

                      Total degree           In degree              Out degree
                      <n>   <nGF>  <nAff>    <n>   <nGF>  <nAff>    <n>   <nGF>  <nAff>
      Farmers         1.82  0.29   1.82      0.89  0.29   0.89      1.07  0.29   1.07
      Affiliates      4.03  1.28   0.70      1.55  0.75   0.70      2.88  0.63   0.70
      Non-Affiliates  2.73  -      7.77      1.57  -      5.98      1.56  -      2.34
  • 260.
    Key Observations
      • No honor among thieves
      • Gold farmers also have very low tendency to grant other gold farmers permission (0.29)
      • Affiliates also unlikely to trust other affiliates (0.70)
  • 261.
    Key Observations
      • Affiliates are brokers:
        • Farmers trust affiliates more (1.82) than other farmers (0.29)
        • Affiliates trust farmers more (1.28) than other affiliates (0.70)
      • Non-affiliates have a greater tendency to grant permissions to non-affiliates (7.77) than in general (2.73)
  • 262.
    Frequent Pattern Mining: Key Terms
      Market basket transaction dataset example:
        t1: Beer, Diaper, Milk
        t2: Beer, Cheese
        t3: Cheese, Boots
        t4: Beer, Diaper, Cheese
        t5: Beer, Diaper, Clothes, Cheese, Milk
        t6: Diaper, Clothes, Milk
        t7: Diaper, Milk, Clothes
      • Items: Cheese, Milk, Beer, Clothes, Diaper, Boots
      • Transactions: t1, t2, …, tn
      • Itemset: {Cheese, Milk, Butter}
      • Support of an itemset: Percentage of transactions which contain that itemset
      • Support({Diaper, Clothes, Milk}) = 3/7
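    The support figure can be reproduced from the seven example transactions. The sketch below stores each transaction as a set and counts how many contain the itemset; only the `support` helper name is introduced here.

```python
from fractions import Fraction

# The seven market-basket transactions from the example.
transactions = {
    "t1": {"Beer", "Diaper", "Milk"},
    "t2": {"Beer", "Cheese"},
    "t3": {"Cheese", "Boots"},
    "t4": {"Beer", "Diaper", "Cheese"},
    "t5": {"Beer", "Diaper", "Clothes", "Cheese", "Milk"},
    "t6": {"Diaper", "Clothes", "Milk"},
    "t7": {"Diaper", "Milk", "Clothes"},
}

def support(itemset):
    """Share of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))

print(support({"Diaper", "Clothes", "Milk"}))  # 3/7
```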
  • 263.
    Frequent Itemset Mining for Frequent Hyper-subgraphs
      • Support of a Hyper-subgraph: Given a sub-hypergraph of size k, subP is the pattern of interest containing the label P, shP is a pattern of the same size as subP and contains the label P, the support is defined as follows:
        [Equation image]
      • Support of pattern also containing a gold farmer (red) = 5/8
  • 264.
    Frequent Itemset Mining for Frequent Hyper-subgraphs
      • Confidence of a Hyper-subgraph: Given a sub-hypergraph of size k, subP is the pattern of interest containing the label P, subG is a pattern which is structurally equivalent but which does not contain the label P, the confidence is defined as follows:
        [Equation image]
      • Confidence of pattern containing a gold farmer = 5/7
  • 265.
    Frequent Patterns of GFs
      • Less than 0.1 support and confidence for almost all (except 8) frequent patterns with gold farmers
      • Remaining 8 patterns can be used for discrimination between gold farmers and non-gold farmers
      • Gold farmers & affiliates are more connected: A third of the more complex patterns (k >= 10 nodes) are associated with affiliates (15/44)
  • 266.
    Conclusion and contributions
      • Using hypergraphs to represent complex data structures and dependencies
      • Application of frequent pattern mining to discover distinct trust patterns associated with gold farmers
      • No honor among thieves: Gold farmers tend not to trust other gold farmers
  • 267.
    Implications
      • Social organization and behavioral patterns of clandestine activity as co-evolutionary outcomes
      • Using online behavioral patterns to inform and develop metrics/algorithms for detecting offline clandestine activity
      • Clandestine networks as “dual use” technologies – ethical and legal implications of improving detection? [Keegan, Ahmad, et al. 2011]
  • 268.
    Limitations and future work
      • Housing/trust ties mediated by other or multiplex relationships
        • Communication, grouping, mentoring, trading, etc.
      • Multiple types of deviance and deviants: Modeling role specialization & division of labor
      • Using frequent subgraph patterns as discriminating features for ML models
      • Changes in frequent subgraphs over time
  • 269.
    Contraband: Online and Offline
      • Contraband items are illegally obtained items constituting a parallel or shadow economy which evades regulation or taxation
      • Governments try to interrupt such exchanges, especially when they involve dangerous items like weapons and drugs
      • Data about contraband is extremely difficult to obtain, so analysis is limited
      • Online analogues of such behaviors offer the possibility of analyzing them and closing this gap
  • 270.
    Research Questions
      • What do the trade networks of gold farmers & normal players look like?
      • Do gold farmers exhibit distinctive behavioral patterns for buying and selling items?
      • What are the characteristics of contraband networks in MMOs?
      • Can we use contraband networks to catch gold farmers?
  • 271.
    Consignment Trade in EverQuest II
      • Gold farmer trading activity is a significant fraction of the trading activity and then declines significantly
      • Possible explanations: (i) Real decline in gold farming activity (ii) Gold farmers change their strategies to evade detection
  • 272.
    Consignment Trade in EverQuest II
  • 273.
    Gold Farmer Trading Items
      • Even though the trade volume of gold farmers decreases over time, the total number of unique items traded by them remains constant
      • This implies that there is a subset of items that gold farmers are interested in buying and selling
      • Specialized items have a low trade volume
  • 274.
    Frequent Pattern Mining for Contraband and Trade Network
      • Main Idea: There are certain behaviors and/or activities that are associated more with gold farmers than with other people
      • Once identified, these can be used as feature sets to build models to classify people as deviant vs. non-deviant
      • We analyze the social networks as well as the item-usage networks of gold farmers
  • 275.
    Item-Projection Networks and FPM Framework
      • Two-mode network of players and items sold
      • Projection network of items: An edge is created between two items if they have been traded by the same person
      • What are the items which are traded by gold farmers as compared to others?
      [Figure: Player X linked to Item A and Item B; projected edge between Item A and Item B]
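    The projection rule (link two items if the same person traded both) can be sketched as follows. The trade records and the choice to weight each item pair by the number of shared traders are assumptions for illustration; the source only states when an edge is created.

```python
from collections import defaultdict
from itertools import combinations

def project_items(trades):
    """Project a two-mode (player, item) network onto items.
    Returns {frozenset({item_a, item_b}): number of players who traded both}."""
    items_by_player = defaultdict(set)
    for player, item in trades:
        items_by_player[player].add(item)
    weights = defaultdict(int)
    for items in items_by_player.values():
        for a, b in combinations(sorted(items), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)

# Hypothetical trade records (player, item sold).
trades = [("X", "Item A"), ("X", "Item B"),
          ("Y", "Item B"), ("Y", "Item C"),
          ("Z", "Item A"), ("Z", "Item B")]

projection = project_items(trades)
print(projection[frozenset(("Item A", "Item B"))])  # 2
print(projection[frozenset(("Item B", "Item C"))])  # 1
```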
  • 276.
    Frequent Items Associated with Gold Farmers
  • 277.
    Characteristics of Items Associated with Gold Farmers
      • With normal players, a range of item types is associated with buying and selling activity
      • Surprisingly, certain items are associated not only with gold farmer selling activity but also with buying activity
      • The items that gold farmers buy are usually low-end items: (i) Used for crafting (ii) Cornering the market?
      • The items that are sold by gold farmers are usually high-end items and in many cases almost exclusively sold by them
  • 278.
    Construction of Item-Networks
      • Apply standard frequent-pattern mining techniques (Apriori, FP-Tree) to determine frequently traded items
      • For all the items which occur frequently, create an edge between them
      • For items which are sold in different transactions by the same people, also form an edge between them
  • 279.
  • 280.
    Frequent Subgraphs as Features
    o Main idea: In addition to the features based on player characteristics, use sub-graphs as features
  • 281.
    Prediction Models Model 1 (Player Attribute Based Features): These features are based on the attributes of the player‟s character in the game e.g., character race, character gender, distribution of gaming activities etc. These are the same features which were used by Ahmad et al [SocCom‟09]. Model 2 (Item Based Features): These are the features which are derived from items bought and sold from the consignment network. These features are based on the frequency of the frequent items sold or bought by gold farmers. Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models. Model 4 (Item Network Based Features): Features which are derived from the item network in a manner analogous to Model 2. Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1 and Model 4. Model 6 (Item Network & Item-Network Based Features): A combination of features from Model 2 and Model 4. Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described above.
  • 282.
    Results
    A set of standard machine learning models is used: Naive Bayes, Bayes Net, Logistic Regression, KNN, J48, JRip, AdaBoost and SMO
  • 283.
    Conclusion
    o Described a phenomenon analogous to contraband in the offline world
    o Analysis of gold farmer item networks reveals that they exhibit characteristics different from those of normal players
    o Items that gold farmers buy are usually low-end items, and items that they sell are often high-end items
    o One can use frequent items, and networks of such items associated with gold farmers, to build classifiers that can be used to catch gold farmers
  • 284.
    Future work
    • Association rule learning for temporal patterns: do bursts of activity predict gold farming?
    • Statistical modeling of sudden changes and system responses in the network over time
    • Expanding hypergraph approaches to representing complex relationships
    • Comparative analysis against EVE Online and other games with similar "gold farmers get banned" regimes
  • 285.
    Some ethical quandaries
    • Euphemisms abound: "removing" links and nodes
    • Should scholars be engaged in "destructive science"?
    • Clandestine network analysis as dual-use technology
    • "Terrorist vs. freedom-fighter"
    • Used for good (MENA revolutions), evil (al-Qaeda), unclear (Wikileaks)
    • Legal dimensions of information theory and methods
    • Different assumptions in models and methods foreground different suspects
    • Minimize false positives or false negatives? Maximize true positives or true negatives?
  • 286.
    Legal questions
    • If moral, ethical, and legal boundaries that constrain behavior are indistinct or unenforceable in virtual worlds, who defines regulations to define and curb excesses?
    • Some normative rules are hard-wired into code, others are permitted by code [Lastowka & Hunter 2006; Grimmelmann 2006; Ondrejka 2006]
    • From false positives to due process
    • Responsibility to disclose proprietary methodological approaches?
    • Heightened or different burden of proof given superabundance of data?
    • Expectations of privacy?
    • Demonstrating intent?
    • Right to representation and due process?
  • 287.
  • 289.
    Analytics Architecture
    [Diagram labels: MMOG/VW; MMOG vendor; free websites; 3rd-party custom data (Acxiom/TRW); game logs @ partner; data piped from partner; game knowledge; analysis engine (Hadoop, Cloudera release; RDB, MS SQL Server); actionable insights]
  • 290.
    Analytics Pipeline
    [Diagram labels: game data (Sony, CR3, …); domain knowledge; terabytes of data; data modeling; scheduled pipeline flow; data warehouse; data mart]
    • Katana Analytics Engine (UI)
        o Java applet embedded in web browser
        o Stand-alone deployable Java applet
    • Scheduled pipeline flow
        o Data cleaning
        o Data transformation
        o Normalization
        o Loading
    • Analytics engines: churn analysis, gold farming, network value
    • Standalone Java applications
        o Alienware workstations
        o 2 TB hard drive, 8 GB memory, 2 dual-core CPUs
    • MS SQL Server 2008
        o Dell Precision 1500 (workstation)
        o 1.5 TB hard drive, 4 GB memory, 2 dual-core CPUs
    • Hadoop cluster
        o 4 units of Dell PowerEdge R510 (rack server)
        o Each with 14 TB hard drive, 12 GB memory, 2 CPUs x 8 parallelization = 16 CPUs
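    The four pipeline stages named on the slide (data cleaning, data transformation, normalization, loading) can be sketched as a chain of small functions. Everything concrete here is an assumption for illustration: the record fields `player` and `gold`, the reading of "normalization" as min-max scaling, and the use of an in-memory SQLite database as a stand-in for the MS SQL Server warehouse shown on the slide.

```python
import sqlite3


def clean(rows):
    """Data cleaning: drop records with a missing player or amount."""
    return [r for r in rows if r.get("player") and r.get("gold") is not None]


def transform(rows):
    """Data transformation: tidy player names and parse amounts."""
    return [
        {"player": r["player"].strip().lower(), "gold": float(r["gold"])}
        for r in rows
    ]


def normalize(rows):
    """Normalization: min-max scale the amount into [0, 1] (assumed meaning)."""
    if not rows:
        return []
    values = [r["gold"] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values match
    return [{**r, "gold_scaled": (r["gold"] - lo) / span} for r in rows]


def load(rows, conn):
    """Loading: write the processed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades "
        "(player TEXT, gold REAL, gold_scaled REAL)"
    )
    conn.executemany(
        "INSERT INTO trades VALUES (:player, :gold, :gold_scaled)", rows
    )
    conn.commit()


# Hypothetical raw log records, not taken from any real game log.
raw = [
    {"player": " Alice ", "gold": "10"},
    {"player": "bob", "gold": "30"},
    {"player": "", "gold": "5"},
    {"player": "carol", "gold": None},
]
conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
load(normalize(transform(clean(raw))), conn)
loaded = conn.execute(
    "SELECT player, gold, gold_scaled FROM trades ORDER BY player"
).fetchall()
```

    Keeping each stage as a separate function mirrors the scheduled pipeline flow on the slide: any stage can be rerun or replaced without touching the others.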
  • 296.
    Google: Aspiring to know all aspects of your social life
  • 297.
    Social Networks, Social Multimedia Networks, Text-based Conversation Networks
  • 298.
    Data Collected, Benefits, Impact
    • Data collected
        o Activity logs: clicks, page visits, chats, friends, uploads, downloads, tags, comments, responses, joining and leaving of communities, people friended, people blocked, ads clicked on, ads ignored, frequency of logging in
        o Network logs: What are people's activities when I tag, comment, respond, stay idle, post, recommend and ignore?
        o Basically: 'Too much data about you'
    • Benefits
        o Advertising: Easiest way to spread an infection by tapping key nodes in Google's network
        o Sentiment analysis: Detect how your product or service is doing
        o Etc., etc.; basically too many to list
    • Impact
        o Tremendous
        o Also very scary!
  • 299.
    Facebook: The operating system for your social life, and more
  • 300.
    Facebook
    • > 950 million users for Facebook
    • > 14% of humanity!!
    • > 35% of all who have computers!!!
    • > 50% of Facebook users log on every day (http://www.facebook.com/press/info.php?statistics)
    • Spending an average of 14 minutes per day (http://mashable.com/2010/02/16/facebook-nielsen-stats/)
    • Ultimate social data, no end to what can be done with it
    • No wonder FB is being pegged at a $100+ billion IPO
    • They know everything
    • So even if Google doesn't scare you, Facebook should!
  • 301.
    Impact of new instrumentation on science
    • 1950s
        o Invention of the electron microscope fundamentally changed chemistry from 'playing with colored liquids in a lab' to 'truly understanding what's going on'
    • 1970s
        o Invention of gene sequencing fundamentally changed biology from a qualitative field to a quantitative field
    • 1980s
        o Deployment of the Hubble (and other) space telescopes has had a fundamental impact on astronomy and astrophysics
    • 2000s
        o Massive adoption is fundamentally changing social science research
        o Massively Multiplayer Online Games (MMOGs) and Virtual Worlds (VWs) are acting as 'macroscopes of human behavior'
  • 302.
    Some leading-edge US research programs we are participating in
    • Biggest
        o US Army's Network Science Collaborative Technology Alliance
        o Led by BBN, with over 25 institutions and 150 researchers
        o Approximately $200 million over a 10-year period (2009 – 2019)
        o http://www.ns-cta.org/ns-cta-blog/
    • A number of DARPA programs
        o Social Media in Strategic Communication (SMISC)
            Started in November 2011
            http://www.darpa.mil/Our_Work/I2O/Programs/Social_Media_in_Strategic_Communication_(SMISC).aspx
        o Graph-Theoretic Research in Algorithms and PHenomenology of Social Networks (GRAPHS)
            Started in March 2012
            http://www.darpa.mil/Our_Work/DSO/Programs/Graph-theoretic_Research_in_Algorithms_and_the_Phenomenology_of_Social_Networks_%28GRAPHS%29.aspx
  • 303.
    Summary – The Big Picture
    • Converging trends
        o Rapid increase in the usage of the Internet/Web → increased amount of interactions online → huge amount of socialization online
        o Increase in resolution and deployment of data collection 'probes', e.g. GPS, cell phone/PDA, wireless-enabled laptop, RFID tags, … → increased ability to monitor and record interactions at a really fine granularity
        o Dramatic increase in storage capacity and decrease in storage costs → feasible to store all the data collected
        o Fundamental advances in computational methods for data analytics
    • Becoming possible to really understand individual and group behavior at a fine granularity
    • Great opportunities for
        o Basic R&D
        o Applied R&D
        o Entrepreneurship
    • But putting together the right team and partnerships is critical!
  • 304.
    And last, but certainly not the least: thank you for your invitation