What makes communities tick? Community health analysis using role compositions

2012 IEEE International Conference on Social Computing

  • For the common health composition pattern: assess three forums in the central cluster (252 SAP Business One E-Commerce; 412 Business Planning; 414 Strategy Management) and the differences in their coefficients. The differences show that no general pattern exists.

    1. What Makes Communities Tick? Community Health Analysis using Role Compositions
       Matthew Rowe (1) and Harith Alani (2)
       (1) School of Computing and Communications, Lancaster University, Lancaster, UK
       (2) Knowledge Media Institute, The Open University, Milton Keynes, UK
       2012 ASE/IEEE International Conference on Social Computing, Amsterdam, The Netherlands
       http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem
       m.rowe@lancaster.ac.uk
    2. Managing Online Communities
       • Many businesses provide online communities to:
           • Increase customer loyalty
           • Raise brand awareness
           • Spread word-of-mouth
           • Facilitate idea generation
       • Online communities incur significant investment in terms of:
           • Money spent on hosting and bandwidth
           • Time and effort for maintenance
       • Community managers monitor community ‘health’ to:
           • Ensure longevity
           • Enable value generation
       • However, the notion of ‘health’ is hard to pin down
       What makes Communities Tick? Community Health Analysis using Role Compositions
    3. The Need for Interpretation
       • Online communities are dynamic behavioural ecosystems
       • Users in communities can be defined by their roles
           • i.e. groups of users exhibiting similar collective behaviour
       • Prevalent behaviour can impact community members and health
       • Management of communities is helped by:
           • Understanding the relation between behaviour and health
           • Understanding how changes in user behaviour are associated with health
           • Encouraging users to modify their behaviour, in turn affecting health
               • e.g. content recommendation to specific users
           • Predicting health changes
               • Enables early decision making on community policy
       • Can we accurately and effectively detect positive and negative changes in community health from its composition of behavioural roles?
    4. Outline
       • SAP Community Network
       • Community Health Indicators
       • Measuring Role Compositions:
           • Measuring user behaviour
           • Inferring behaviour roles
           • Mining behaviour roles
       • Experiments:
           • Health Indicator Regression
           • Health Change Detection
       • Findings and Conclusions
    5. SAP Community Network
       • Collection of SAP forums in which users discuss:
           • Software development
           • SAP products
           • Usage of SAP tools
       • Points system for awarding best answers
           • Enables development of user reputation
       • Provided with a dataset covering 33 communities:
           • Spanning 2004-2011
           • 95,200 threads
           • 421,098 messages
               • 78,690 were allocated points
           • 32,942 users
       [Chart: Post Count over time, 2004-2011]
    6. Community Health Indicators
       • The literature offers no single agreed measure of ‘community health’
           • Multi-faceted nature: loyalty, participation, activity, social capital
           • Different communities and platforms look at different indicators
       • Indicator 1: Churn Rate (loyalty)
           • The proportion of users who participate in the community for the final time
       • Indicator 2: User Count (participation)
           • The number of users participating in the community
       • Indicator 3: Seeds-to-Non-Seeds Posts Proportion (activity)
           • The proportion of seed posts (i.e. thread starters that receive a reply) to non-seeds (i.e. thread starters that receive no reply)
       • Indicator 4: Clustering Coefficient (social capital)
           • The average of users’ clustering coefficients within the largest strongly connected component
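As a rough illustration, the sketch below computes the first three indicators for one time step from a toy post log. The `Post` record and its fields are hypothetical, and measuring churn against each user's last post anywhere in the log is an assumption rather than the authors' implementation. The clustering coefficient is left out, since it needs the reply graph and its largest strongly connected component.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Post:
    # Hypothetical post record (not the SCN schema): author, integer time
    # step, thread id, and the id of the post replied to (None = thread starter).
    post_id: int
    user: str
    step: int
    thread: int
    reply_to: Optional[int] = None


def user_count(posts: List[Post], step: int) -> int:
    """Participation: number of distinct users posting at `step`."""
    return len({p.user for p in posts if p.step == step})


def churn_rate(posts: List[Post], step: int) -> float:
    """Loyalty: share of users active at `step` who never post again afterwards."""
    active = {p.user for p in posts if p.step == step}
    if not active:
        return 0.0
    last_step = {}
    for p in posts:
        last_step[p.user] = max(last_step.get(p.user, p.step), p.step)
    return sum(1 for u in active if last_step[u] == step) / len(active)


def seeds_to_non_seeds(posts: List[Post], step: int) -> float:
    """Activity: thread starters at `step` whose thread got a reply, divided by those that did not."""
    starters = [p for p in posts if p.step == step and p.reply_to is None]
    threads_with_replies = {p.thread for p in posts if p.reply_to is not None}
    seeds = sum(1 for p in starters if p.thread in threads_with_replies)
    non_seeds = len(starters) - seeds
    return seeds / non_seeds if non_seeds else float("inf")


# Toy log: two threads at step 0 (only one answered), one answered thread at step 1.
posts = [
    Post(1, "ann", 0, 10), Post(2, "bob", 0, 10, reply_to=1),
    Post(3, "cat", 0, 11),
    Post(4, "ann", 1, 12), Post(5, "bob", 1, 12, reply_to=4),
]
# 3 users at step 0; only "cat" never returns; one seed and one non-seed.
print(user_count(posts, 0), churn_rate(posts, 0), seeds_to_non_seeds(posts, 0))
```

Under this definition every active user counts as churned at the last step of the log, so the final time step would need special handling in practice.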
    7. Measuring Role Compositions I: Modelling and Measuring User Behaviour
       • According to existing literature (Hautz et al., 2010), (Nolker and Zhou, 2005), (Zhu et al., 2009), (Zhu et al., 2011), user behaviour can be defined using 6 dimensions:
           • Focus Dispersion
               • Measure: forum entropy of the user
           • Engagement
               • Measure: out-degree normalised by the potential maximal out-degree
           • Popularity
               • Measure: in-degree normalised by the potential maximal in-degree
           • Contribution
               • Measure: proportion of thread replies created by the user
           • Initiation
               • Measure: proportion of threads initiated by the user
           • Content Quality
               • Measure: average points per post awarded to the user
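The six measures can be sketched for a single user and time window as below. The input shapes (reply pairs, thread authors, point totals, per-post forum labels), base-2 entropy, and normalising degrees by the number of other users are all assumptions; the slides do not say how the "potential maximal" degree is computed.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def forum_entropy(forum_counts: Counter) -> float:
    """Focus dispersion: Shannon entropy (bits) of a user's posts over forums."""
    total = sum(forum_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in forum_counts.values() if c)


def behaviour_features(
    user: str,
    replies: List[Tuple[str, str]],      # (replier, user replied to)
    starters: List[str],                 # author of each thread
    points: Dict[str, float],            # total points awarded per user
    forum_posts: List[Tuple[str, str]],  # (author, forum) for every post
) -> Dict[str, float]:
    users = {u for pair in replies for u in pair} | set(starters) | {u for u, _ in forum_posts}
    # Assumption: the potential maximal in/out-degree is the number of other users.
    max_degree = max(len(users) - 1, 1)
    out_nbrs = {b for a, b in replies if a == user and b != user}
    in_nbrs = {a for a, b in replies if b == user and a != user}
    n_posts = sum(1 for u, _ in forum_posts if u == user)
    return {
        "dispersion": forum_entropy(Counter(f for u, f in forum_posts if u == user)),
        "engagement": len(out_nbrs) / max_degree,
        "popularity": len(in_nbrs) / max_degree,
        "contribution": sum(1 for a, _ in replies if a == user) / max(len(replies), 1),
        "initiation": starters.count(user) / max(len(starters), 1),
        "quality": points.get(user, 0) / n_posts if n_posts else 0.0,
    }


# Toy window: ann starts two threads and writes two replies, spread over two forums.
replies = [("bob", "ann"), ("cat", "ann"), ("ann", "bob"), ("ann", "bob")]
starters = ["ann", "ann", "bob"]
points = {"ann": 4.0}
forum_posts = [("ann", "f1"), ("ann", "f1"), ("ann", "f2"), ("ann", "f2"),
               ("bob", "f1"), ("bob", "f1"), ("cat", "f2")]
features = behaviour_features("ann", replies, starters, points, forum_posts)
print(features)
```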
    8. Measuring Role Compositions II: Inferring Roles
       1. Construct features for community users at a given time step
       2. Derive bins using equal-frequency binning
           • e.g. Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4
       3. Use the skeleton rule base to construct rules from the bin levels
           • Popularity = low, Initiation = high -> roleA
           • becomes: Popularity < 0.5, Initiation > 0.4 -> roleA
       4. Apply the rules to infer user roles and the community's role composition
       5. Repeat steps 1-4 for the following time steps
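Steps 2 to 4 can be sketched as follows, assuming three bins (low, mid, high) and a rule base written as level patterns. `SKELETON_RULES` and `roleA` are placeholders, not the rules mined in the paper. One small difference from the slide: here a value exactly at the high cutoff counts as high.

```python
from typing import Dict, List, Sequence, Tuple

LEVELS = ("low", "mid", "high")  # assumes at most three bins


def equal_frequency_cutoffs(values: Sequence[float], n_bins: int = 3) -> List[float]:
    """Cut points that split the sorted values into n_bins groups of (roughly) equal size."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]


def to_level(value: float, cutoffs: Sequence[float]) -> str:
    """Map a continuous feature value onto a bin label using ascending cutoffs."""
    for cut, label in zip(cutoffs, LEVELS):
        if value < cut:
            return label
    return LEVELS[len(cutoffs)]


# Hypothetical skeleton rule base: (required level per dimension, role label).
SKELETON_RULES: List[Tuple[Dict[str, str], str]] = [
    ({"popularity": "low", "initiation": "high"}, "roleA"),
]


def infer_role(features: Dict[str, float], cutoffs: Dict[str, List[float]],
               rules=SKELETON_RULES, default: str = "unassigned") -> str:
    """Return the first role whose level pattern matches the user's binned features."""
    levels = {dim: to_level(v, cutoffs[dim]) for dim, v in features.items()}
    for pattern, role in rules:
        if all(levels.get(dim) == level for dim, level in pattern.items()):
            return role
    return default


def composition(all_features: List[Dict[str, float]], cutoffs,
                rules=SKELETON_RULES) -> Dict[str, float]:
    """Role composition: proportion of users assigned to each role at one time step."""
    roles = [infer_role(f, cutoffs, rules) for f in all_features]
    return {role: roles.count(role) / len(roles) for role in set(roles)}


# Cutoffs as produced by step 2 (values chosen to echo the slide's 0.5 and 0.4).
cutoffs = {"popularity": [0.5, 0.8], "initiation": [0.2, 0.4]}
users = [{"popularity": 0.1, "initiation": 0.9},
         {"popularity": 0.9, "initiation": 0.9}]
print(composition(users, cutoffs))
```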
    9. Measuring Role Compositions III: Mining Roles (Skeleton rule base compilation)
       1. Select the tuning segment
       2. Discover correlated behaviour dimensions
           • Removed Engagement and Contribution, kept Popularity (Pearson r > 0.75, p < 0.01)
       3. Cluster users into behavioural groups
       4. Derive role labels for clusters
           • Feature distributions are matched against the feature levels derived from equal-frequency binning
       Role labels:
           • 1 - Focussed Novice
           • 2,5 - Mixed Novice
           • 7 - Distributed Novice
           • 3 - Distributed Expert
           • 8,9 - Mixed Expert
           • 0 - Focussed Expert Participant
           • 4 - Focussed Expert Initiator
           • 6 - Knowledgeable Member
           • 10 - Knowledgeable Sink
       [Fig. 2: Boxplots of the feature distributions in each of the 11 clusters (panels: Dispersion, Initiation, Quality, Popularity; x-axis: Cluster)]
       TABLE II. Mapping of cluster dimensions to levels. The clusters are ordered from low patterns to high patterns to aid legibility.
       Cluster | Dispersion | Initiation | Quality | Popularity
       1       | L          | L          | L       | L
       0       | L          | M          | H       | L
       6       | L          | H          | M       | M
       10      | L          | H          | M       | H
       4       | L          | H          | H       | M
       2,5     | M          | H          | L       | H
       8,9     | M          | H          | H       | H
       7       | H          | H          | L       | H
       3       | H          | H          | H       | H
       … as a parameter k. To judge the best model (i.e. clustering method and number of clusters) we measure the cohesion and separation of a given clustering as follows. For each clustering algorithm (Ψ) we iteratively increase the number of clusters to use, where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ; this is defined for a given element (i) in a given cluster as:
           $$ s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \tag{3} $$
       where a_i denotes the average distance to all other items in the same cluster, and b_i is given by calculating the average distance to all items in each other distinct cluster and taking the minimum. The value of s_i ranges between −1 and 1, where the former indicates a poor clustering in which distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient s(Ψ(k)) for the entire clustering we take the average silhouette coefficient of all items. We found that the best clustering model and number of clusters to use is K-means with 11 clusters. For smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance; however, as we increase the number of clusters, K-means improves while the two remaining algorithms produce worse cohesion and separation.
       Deriving Role Labels: With the most cohesive and separated clustering of users, we then derive role labels for each cluster. Role label derivation first involves inspecting each cluster and the dimension distributions in it, and aligning each distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into the discrete values which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the …
       … decision node, we measure the entropy of the dimensions and their levels across the clusters, and then choose the dimension with the largest entropy. This is defined formally as:
           $$ H(dim) = -\sum_{level \in levels} p(level \mid dim) \log p(level \mid dim) \tag{4} $$
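Equations (3) and (4) can be checked with a small pure-Python sketch. Euclidean distance, the base-2 logarithm, and scoring singleton clusters as 0 are assumptions, since the excerpt does not fix them. The level lists are transcribed from Table II with clusters in the order 1, 0, 6, 10, 4, 2, 5, 8, 9, 7, 3 (the shared rows 2,5 and 8,9 appear twice).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all items (needs >= 2 clusters)."""
    clusters: Dict[int, List[Sequence[float]]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a_i: mean distance to the other members of its own cluster (self-distance is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def dimension_entropy(levels: Sequence[str]) -> float:
    """H(dim) = -sum over levels of p(level|dim) log2 p(level|dim), across clusters."""
    counts = Counter(levels)
    total = len(levels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Per-cluster levels from Table II, clusters ordered 1, 0, 6, 10, 4, 2, 5, 8, 9, 7, 3.
TABLE_II = {
    "Dispersion": "L L L L L M M M M H H".split(),
    "Initiation": "L M H H H H H H H H H".split(),
    "Quality":    "L H M M H L L H H L H".split(),
    "Popularity": "L L M H M H H H H H H".split(),
}

# Two compact, well-separated clusters give a silhouette close to 1.
print(silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]))
for dim, levels in TABLE_II.items():
    print(dim, round(dimension_entropy(levels), 3))
```

In Table II, Dispersion and Quality have the same level counts (5/4/2 and 4/2/5), so they tie on entropy; the excerpt does not say how such ties are broken.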
    10. Experiment 1: Health Indicator Regression
       • Managing online communities is helped by understanding the relation between behaviour and health
       • Experimental Setup
           • Induced linear regression models for each health indicator and community
           • Using a time-series dataset
           • Independent variables: 9 roles, with their composition proportions as values at a given time point
               • e.g. at t = k: Mixed Expert = 0.05, Distributed Novice = 0.51, etc.
           • Dependent variable: health indicator (e.g. churn rate) at the same time point
               • e.g. at t = k: Churn Rate = 0.21
           • PCA over each community's health indicator model, using the model’s coefficients
               • Look for a common health composition pattern
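A minimal sketch of this setup with NumPy, on synthetic data rather than the SCN dataset: one least-squares model per community regresses an indicator on the 9 role proportions, and PCA (via SVD) then projects the fitted coefficient vectors onto two components. Using plain OLS without an intercept is an assumption; the slides only state that linear regression models were induced.

```python
import numpy as np


def fit_indicator_model(compositions: np.ndarray, indicator: np.ndarray) -> np.ndarray:
    """Least-squares coefficients of one health indicator on the role proportions.

    compositions: (T, R) array, row t = proportion of users in each of R roles at step t.
    indicator:    (T,) array, health indicator at the same steps.
    No intercept column is added: each row of proportions sums to 1, so an
    intercept would be collinear with the role columns.
    """
    coef, *_ = np.linalg.lstsq(compositions, indicator, rcond=None)
    return coef


def pca_2d(coef_matrix: np.ndarray) -> np.ndarray:
    """Project each community's coefficient vector (one row each) onto the first two PCs."""
    centred = coef_matrix - coef_matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T


# Synthetic example: 3 communities, 9 roles, 40 time steps, noise-free indicators.
rng = np.random.default_rng(0)
true_coefs = rng.normal(size=(3, 9))
models = []
for c in range(3):
    X = rng.dirichlet(np.ones(9), size=40)  # each row sums to 1, like a role composition
    y = X @ true_coefs[c]
    models.append(fit_indicator_model(X, y))
coords = pca_2d(np.vstack(models))
print(coords.shape)
```

With noise-free synthetic indicators the fitted coefficients recover the generating ones, which is a quick sanity check before applying the same code to real compositions.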
    11. Experiment 1: Health Indicator Regression: Results
       [Figure: PCA of each community's regression coefficients, one panel per health indicator (Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient); axes PC1 and PC2; points labelled with forum IDs]
       • Common Health Composition Pattern
           • Churn Rate: differences for Focussed Expert Participants and Mixed Experts; similarities for Focussed Expert Initiators (a decrease in the role is correlated with an increase in churn rate)
           • User Count: differences for Focussed Expert Initiators; commonalities for knowledgeable roles
           • Seeds-to-Non-Seeds: similar effects for Focussed Expert Initiators and Participants, and Distributed Experts (in all cases a decrease in the role is correlated with an increased proportion)
           • Clustering Coefficient: no common patterns
       • Idiosyncratic Health Composition Pattern
           • Divergent patterns between outlier communities
       • No general pattern exists that describes the relation between roles and health
    12. Experiment 2: Health Change Detection
       • Can we accurately and effectively detect positive and negative changes in community health from its composition of behavioural roles?
       • Experimental Setup
           • Binary classification of indicator change
               • At t = k+1: predict an increase or decrease in the health indicator from t = k
           • Time-ordered dataset:
               • Features at t = k+1: 9 roles, with their composition proportions as values
               • Class at t = k+1: positive (if the indicator increased from t = k), negative (if it decreased)
           • Dataset divided into an 80/20 split, maintaining time ordering
           • Tested using a logistic regression classifier:
               • Platform-level model
               • Community-specific model
           • Evaluated using the Matthews Correlation Coefficient (MCC) and the Area under the ROC Curve (AUC)
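The protocol can be sketched end to end in plain Python. The gradient-descent logistic regression, its learning rate and epoch count, the 0.5 decision threshold, and treating an unchanged indicator as a decrease are all assumptions; the slides do not describe the classifier's implementation. AUC uses the rank (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
import math
from typing import List, Sequence, Tuple


def change_labels(indicator: Sequence[float]) -> List[int]:
    """Class at t = k+1: 1 if the indicator rose from t = k, else 0 (ties count as 0 here)."""
    return [1 if curr > prev else 0 for prev, curr in zip(indicator, indicator[1:])]


def time_ordered_split(rows: Sequence, train_frac: float = 0.8) -> Tuple[list, list]:
    """Earliest 80% of time steps for training, latest 20% for testing (no shuffling)."""
    cut = int(len(rows) * train_frac)
    return list(rows[:cut]), list(rows[cut:])


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Unregularised logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient from the binary confusion matrix (0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(compositions, indicator):
    """Pair role composition at t = k+1 with the change from t = k, split, fit, score."""
    rows = list(zip(compositions[1:], change_labels(indicator)))
    train, test = time_ordered_split(rows)
    model = train_logistic([x for x, _ in train], [y for _, y in train])
    probs = predict_proba(model, [x for x, _ in test])
    y_test = [y for _, y in test]
    return mcc(y_test, [1 if p >= 0.5 else 0 for p in probs]), auc(y_test, probs)
```

Here `compositions[t]` is the role-proportion vector at step t and `indicator[t]` the health value at the same step; a platform-level model would pool rows from all forums, a community-specific one would call `evaluate` per forum.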
    13. Experiment 2: Health Change Detection: Results
       • Per-forum models outperform platform models
           • Demonstrates the need to assess and understand communities individually
       • We also achieve good performance for outlier communities
       • ROC curves surpass the baseline for:
           • Churn Rate: 20/25 forums
           • User Count: 20/25 forums
           • Seeds-to-Non-Seeds: 19/25 forums
           • Clustering Coefficient: 17/25 forums
       TABLE IV. Performance of detecting health changes using a logistic regression model induced: across the entire platform (Figure IV(a)), per-forum (Figure IV(b)) and for specific central and outlier forums (Figure IV(c)). In this latter case we report the Matthews Correlation Coefficient and the F1 score.
       (a) Platform models for each health indicator
       Class                  | MCC     | Prec  | Recall | F1    | AUC
       Churn                  | 0.047   | 0.573 | 0.630  | 0.531 | 0.590
       User Count             | 0.035   | 0.591 | 0.646  | 0.522 | 0.598
       Seeds / Non-seeds      | 0.078   | 0.592 | 0.640  | 0.566 | 0.617
       Clustering Coefficient | 0.077.  | 0.591 | 0.641  | 0.581 | 0.647
       (b) Per-forum
       Class                  | MCC     | Prec  | Recall | F1    | AUC
       Churn                  | 0.110** | 0.618 | 0.634  | 0.619 | 0.569
       User Count             | 0.175** | 0.652 | 0.661  | 0.650 | 0.589
       Seeds / Non-seeds      | 0.163*  | 0.637 | 0.657  | 0.639 | 0.589
       Clustering Coefficient | 0.089** | 0.624 | 0.642  | 0.626 | 0.568
       Signif. codes: p-value < 0.001 ***, < 0.01 **, < 0.05 *, < 0.1 .
       (c) Forum-specific results (MCC / F1)
       Class                  | 252 (central) | 412 (central)  | 414 (central)  | 353 (outlier)  | 419 (outlier) | 50 (outlier)
       Churn                  | 0.105 / 0.564 | 0.042 / 0.621  | 0.284 / 0.700  | -0.076 / 0.543 | 0.173 / 0.633 | 0.092 / 0.58
       User Count             | 0.088 / 0.543 | 0.580 / 0.903  | -0.106 / 0.701 | 0.279 / 0.648  | 0.299 / 0.667 | 0.343 / 0.69
       Seeds / Non-seeds      | 0.117 / 0.575 | 0.339 / 0.717  | 0.189 / 0.744  | 0.007 / 0.519  | 0.265 / 0.632 | 0.400 / 0.81
       Clustering Coefficient | 0.057 / 0.536 | -0.043 / 0.568 | 0.353 / 0.727  | 0.156 / 0.582  | 0.127 / 0.568 | 0.282 / 0.64
       [Figure: ROC curves (TPR against FPR) for Churn Rate, User Count, Seeds / Non-seeds Prop and Clustering Coefficient]
       … find that for the 412 and 414 central forums we achieve poorer performance than the baseline for the User Count and Clustering Coefficient.
       1) Results: Health Danger Detection: Thus far we have assessed how well our detection models work in both class settings (i.e. increase and decrease). We now move to a scenario in which we wish to detect health dangers, and in doing so provide warnings to community managers of the likely reduction in health of their communities. To do this …
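To make the ROC comparison concrete, the sketch below traces a ROC curve by sweeping the decision threshold and integrates it with the trapezoid rule. Treating the "baseline" as the random-guess diagonal (area 0.5) is an assumption for illustration only; the excerpt does not define the baseline used in the paper, and the forum names and scores below are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the decision threshold from the highest score down."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together, producing a single (possibly diagonal) step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def beats_chance(y_true: Sequence[int], scores: Sequence[float]) -> bool:
    """True if the ROC area exceeds 0.5, the area under the random-guess diagonal."""
    return trapezoid_auc(roc_points(y_true, scores)) > 0.5


# Hypothetical per-forum test labels and scores: count forums above the diagonal.
forums = {
    "forum_a": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "forum_b": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "forum_c": ([1, 0, 1, 0], [0.1, 0.9, 0.2, 0.8]),
}
wins = sum(beats_chance(y, s) for y, s in forums.values())
print(f"{wins}/{len(forums)} forums above the chance diagonal")
```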
    14. Findings and Conclusions
       • No global composition pattern for the entirety of SCN
           • Identified key differences as to ‘What makes Communities tick’
           • A decrease in Focussed Experts is correlated with an increase in the Seeds-to-Non-Seeds proportion
           • (Marin et al., 2009) found a correlation between an increase in Core Users and Network Cohesion
           • We found a correlation between an increase in Knowledgeable Sinks and Social Capital
       • Accurate detection of community health change is possible using role composition information
           • Significantly outperformed baseline models
           • Per-forum models outperformed platform-level models
       • Future Work:
           • Explore co-dependencies between health indicators
           • Apply our approach to different communities and platforms
               • e.g. IBM Connections, Boards.ie
    15. Questions?
       Web: http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem
       Email: m.rowe@lancaster.ac.uk
       Twitter: @mattroweshow
