What makes communities tick? Community health analysis using role compositions

WHAT MAKES COMMUNITIES TICK?
COMMUNITY HEALTH ANALYSIS USING ROLE COMPOSITIONS

MATTHEW ROWE (1) AND HARITH ALANI (2)
(1) SCHOOL OF COMPUTING AND COMMUNICATIONS, LANCASTER UNIVERSITY, LANCASTER, UK
(2) KNOWLEDGE MEDIA INSTITUTE, THE OPEN UNIVERSITY, MILTON KEYNES, UK

2012 ASE/IEEE INTERNATIONAL CONFERENCE ON SOCIAL COMPUTING
AMSTERDAM, THE NETHERLANDS

http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem
m.rowe@lancaster.ac.uk

Managing Online Communities

  Many businesses provide online communities to:
  Increase customer loyalty
  Raise brand awareness
  Spread word-of-mouth
  Facilitate idea generation
  Online communities incur significant investment in terms of:
  Money spent on hosting and bandwidth
  Time and effort for maintenance
  Community managers monitor community ‘health’ to:
  Ensure longevity
  Enable value generation

  However, the notion of ‘health’ is hard to pin down


The Need for Interpretation

  Online communities are dynamic behavioural ecosystems
  Users in communities can be defined by their roles
  i.e. Exhibiting similar collective behaviour
  Prevalent behaviour can impact upon community members and health
  Management of communities is helped by:
  Understanding the relation between behaviour and health
  How user behaviour changes are associated with health
  Encouraging users to modify behaviour, in turn affecting health
  e.g. content recommendation to specific users
  Predicting health changes
  Enables early decision making on community policy

  Can we accurately and effectively detect positive and negative changes in
community health from its composition of behavioural roles?

Outline

  SAP Community Network
  Community Health Indicators
  Measuring Role Compositions:
  Measuring user behaviour
  Inferring behaviour roles
  Mining behaviour roles
  Experiments:
  Health Indicator Regression
  Health Change Detection
  Findings and Conclusions


SAP Community Network

  Collection of SAP forums in which users discuss:
  Software development
  SAP Products
  Usage of SAP tools
  Points system for awarding best answers
  Enables development of user reputation

  Provided with a dataset covering 33 communities:
  Spanning 2004 - 2011
  95,200 threads
  421,098 messages
  78,690 were allocated points
  32,942 users

[Figure: Post Count (y-axis, 0 to 1400) by year (x-axis, 2004 to 2011)]


Community Health Indicators

  From the literature there is no single agreed measure of ‘community health’
  Multi-faceted nature: loyalty, participation, activity, social capital
  Different communities and platforms look at different indicators

  Indicator 1: Churn Rate (loyalty)
  The proportion of users who participate in a community for the final time
  Indicator 2: User Count (participation)
  The number of participating users in the community
  Indicator 3: Seeds-to-Non-Seeds Posts Proportion (activity)
  The proportion of seed posts (i.e. thread starters that receive a reply) to non-seeds (i.e. thread
starters that receive no reply)
  Indicator 4: Clustering Coefficient (social capital)
  The average of users’ clustering coefficients within the largest strongly connected
component
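
The first three indicators can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Post` record, its field names and the integer time-window index are all assumptions made for the example, and the clustering coefficient is left out because it needs the reply graph.

```python
from typing import NamedTuple

class Post(NamedTuple):        # hypothetical record layout
    user: str
    window: int                # index of the time window the post falls in
    is_starter: bool           # True if the post opens a thread
    got_reply: bool            # only meaningful for thread starters

def user_count(posts, window):
    """Indicator 2: number of distinct users posting in the window."""
    return len({p.user for p in posts if p.window == window})

def churn_rate(posts, window):
    """Indicator 1: share of the window's users who never post again."""
    last_seen = {}
    for p in posts:
        last_seen[p.user] = max(last_seen.get(p.user, p.window), p.window)
    active = {p.user for p in posts if p.window == window}
    if not active:
        return 0.0
    return sum(1 for u in active if last_seen[u] == window) / len(active)

def seeds_to_non_seeds(posts, window):
    """Indicator 3: replied-to thread starters over unanswered ones."""
    starters = [p for p in posts if p.window == window and p.is_starter]
    seeds = sum(1 for p in starters if p.got_reply)
    non_seeds = len(starters) - seeds
    return seeds / non_seeds if non_seeds else float("inf")
```

Note that churn computed this way is trivially 1.0 in the last window of any dataset, so the final windows would need to be excluded or handled separately.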


Measuring Role Compositions I:
Modelling and Measuring User Behaviour

  According to existing literature, user behaviour can be defined using 6
dimensions:
  (Hautz et al., 2010), (Nolker and Zhou, 2005), (Zhu et al., 2009), (Zhu et al.,
2011)
  Focus Dispersion
  Measure: Forum entropy of the user
  Engagement
  Measure: Out-degree proportioned by potential maximal out-degree
  Popularity
  Measure: In-degree proportioned by potential maximal in-degree
  Contribution
  Measure: Proportion of thread replies created by the user
  Initiation
  Measure: Proportion of threads that were initiated by the user
  Content Quality
  Measure: Average points per post awarded to the user
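
As an illustration of how three of these dimensions could be measured, here is a small sketch. The function names, the input shapes and the choice of log base 2 for the entropy are assumptions for the example; the slides do not specify them.

```python
import math
from collections import Counter

def focus_dispersion(forum_ids):
    """Forum entropy: how evenly a user's posts spread over forums.

    forum_ids holds one forum identifier per post made by the user.
    """
    counts = Counter(forum_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def initiation(threads_started, threads_total):
    """Proportion of the community's threads initiated by the user."""
    return threads_started / threads_total if threads_total else 0.0

def content_quality(points_per_post):
    """Average points awarded per post made by the user."""
    return sum(points_per_post) / len(points_per_post) if points_per_post else 0.0
```

A user who posts in a single forum gets an entropy of zero (fully focussed), while a user who splits posts evenly over many forums gets the highest value.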


Measuring Role Compositions II:
Inferring Roles

  1. Construct features for community users at a given time step
  2. Derive bins using equal frequency binning
  Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4
  3. Use skeleton rule base to construct rules using bin levels
  Popularity = low, Initiation = high -> roleA
  Popularity < 0.5, Initiation > 0.4 -> roleA
  4. Apply rules to infer user roles and community composition
  5. Repeat 1-4 for following time steps
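
Steps 1 to 4 can be sketched as follows. This is a rough illustration under stated assumptions: three bins (low, mid, high), a cutoff rule based on sorted ranks, and a two-rule skeleton rule base whose level patterns are taken from the cluster-to-level mapping on the Mining Roles slide. The function names and the "unassigned" fallback are hypothetical.

```python
import bisect
from collections import Counter

LEVELS = ("low", "mid", "high")

def equal_frequency_cutoffs(values, n_bins=3):
    """Cutoffs that put roughly the same number of users in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def to_level(value, cutoffs):
    """Map a feature value to low/mid/high using the bin cutoffs."""
    return LEVELS[bisect.bisect_right(cutoffs, value)]

# Skeleton rule base: role -> required level per dimension (two rules only).
RULES = {
    "Focussed Novice": {"dispersion": "low", "initiation": "low",
                        "quality": "low", "popularity": "low"},
    "Distributed Expert": {"dispersion": "high", "initiation": "high",
                           "quality": "high", "popularity": "high"},
}

def infer_roles(features, rules=RULES):
    """features: user -> {dimension: value} for one time step."""
    dims = sorted({d for f in features.values() for d in f})
    cutoffs = {d: equal_frequency_cutoffs([f[d] for f in features.values()])
               for d in dims}
    roles = {}
    for user, f in features.items():
        levels = {d: to_level(f[d], cutoffs[d]) for d in dims}
        roles[user] = next(
            (role for role, cond in rules.items()
             if all(levels[d] == lvl for d, lvl in cond.items())),
            "unassigned",
        )
    return roles

def composition(roles):
    """Proportion of users holding each role (the community composition)."""
    counts = Counter(roles.values())
    return {role: c / len(roles) for role, c in counts.items()}
```

Because the cutoffs are recomputed from the users present at each time step, the same raw feature value can map to different levels in different steps, which is what step 5 relies on.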


Measuring Role Compositions III:
Mining Roles (Skeleton rule base compilation)

  1. Select the tuning segment
  2. Discover correlated behaviour dimensions
  Removed Engagement and Contribution, and kept Popularity (Pearson r > 0.75, p < 0.01)
  3. Cluster users into behavioural groups
  4. Derive role labels for clusters
  Feature distributions are matched against the feature levels derived from equal-frequency binning

  Role labels by cluster:
  1 - Focussed Novice
  2,5 - Mixed Novice
  7 - Distributed Novice
  3 - Distributed Expert
  8,9 - Mixed Expert
  0 - Focussed Expert Participant
  4 - Focussed Expert Initiator
  6 - Knowledgeable Member
  10 - Knowledgeable Sink

TABLE II
MAPPING OF CLUSTER DIMENSIONS TO LEVELS. THE CLUSTERS ARE ORDERED FROM LOW PATTERNS TO HIGH PATTERNS TO AID LEGIBILITY.

Cluster   Dispersion   Initiation   Quality   Popularity
1         L            L            L         L
0         L            M            H         L
6         L            H            M         M
10        L            H            M         H
4         L            H            H         M
2,5       M            H            L         H
8,9       M            H            H         H
7         H            H            L         H
3         H            H            H         H

[Fig. 2. Boxplots of the feature distributions in each of the 11 clusters. Panels: Dispersion, Initiation, Quality and Popularity, each plotted by cluster.]

Excerpt from the accompanying paper:

[...] as a parameter k. To judge the best model - i.e. cluster method and number of clusters - we measure the cohesion and separation of a given clustering as follows: For each clustering algorithm (Ψ) we iteratively increase the number of clusters to use where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ, this is defined for a given element (i) in a given cluster as:

s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \quad (3)

Where a_i denotes the average distance to all other items in the same cluster and b_i is given by calculating the average distance with all other items in each other distinct cluster and then taking the minimum distance. The value of s_i ranges between -1 and 1 where the former indicates a poor clustering where distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient (s(Ψ(k))) for the entire clustering we take the average silhouette coefficient of all items. We find that the best clustering model and number of clusters to use is K-means with 11 clusters. We found that for smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance, however as we begin to increase the cluster numbers K-means improves while the two remaining algorithms produce worse cohesion and separation.

Deriving Role Labels: Provided with the most cohesive and separated clustering of users we then derive role labels for each cluster. Role label derivation first involves inspecting the dimension distribution in each cluster and aligning the distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into discrete values which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the [...]

[...] decision node, we measure the entropy of the dimensions and their levels across the clusters, we then choose the dimension with the largest entropy. This is defined formally as:

H(dim) = -\sum_{level}^{|levels|} p(level \mid dim) \log p(level \mid dim) \quad (4)
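
The silhouette coefficient used to choose the clustering method and the number of clusters can be sketched in plain Python. This is an illustrative version only: Euclidean distance, a score of zero for singleton clusters, and the function name are assumptions, and the clustering itself (for example K-means) is taken as given.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i).

    points: list of equal-length numeric tuples; labels: one cluster id per
    point. Needs at least two clusters.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a_i: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Model selection then amounts to running each clustering algorithm for every candidate k, scoring the resulting labels with this function, and keeping the pair with the highest average.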

Experiment 1: Health Indicator Regression

  Managing online communities is helped by understanding the
relation between behaviour and health

  Experimental Setup
  Induced Linear Regression Models for each Health Indicator and
Community
  Using a time-series dataset
  Independent variables: 9 roles with composition proportions as values at a given time
point
  E.g. @ t = k: Mixed Expert = 0.05, Distributed Novice = 0.51, etc.
  Dependent variable: health indicator (e.g. churn rate) at the same time point
  E.g. @ t = k: Churn Rate = 0.21

  PCA of each community health indicator model using the model’s coefficients
  Look for a common health composition pattern


Experiment 1: Health Indicator Regression
Results
[Figure: PCA plots (PC1 vs. PC2) of the per-community model coefficients, one panel per health indicator: Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient. Points are labelled with forum IDs.]

  Common Health Composition Pattern
  Churn Rate: Differences for Focussed Expert Participant & Mixed Expert, similarities for
Focussed Expert Initiators (decrease in role correlated with increase in churn rate)
  User Count: Differences for Focussed Expert Initiators, commonalities for knowledgeable roles
  Seeds-to-Non-Seeds: Similar effects for Focussed Expert Initiators and Participants, and
Distributed Experts (all decrease in role correlated with increased proportion)
  Clustering Coefficient: no common patterns
  Idiosyncratic Health Composition Pattern
  Divergence patterns between outlier communities
  No general pattern exists that describes the relation between roles and health

Experiment 2: Health Change Detection

  Can we accurately and effectively detect positive and negative changes in
community health from its composition of behavioural roles?

  Experimental Setup
  Binary classification of indicator change
  At t=k+1: predict increase or decrease in health indicator from t=k
  Time-ordered dataset:
  Features @ t=k+1: 9 roles with composition proportions as values
  Class @ t=k+1: positive (if increase from t=k), negative (if decrease)
  Divide dataset into 80/20 split maintaining time-ordering
  Tested using a logistic regression classifier
  Platform-level model
  Community-specific model
  Evaluated using Matthews Correlation Coefficient (MCC) and Area under the ROC
Curve (AUC)
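
The data preparation and the MCC evaluation in this setup can be sketched as follows. The classifier itself is left out, the helper names are hypothetical, and treating "no change" as the negative class is an assumption since the slides only mention increase and decrease.

```python
import math

def change_labels(indicator):
    """Class at t=k+1: 1 if the indicator rose from t=k, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(indicator, indicator[1:])]

def time_ordered_split(rows, train_fraction=0.8):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it a stricter summary than accuracy when increases and decreases are unevenly represented.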


Experiment 2: Health Change Detection
Results

  Per-forum models outperform platform models for each health indicator
  Demonstrates the need to assess and understand communities individually
  We also yield good performance for outlier communities
  ROC Curves surpass baseline for:
  Churn rate: 20/25 forums
  User Count: 20/25 forums
  Seeds-to-Non-Seeds: 19/25 forums
  Clustering Coefficient: 17/25 forums

[Figure: ROC curves (TPR vs. FPR), one panel per health indicator: Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient]

TABLE IV
PERFORMANCE OF DETECTING HEALTH CHANGES USING A LOGISTIC REGRESSION MODEL INDUCED: ACROSS THE ENTIRE PLATFORM (FIGURE IV(A)), PER-FORUM (FIGURE IV(B)) AND FOR SPECIFIC CENTRAL AND OUTLIER FORUMS (FIGURE IV(C)). IN THIS LATTER CASE WE REPORT THE MATTHEWS CORRELATION COEFFICIENT AND THE F1 SCORE.

(a) Platform
Class                    MCC       Prec    Recall   F1      AUC
Churn                    0.047     0.573   0.630    0.531   0.590
User Count               0.035     0.591   0.646    0.522   0.598
Seeds / Non-seeds        0.078     0.592   0.640    0.566   0.617
Clustering Coefficient   0.077.    0.591   0.641    0.581   0.647
Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 . 1

(b) Per-forum
Class                    MCC       Prec    Recall   F1      AUC
Churn                    0.110**   0.618   0.634    0.619   0.569
User Count               0.175**   0.652   0.661    0.650   0.589
Seeds / Non-seeds        0.163*    0.637   0.657    0.639   0.589
Clustering Coefficient   0.089**   0.624   0.642    0.626   0.568
Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 . 1

(c) Forum Specific Results. MCC / F1
                         Central                                            Outliers
Class                    252              412              414              353              419              50
Churn                    0.105 / 0.564    0.042 / 0.621    0.284 / 0.700    -0.076 / 0.543   0.173 / 0.633    0.092 / 0.58
User Count               0.088 / 0.543    0.580 / 0.903    -0.106 / 0.701   0.279 / 0.648    0.299 / 0.667    0.343 / 0.69
Seeds / Non-seeds        0.117 / 0.575    0.339 / 0.717    0.189 / 0.744    0.007 / 0.519    0.265 / 0.632    0.400 / 0.81
Clustering Coefficient   0.057 / 0.536    -0.043 / 0.568   0.353 / 0.727    0.156 / 0.582    0.127 / 0.568    0.282 / 0.64

Excerpt from the accompanying paper:

[...] find that for the 412 and 414 central forums we achieve poorer performance than the baseline for the User Count and Clustering Coefficient.

1) Results: Health Danger Detection: Thus far we have assessed how well our detection models work in both class settings (i.e. increase and decrease). We now move to a scenario in which we wish to detect health dangers, and in doing so provide warnings to community managers of the likely reduction in health of their communities. To do this [...]

Findings and Conclusions

  No global composition pattern for the entirety of SCN
  Identified key differences as to ‘What makes Communities tick’
  Decrease in Focussed Experts correlated with an increase in Seeds-to-Non-Seeds
  (Marin et al., 2009) found a correlation between increase in Core Users and
Network Cohesion
  We found a correlation between an increase in Knowledgeable Sinks and Social Capital
  Accurate detection of community health change is possible using role composition
information
  Significantly outperformed baseline models
  Per-forum models outperformed platform-level models
  Future Work:
  Explore co-dependencies between health indicators
  Application of our approach over different communities and platforms
  E.g. IBM Connections, Boards.ie



Questions?
Web: http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem
Email: m.rowe@lancaster.ac.uk
Twitter: @mattroweshow

