From User Needs to Community Health: Mining User Behaviour to Analyse Online Communities
Invited keynote talk at the 1st Workshop of Quality, Motivation and Coordination of Open Collaboration @ the International Conference on Social Informatics 2013

Published in: Technology, Education

Transcript

  • 1. FROM USER NEEDS TO COMMUNITY HEALTH: MINING USER BEHAVIOUR TO ANALYSE ONLINE COMMUNITIES DR. MATTHEW ROWE SCHOOL OF COMPUTING AND COMMUNICATIONS @MROWEBOT | M.ROWE@LANCASTER.AC.UK Invited Talk @ 1st Workshop on Quality, Motivation and Coordination, International Conference on Social Informatics 2013. Kyoto, Japan
  • 2. About Me
      ◦ 2002-2006: M.Eng Software Engineering
      ◦ 2006-2010: Ph.D. Computer Science
      ◦ 2010-2012: Postdoc Research Associate
      ◦ 2012-now: Lecturer in Social Computing
      ◦ [Timeline: Undergrad, Postgrad, Postdoc, Lecturing]
  • 3. Research Interests
      ◦ Semantics Social networks Digital Identity Data Forecasting + Classification Data Mining Disambiguation Automating Processes Modelling Social Systems Artificial Intelligence Machines Prediction
      ◦ http://scholar.google.com/citations?user=rhyR4_kAAAAJ
  • 4. Collaborators
      ◦ Harith Alani. Senior Lecturer, Knowledge Media Institute, The Open University, UK. http://people.kmi.open.ac.uk/harith/
      ◦ Miriam Fernandez. Research Associate, Knowledge Media Institute, The Open University, UK. http://kmi.open.ac.uk/people/member/miriamfernandez
      ◦ Conor Hayes. Senior Research Fellow, Digital Enterprise Research Institute, Galway, Ireland. http://www.deri.ie/users/conor-hayes
      ◦ Marcel Karnstedt. Senior Postdoctoral Researcher, Digital Enterprise Research Institute, Galway, Ireland. http://www.marcel.karnstedt.com/
  • 5. Outline
      ◦ Part I: Online Communities and User Behaviour
          ▪ Define: online communities, user behaviour
          ▪ The potential for examining user behaviour
      ◦ Part II: Comparing User Behaviour and User Needs
          ▪ Collecting users’ needs in online communities
          ▪ Linking needs to behaviour
      ◦ Part III: Predicting Community Health from User Behaviour
          ▪ Mining roles from user behaviour
          ▪ Community health forecasting from collective behaviour
  • 6. Part I: Online Communities and User Behaviour
  • 7. Defining Online Communities
      ◦ a) Distinct user containers in which users discuss a given topic
          ▪ E.g. message board forums
          ▪ E.g. question-answering systems
      ◦ b) Latent grouping of users by some common attribute
          ▪ E.g. semantic web community
          ▪ E.g. social network clusters with high social homophily
      ◦ This talk focuses on: a) User containers
  • 8. BT (British telecommunications firm) uses online communities to enable consumers to provide support to other consumers. The BBC News web site provides comments sections to encourage user engagement with the news. Question-answering systems allow communities of ‘knowledgeable’ users to ask questions and provide answers.
  • 9. Why Provide Online Communities?
      ◦ Increase Customer Loyalty
      ◦ Understanding Product Issues
      ◦ Facilitating Idea Generation
      ◦ Raising Brand Awareness
      ◦ Spreading through Word of Mouth
  • 10. Managing Online Communities
      ◦ Online communities incur significant investments:
          ▪ Hosting
              - Cost and bandwidth: (time + money) grows linearly with popularity
          ▪ Community management:
              - Settling disputes
              - Encouraging engagement within the communities
      ◦ Common questions arise:
          ▪ ‘How do I know if my community is healthy?’
          ▪ ‘What changes in the community lead to it becoming unhealthy?’
  • 11. How do I know if my community is ‘healthy’?
      ◦ Approach 1: Needs Satisfaction
          ▪ Identify users’ needs for the community
          ▪ Analyse users to see if their needs have been met
      ◦ Approach 2: Numerical Health Measures
          ▪ Determine suitable measures for community health (e.g. churn rate)
          ▪ Analyse these measures over time to see if the community is remaining healthy, or not
  • 12. Analysing User Behaviour
      ◦ Online communities are behavioural ecosystems
          ▪ Prevalent user behaviour can impact the behaviour of other users (Preece, 2000)
      ◦ ‘the way’ ‘tangible measures derived from actions performed by and upon a user’
  • 13. Behaviour Features: ‘tangible measures of actions performed by and upon a user’
      ◦ Initiation: the extent to which users begin discussions in a community
      ◦ Contribution: the extent to which the user is providing content
      ◦ Popularity: proportion of the community that responds to the user
      ◦ Engagement: proportion of the community that a user responds to
      ◦ Focus Dispersion: variance of the user’s interests across topics
      ◦ Quality: reception of the user’s content by other users
      ◦ [Diagram: User, Post, Forum]
  • 14. Part II: Comparing User Behaviour and User Needs
  • 15. Maslow’s Hierarchy of Needs
      ◦ How does this hierarchy resonate with online community users?
  • 16. User Needs in Online Communities
      ◦ Users have different needs for participating in an online community:
          ▪ To create content and share information
          ▪ To communicate with other users
          ▪ To ask questions
          ▪ To collaborate with other users
          ▪ To help other users resolve problems and issues
          ▪ To discuss ideas
      ◦ We wanted to find out how important the above needs were to community users…
  • 17. Dataset 1
      ◦ Enterprise social software suite
          ▪ Communities within enterprises
      ◦ Anonymised dataset (Jan 2010 -> April 2011)
          ▪ #Communities of Practice (CoP): 100
          ▪ #Team Communities (Team): 72
          ▪ #Technical Support (Tech): 14
      ◦ Labels provided by (Muller et al. 2012)
  • 18. Understanding Users’ Needs on IBM Connections
      ◦ Surveyed 186 users about their needs
          ▪ Spanning the aforementioned community types
          ▪ 150 responses
      ◦ Likert scale (1-5) for agreement with statements
      ◦ Examples included:
          ▪ How often do you do the following?
              - Browse for information, Search for information, etc.
          ▪ Rate how important the community features are to you
              - Receiving recommendations, ability to filter information, etc.
  • 19. Users’ Needs on IBM Connections
      ◦ Ranked Community Features: [chart not included in the transcript]
      ◦ D3.1: Report on Social, Technical and Corporate Needs in Online Communities. M Rowe, H Alani, S Angeletou and G Burel. ROBUST Deliverable 3.1. (2012)
  • 20. Users’ Needs on IBM Connections
      ◦ D3.1: Report on Social, Technical and Corporate Needs in Online Communities. M Rowe, H Alani, S Angeletou and G Burel. ROBUST Deliverable 3.1. (2012)
  • 21. User Behaviour on IBM Connections
      ◦ Measured the behaviour of users across the three IBM Connections community types
      ◦ Table 2. Mean and standard deviation (in parentheses) of the distribution of micro features within the different community types:

            Feature            CoP               Team              Tech
            Focus Dispersion   1.682 (1.680)     1.391 (1.581)     1.382 (1.534)
            Initiation         7.788 (21.525)    13.235 (23.361)   3.088 (6.676)
            Contribution       26.084 (77.607)   21.130 (72.298)   11.753 (17.182)
            Popularity         1.660 (3.647)     2.302 (2.900)     2.286 (3.920)
            Engagement         1.016 (1.556)     1.948 (2.324)     1.036 (1.575)

      ◦ Paper excerpt: Popularity is higher in Team and Tech communities, but not significantly, than in CoP, suggesting that although users of the latter community provide more contributions, it is with content published by fewer users. For Engagement the mean is significantly highest (at a significance level below 0.001) for Team, indicating that users tend to participate with more users in these communities than the others.
      ◦ Paper excerpt: We induce an empirical cumulative distribution function (ECDF) for each micro feature within each community and then qualitatively analyse how the curves of the functions differ across communities. For instance, in the case of Figure 3 we see that for Focus Dispersion Tech communities have the high-…
      ◦ Behaviour analysis across different types of Enterprise Online Communities. M Rowe, M Fernandez, H Alani, I Ronen, C Hayes and M Karnstedt. In the proceedings of the Web Science Conference. Evanston, US. (2012)
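The ECDF step quoted on this slide can be illustrated with a minimal Python sketch. It is not the authors' code: the `ecdf` helper and the per-user Initiation values are hypothetical, chosen only to show how one curve per community type would be built and compared.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf needs at least one value")

    def F(x):
        # bisect_right counts how many of the sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical per-user Initiation values for two community types
initiation = {"team": [0, 2, 5, 13, 40], "tech": [0, 1, 1, 3, 6]}
curves = {ctype: ecdf(vals) for ctype, vals in initiation.items()}

for ctype, F in curves.items():
    # Evaluate each curve at the same points to compare the shapes
    print(ctype, [F(x) for x in (1, 5, 20)])
```

A curve that rises more slowly, as the hypothetical Team curve does here, indicates that more users take high values of the feature.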
  • 22. User Behaviour on IBM Connections
      ◦ [Figure: empirical CDF plots, CDF(x) against Focus Dispersion, Initiation, Contribution and Engagement, each with one curve for CoP, Team and Tech communities]
  • 23. Linking Users’ Needs to User Behaviour
      ◦ Questionnaire questions related to different behaviour aspects (initiation, contribution, etc.)
      ◦ Mapped questions to these aspects:
          ▪ E.g. Initiation questions included:
              - How often do you ask a question?
              - How often do you create content?
              - How often do you announce work events and news?
      ◦ Resulted in average Likert-scale response value per behaviour aspect across community types
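The mapping from questions to behaviour aspects, followed by averaging per community type, might look roughly like the sketch below. The question identifiers, the `reply_to_others` item and the sample responses are all hypothetical; only the three Initiation questions come from the slide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from questionnaire items to behaviour aspects
QUESTION_ASPECT = {
    "ask_question": "initiation",
    "create_content": "initiation",
    "announce_events": "initiation",
    "reply_to_others": "contribution",  # illustrative item, not from the slides
}


def aspect_means(responses):
    """responses: iterable of (community_type, question_id, likert_score)."""
    grouped = defaultdict(list)
    for ctype, question, score in responses:
        if not 1 <= score <= 5:
            raise ValueError(f"Likert score out of range: {score}")
        grouped[(ctype, QUESTION_ASPECT[question])].append(score)
    # One mean Likert value per (community type, behaviour aspect)
    return {key: mean(scores) for key, scores in grouped.items()}


sample = [
    ("CoP", "ask_question", 2),
    ("CoP", "create_content", 4),
    ("Team", "ask_question", 3),
    ("Team", "reply_to_others", 5),
]
print(aspect_means(sample))
```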
  • 24. Linking Users’ Needs to User Behaviour
      ◦ User Needs from Questionnaire Responses: Table 4. Mean and standard deviation (in parentheses) values of micro features obtained using the questionnaires for the different community types:

            Feature            CoP               Team              Tech
            Focus Dispersion   4.019 (0.093)     3.055 (0.426)     4.070 (0.070)
            Initiation         2.483 (0.838)     2.587 (0.838)     2.243 (0.873)
            Contribution       3.239 (0.926)     3.202 (1.016)     3.158 (0.945)
            Popularity         2.875 (0.070)     3.084 (0.168)     2.104 (0.173)
            Engagement         2.844 (0.539)     3.027 (0.588)     2.406 (0.522)

      ◦ Observed User Behaviour: Table 2. Mean and standard deviation (in parentheses) of the distribution of micro features within the different community types:

            Feature            CoP               Team              Tech
            Focus Dispersion   1.682 (1.680)     1.391 (1.581)     1.382 (1.534)
            Initiation         7.788 (21.525)    13.235 (23.361)   3.088 (6.676)
            Contribution       26.084 (77.607)   21.130 (72.298)   11.753 (17.182)
            Popularity         1.660 (3.647)     2.302 (2.900)     2.286 (3.920)
            Engagement         1.016 (1.556)     1.948 (2.324)     1.036 (1.575)

      ◦ Paper excerpt: […] type responses, e.g. taking the mean of the responses for all Initiation questions for the 95 CoPs. The set of results can be seen in Table 4.
      ◦ Paper excerpt: The mean of the third micro feature, Contribution, is highest for CoP (but not significantly higher than the others), indicating that more initiated content is interacted with than in the other communities. Popularity is higher in Team and Tech communities, but not significantly, than in CoP, suggesting that although users of the latter community provide more contributions, it is with content published by fewer users. For Engagement the mean is significantly highest (at a significance level below 0.001) for Team, indicating that users tend to participate with more users in these communities than the others.
      ◦ Paper excerpt: As Table 4 demonstrates, the findings from the analysis highly correlate with what users expressed to be relevant for each community type. We previously found that high levels of Initiation and Contribution are discriminative factors of Team and CoP communities with respect to Tech communities. Additionally, by looking at the behaviour distributions…
  • 25. Understanding Needs Satisfaction
      ◦ Agreement between users’ needs and how users behave
          ▪ Reflected by the different needs values across the different community types
      ◦ Limitations of this approach:
          ▪ 1. Expensive to collect survey responses
              - Took around 6 months between questionnaire publication and results compilation
              - Required contacting many users
          ▪ 2. Implicit biases in reporting across community types
              - Team communities had the lowest % of responses
  • 26. Part III: Predicting Community Health from User Behaviour
  • 27. Community Health and User Behaviour
      ◦ Management of communities is helped by:
          ▪ Understanding how behaviour and health are related
              - How user behaviour changes are associated with health
          ▪ Predicting health changes
              - Enables early decision making on community policy
      ◦ Can we accurately detect changes in community health from the behaviour of its users?
  • 28. Dataset 2: SAP Community Network
      ◦ Collection of SAP forums in which users discuss:
          ▪ Software development, SAP Products, Usage of SAP tools
      ◦ Points system for awarding best answers
      ◦ Provided with a dataset covering 33 communities:
          ▪ Spanning 2004 - 2011
          ▪ 95,200 threads, 421,098 messages, 32,942 users
      ◦ [Figure: post count over time, 2004 to 2011]
  • 29. User Behaviour Features on SAP
      ◦ Focus Dispersion
          ▪ Measure: Forum entropy of the user
      ◦ Engagement
          ▪ Measure: Out-degree proportioned by potential maximal out-degree
      ◦ Popularity
          ▪ Measure: In-degree proportioned by potential maximal in-degree
      ◦ Contribution
          ▪ Measure: Proportion of thread replies created by the user
      ◦ Initiation
          ▪ Measure: Proportion of threads that were initiated by the user
      ◦ Quality
          ▪ Measure: Average points per post awarded to the user
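As a rough illustration of how the six measures could be computed, here is a minimal Python sketch. The post record layout (`author`, `forum`, `reply_to`, `points`) and the choice of "number of other users" as the potential maximal degree are assumptions made for this example, not details given in the talk.

```python
from collections import Counter
from math import log2


def behaviour_features(user, posts):
    """posts: list of dicts with keys author, forum, reply_to, points.
    reply_to names the author being answered, or is None for a thread starter."""
    users = {p["author"] for p in posts}
    mine = [p for p in posts if p["author"] == user]
    if not mine:
        raise ValueError(f"{user} has no posts")
    others = max(len(users) - 1, 1)  # assumed potential maximal in/out-degree

    # Focus dispersion: entropy of the user's posts over forums
    forum_counts = Counter(p["forum"] for p in mine)
    dispersion = -sum((c / len(mine)) * log2(c / len(mine))
                      for c in forum_counts.values())

    # Distinct users this user replied to, and distinct users replying to them
    out_nbrs = {p["reply_to"] for p in mine if p["reply_to"] not in (None, user)}
    in_nbrs = {p["author"] for p in posts
               if p["reply_to"] == user and p["author"] != user}

    replies = [p for p in posts if p["reply_to"] is not None]
    starters = [p for p in posts if p["reply_to"] is None]
    return {
        "focus_dispersion": dispersion,
        "engagement": len(out_nbrs) / others,
        "popularity": len(in_nbrs) / others,
        "contribution": sum(p["author"] == user for p in replies) / max(len(replies), 1),
        "initiation": sum(p["author"] == user for p in starters) / max(len(starters), 1),
        "quality": sum(p["points"] for p in mine) / len(mine),
    }


# Hypothetical toy data: ann starts two threads, bob and cat answer her
posts = [
    {"author": "ann", "forum": "f1", "reply_to": None, "points": 0},
    {"author": "bob", "forum": "f1", "reply_to": "ann", "points": 10},
    {"author": "cat", "forum": "f1", "reply_to": "ann", "points": 2},
    {"author": "ann", "forum": "f2", "reply_to": None, "points": 0},
    {"author": "bob", "forum": "f2", "reply_to": "ann", "points": 6},
]
print(behaviour_features("bob", posts))
```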
  • 30. Inferring Roles from User Behaviour
      ◦ 1. Construct features for community users at a given time step
      ◦ 2. Derive bins using equal frequency binning
          ▪ Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4
      ◦ 3. Use skeleton rule base to construct rules using bin levels
          ▪ Popularity = low, Initiation = high -> roleA
          ▪ Popularity < 0.5, Initiation > 0.4 -> roleA
      ◦ 4. Apply rules to infer user roles and community composition
      ◦ 5. Repeat 1-4 for following time steps
      ◦ Community Analysis through Semantic Rules and Role Composition Derivation. M Rowe, M Fernandez, S Angeletou and H Alani. In the Journal of Web Semantics (2012)
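Steps 2 to 4 can be sketched in a few lines of Python. This is an illustrative reading of the slide rather than the published implementation: the three-level binning, the function names and the single-rule `SKELETON_RULES` list are assumptions, with `roleA` taken from the slide's example.

```python
def equal_frequency_cutoffs(values, bins=3):
    """Cutoffs that split the sorted values into `bins` groups of near-equal size."""
    ordered = sorted(values)
    n = len(ordered)
    if n < bins:
        raise ValueError("need at least one value per bin")
    return [ordered[(n * i) // bins] for i in range(1, bins)]


def level(value, cutoffs, labels=("low", "mid", "high")):
    """Map a continuous value to a bin label using ascending cutoffs."""
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[len(cutoffs)]


# Hypothetical skeleton rule base: required level per feature -> role label
SKELETON_RULES = [
    ({"popularity": "low", "initiation": "high"}, "roleA"),
]


def infer_role(features, cutoffs_by_feature, rules=SKELETON_RULES):
    """Return the first role whose conditions all match, or None."""
    levels = {name: level(value, cutoffs_by_feature[name])
              for name, value in features.items()}
    for conditions, role in rules:
        if all(levels.get(feature) == wanted for feature, wanted in conditions.items()):
            return role
    return None


# Cutoffs are re-derived from the users active in each time step
popularity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
initiation = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cutoffs = {
    "popularity": equal_frequency_cutoffs(popularity),
    "initiation": equal_frequency_cutoffs(initiation),
}
print(infer_role({"popularity": 0.2, "initiation": 0.8}, cutoffs))
```

Counting the inferred roles over all users in a time step and dividing by the number of users would give the community composition referred to in step 4.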
  • 31. Mining Roles (Skeleton rule base compilation)
      ◦ 1. Select the tuning segment
      ◦ 2. Discover correlated behaviour dimensions
          ▪ Removed Engagement and Contribution, kept Popularity (Pearson r > 0.75, p < 0.01)
      ◦ 3. Cluster users into behavioural groups
      ◦ 4. Derive role labels for clusters
      ◦ Paper excerpt: […] as a parameter k. To judge the best model, i.e. cluster method and number of clusters, we measure the cohesion and separation of a given clustering as follows. For each clustering algorithm (Ψ) we iteratively increase the number of clusters to use, where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ; this is defined for a given element (i) in a given cluster as:

            \[ s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \tag{3} \]

        where a_i denotes the average distance to all other items in the same cluster and b_i is given by calculating the average distance with all other items in each other distinct cluster and then taking the minimum distance. The value of s_i ranges between −1 and 1, where the former indicates a poor clustering where distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient s(Ψ(k)) for the entire clustering we take the average silhouette coefficient of all items. We find that the best clustering model and number of clusters to use is K-means with 11 clusters. We found that for smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance; however, as we begin to increase the cluster numbers K-means improves while the two remaining algorithms produce worse cohesion and separation.
      ◦ Paper excerpt (Deriving Role Labels): Provided with the most cohesive and separated clustering of users we then derive role labels for each cluster. Role label derivation first involves inspecting the dimension distribution in each cluster and aligning the distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into discrete values which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the […] decision node, we measure the entropy of the dimensions and their levels across the clusters, we then choose the dimension with the largest entropy. This is defined formally as:

            \[ H(dim) = -\sum_{level}^{|levels|} p(level \mid dim) \log p(level \mid dim) \tag{4} \]

      ◦ Fig. 2. Boxplots of the feature distributions (Dispersion, Initiation, Quality, Popularity) in each of the 11 clusters. Feature distributions are matched against the feature levels derived from equal-frequency binning.
      ◦ Table II. Mapping of cluster dimensions to levels. The clusters are ordered from low patterns to high patterns to aid legibility:

            Cluster   Dispersion   Initiation   Quality   Popularity
            1         L            L            L         L
            0         L            M            H         L
            6         L            H            M         M
            10        L            H            M         H
            4         L            H            H         M
            2,5       M            H            L         H
            8,9       M            H            H         H
            7         H            H            L         H
            3         H            H            H         H

      ◦ Role labels per cluster:
          ▪ 1 - Focussed Novice
          ▪ 2,5 - Mixed Novice
          ▪ 7 - Distributed Novice
          ▪ 3 - Distributed Expert
          ▪ 8,9 - Mixed Expert
          ▪ 0 - Focussed Expert Participant
          ▪ 4 - Focussed Expert Initiator
          ▪ 6 - Knowledgeable Member
          ▪ 10 - Knowledgeable Sink
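The silhouette coefficient of equation (3) can be computed directly from its definition. The sketch below is a plain-Python illustration, assuming Euclidean distance and a score of 0 for singleton clusters (a common convention that the excerpt does not state); the sample points and labels are hypothetical.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: average distance to the other members of the same cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest average distance to the members of any other cluster
        b = min(mean(dist(point, q) for q in members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Hypothetical 2-D behaviour vectors forming two well separated groups
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Model selection as described in the excerpt would repeat this for each candidate algorithm and each k in the tested range, keeping the clustering with the highest mean score.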
  • 32. Community Health Indicators
      ◦ From the literature there is no single agreed measure of ‘community health’
          ▪ Emergent dimensions: loyalty, participation, activity, social capital
      ◦ Indicator 1: Churn Rate (loyalty)
          ▪ Proportion of users that remain
      ◦ Indicator 2: User Count (participation)
          ▪ Number of active contributors
      ◦ Indicator 3: Seeds-to-Non-Seeds Posts Proportion (activity)
          ▪ Replied-to thread starters to non-replied-to thread starters
      ◦ Indicator 4: Clustering Coefficient (social capital)
          ▪ Average of users’ clustering coefficients
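Three of the indicators can be sketched in Python as follows. These are hedged illustrations under stated assumptions: the slide glosses churn rate as the proportion of users that remain, whereas `churn_rate` below returns the complementary share of users who left (the more conventional reading); the input shapes are hypothetical; and the reply graph is treated as undirected.

```python
def churn_rate(active_before, active_now):
    """Share of previously active users who are no longer active (assumed definition)."""
    before = set(active_before)
    if not before:
        return 0.0
    return len(before - set(active_now)) / len(before)


def seeds_to_non_seeds(reply_counts):
    """reply_counts: dict thread_id -> number of replies to the thread starter."""
    seeds = sum(1 for replies in reply_counts.values() if replies > 0)
    non_seeds = len(reply_counts) - seeds
    return seeds / non_seeds if non_seeds else float("inf")


def average_clustering(graph):
    """graph: dict user -> set of neighbours in an undirected reply graph."""
    coefficients = []
    for user, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)
            continue
        # Count links that exist between pairs of this user's neighbours
        links = sum(1 for u in nbrs for v in nbrs
                    if u < v and v in graph.get(u, set()))
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0


# Hypothetical snapshot of one community at two consecutive time steps
print(churn_rate({"a", "b", "c", "d"}, {"a", "b", "e"}))
print(seeds_to_non_seeds({"t1": 3, "t2": 0, "t3": 1}))
reply_graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(average_clustering(reply_graph))
```

User Count needs no helper: it is the size of the set of users who posted in the time step.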
  • 33. Experiment 1: Health Indicator Regression
      ◦ Community management is helped by understanding the relation between behaviour and health
      ◦ Experimental Setup:
          ▪ Health Indicator Linear Regression Models (per community)
              - Independent vars: 9 roles with composition proportions as values @ t. E.g. @ t = k: Mixed Expert = 0.05, Distributed Novice = 0.51, etc.
              - Dependent var: health indicator (e.g. churn rate) @ t. E.g. @ t = k: Churn Rate = 0.21
          ▪ PCA of each community model using the model’s coefficients
              - Look for a common health composition pattern
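A per-community regression of a health indicator on role proportions could be sketched as below. This is an illustration only: it solves the least-squares normal equations in plain Python, omits the intercept on the assumption that role proportions sum to 1 (which would make an intercept collinear with the roles), and uses two hypothetical roles with made-up values instead of the nine from the talk.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: role columns may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_health_model(compositions, health):
    """Least-squares coefficients, one per role, with no intercept (assumption)."""
    k = len(compositions[0])
    xtx = [[sum(row[i] * row[j] for row in compositions) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(compositions, health))
           for i in range(k)]
    return solve(xtx, xty)


# Hypothetical community: proportions of two roles at three time steps
compositions = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
churn = [0.34, 0.25, 0.13]
coefficients = fit_health_model(compositions, churn)
print([round(c, 3) for c in coefficients])
```

The coefficient vectors of all communities, one vector per community, are what the slide then feeds into PCA to look for a shared pattern.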
  • 34. Experiment 1: Health Indicator Regression Results
      ◦ [Figure: four scatter plots, one per health indicator (Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient), placing the labelled communities on principal components PC1 and PC2]
      ◦ Idiosyncratic Health Composition Patterns
          ▪ Divergence patterns between outlier communities
      ◦ No general pattern exists that describes the relation between roles and health
  • 35. Experiment 2: Health Change Detection
      ◦ Can we accurately and effectively detect positive and negative changes in community health from its composition of behavioural roles?
      ◦ Experimental Setup
          ▪ Binary classification of indicator change using logistic regression
          ▪ At t=k+1: predict increase or decrease in health indicator from t=k
          ▪ Time-ordered dataset:
              - Features @ t=k+1: 9 roles with composition proportions as values
              - Class @ t=k+1: positive (if increase from t=k), negative (if decrease)
              - Divide dataset into 80/20 split maintaining time-ordering
          ▪ Evaluated using Area under the ROC Curve (AUC)
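The data preparation and evaluation around the classifier can be sketched as follows. The logistic regression itself is left out; the `scores` in the usage example stand in for its predicted probabilities and are hypothetical, as are the function names. Treating "no change" as the negative class is an assumption, since the slide only mentions increases and decreases.

```python
def change_labels(indicator):
    """1 if the health indicator rose from t=k to t=k+1, else 0.
    A step with no change is treated as negative here (assumption)."""
    return [int(curr > prev) for prev, curr in zip(indicator, indicator[1:])]


def time_ordered_split(rows, train_fraction=0.8):
    """rows must already be sorted by time step; no shuffling is applied."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def auc(labels, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive example gets the higher score; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical churn-rate series for one forum
series = [0.20, 0.30, 0.25, 0.25, 0.40]
print(change_labels(series))

train, test = time_ordered_split(list(range(10)))
print(train, test)

# Hypothetical classifier scores for four held-out time steps
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

Keeping the split in time order means the model is always evaluated on time steps later than those it was trained on.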
  • 36. Experiment 2: Health Change Detection Results
      ◦ ROC Curves surpass baseline for:
          ▪ Churn rate: 20/25 forums
          ▪ User Count: 20/25 forums
          ▪ Seeds-to-Non-Seeds: 19/25 forums
          ▪ Clustering Coefficient: 17/25 forums
      ◦ [Figure: ROC curves (TPR against FPR) for Churn Rate, User Count, Seeds / Non-seeds Prop and Clustering Coefficient]
      ◦ What makes Communities Tick? Community Health Analysis using Role Compositions. M Rowe and H Alani. In the proceedings of the Fourth IEEE International Conference on Social Computing. Amsterdam, The Netherlands. (2012)
  • 37. To Summarise
  • 38. Findings
      ◦ User Behaviour is closely aligned with users’ needs
          ▪ Although this is expensive to collect and analyse
      ◦ Accurate predictions of community health from behaviour
          ▪ Inferring roles from collective behaviour
          ▪ Forecasting from role compositions
      ◦ Community Managers can understand how their community will develop from user behaviour
          ▪ Requires model tuning per-community
      ◦ Community Analysis through Semantic Rules and Role Composition Derivation. M Rowe, M Fernandez, S Angeletou and H Alani. In the Journal of Web Semantics (2012)
  • 39. Current/Future Work: Lifecycles
      ◦ Limitation of role-composition approach is the use of platform-wide windowing:
          ▪ Lack of high-fidelity behaviour inspection per-user
      ◦ Lifecycle periods: user-specific stages of development
      ◦ [Diagram: a user's lifetime from First Post to Last Post, divided into n periods (1, 2, …, n) of equal activity, i.e. equal #posts]
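Dividing a user's lifetime into equal-activity periods could be done along these lines. This is a minimal sketch: the function name is hypothetical, and giving any leftover posts to the earliest periods is an arbitrary choice made for the example.

```python
def lifecycle_periods(post_times, n_periods):
    """Split a user's posts, in chronological order, into periods holding
    (as near as possible) the same number of posts."""
    ordered = sorted(post_times)
    if n_periods < 1 or len(ordered) < n_periods:
        raise ValueError("need at least one post per period")
    size, extra = divmod(len(ordered), n_periods)
    periods, start = [], 0
    for i in range(n_periods):
        # The first `extra` periods absorb one leftover post each
        end = start + size + (1 if i < extra else 0)
        periods.append(ordered[start:end])
        start = end
    return periods


# Hypothetical post timestamps (e.g. days since the user's first post)
print(lifecycle_periods([5, 1, 3, 2, 4, 7, 6], 3))
```

Because periods are defined by post counts, not calendar time, a burst of activity and a slow month can each fill one period, which is what makes the stages user-specific.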
  • 40. User Development
      ◦ Capture period-specific user properties (in period s):
          ▪ In-degree distribution
          ▪ Out-degree distribution
          ▪ Term distribution
      ◦ Enabling: Churn prediction, stage-based recommendation
      ◦ [Figure 2: cross-entropy against lifecycle stages for Facebook, SAP and Server Fault, in three panels: (a) In-degree, (b) Out-degree, (c) Lexical]
      ◦ Paper excerpt: […] by people who have contacted them before and that fewer novel users appear. The same is also true for the out-degree distributions: users contact fewer new people than they did before. This is symptomatic of community platforms where despite new users arriving within the platform, users form sub-communities in which they interact and communicate with the same individuals. Figure 2(c) also demonstrates that users tend to reuse language over time and thus produce a gradually decaying cross-entropy curve.
      ◦ Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction. M Rowe. To appear in the proceedings of the International Conference on Data Mining. Dallas, US. (2013)
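The cross-entropy comparison between lifecycle periods can be illustrated with the sketch below. It assumes the standard definition H(P, Q) = −Σ p(x) log₂ q(x) with add-one smoothing so that unseen items do not produce log 0; the talk does not specify either detail, and the term lists are hypothetical.

```python
from collections import Counter
from math import log2


def distribution(items, vocabulary, smoothing=1.0):
    """Smoothed relative frequency of each vocabulary entry in items."""
    counts = Counter(items)
    total = len(items) + smoothing * len(vocabulary)
    return {x: (counts[x] + smoothing) / total for x in vocabulary}


def cross_entropy(p, q):
    """H(P, Q) = -sum_x p(x) * log2 q(x); lower means Q explains P better."""
    return -sum(p[x] * log2(q[x]) for x in p if p[x] > 0)


# Hypothetical terms used by one user in two consecutive lifecycle periods
earlier = ["router", "reset", "router", "cable"]
later = ["router", "reset", "reset", "broadband"]
vocabulary = sorted(set(earlier) | set(later))

p_later = distribution(later, vocabulary)
q_earlier = distribution(earlier, vocabulary)
print(round(cross_entropy(p_later, q_earlier), 3))
```

The same two functions apply to in-degree and out-degree distributions by passing the contacting or contacted user ids in place of terms.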
  • 41. Questions? @mrowebot m.rowe@lancaster.ac.uk http://www.lancaster.ac.uk/staff/rowem/