Anticipating Discussion Activity on Community Forums

Notes
  • The dataset is skewed 80% to 20% towards seeds over non-seeds.
  • Content features outperform user features. Content and focus outperform other feature combinations. All features together work best. This differs from the Twitter analysis, where user features were better predictors than content features.
  • Trained J48 with all features using the training split. Tested it on the held-out 10%. Dropped 1 feature at a time from the model and classified the test split, looking for the features whose removal gives the greatest reduction in accuracy.
  • Boxplots show:
      • Higher referral counts correlate with non-seeds (spam).
      • Higher forum likelihood correlates with seeds: users who concentrate their discussions within select forums will start a discussion, as they are known to the community.
      • Higher informativeness correlates with non-seeds.
  • Solitary features: user features perform best as the solitary feature set for Linear regression and SVR; focus features are best for Isotonic regression. Combined: content and focus perform best for Linear and Isotonic regression.
  • Smallest standard deviation for content and focus features.
  • A user can expect increased discussion activity if he/she:
      • Has low forum entropy
      • Has high forum likelihood
      • Is negative in his/her posts
      • Uses complex language (wide vocabulary, i.e. articulate)

    1. Anticipating Discussion Activity on Community Forums
       Matthew Rowe, Sofia Angeletou and Harith Alani
       Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
       The Third IEEE International Conference on Social Computing. MIT, Boston, USA. 2011
    2. Community Content
       Online communities are now used to:
         • Ask questions
         • Post opinions and ideas
         • Discuss events and current issues
       Content analysis in online communities is attractive for:
         • Market analysis
         • Brand consensus and product opinion
       Social network analytics in the US is predicted to reach $1 billion by 2014 (Forrester 2009)
       Masses of data are now being published in online communities:
         • Facebook has more than 60 million status updates per day (Facebook statistics 2010)
    3. [No text transcribed for this slide]
    4. The Need for Analysis
       Analysts need to know which piece of content will generate the most activity
         • i.e. the most auspicious or influential
       Helps focus the attention of human and computerised analysts
         • What to track?
       Need to understand the effect that features (community and content) have on attention to content
       Enable content creators to shape their content in order to maximise impact
         • E.g. promoters, government policy makers
       RQ1: Which features are key to stimulating discussions?
       RQ2: How do these features influence discussion length?
    5. Outline
       Anticipating Discussion Activity: Approach Overview
         • Identifying Seed Posts
         • Predicting Discussion Activity
       Features
       Dataset
         • Community Message Board: Boards.ie
       1. Identifying Seed Posts
       2. Predicting Discussion Activity
       Findings
       Conclusions
    6. Approach Overview
       Two-stage approach to predict discussion activity in online communities:
       1. Identify seed posts
         • i.e. thread starters that yield a reply
         • Will a given post start a discussion?
         • What are the properties that seed posts exhibit?
         • What parameters tend to trigger a discussion?
       2. Predict discussion activity levels
         • From the identified seed posts
         • What is the level of discussion that a seed post will generate?
         • What features correlate with heightened discussion activity?
    7. Features
       For each post, model: a) the author, b) the content and c) the topical concentration of the author
       F1: User Features
         • In-degree, out-degree: social network properties of the author
         • Post count, age, post rate: participation information of the author
       F2: Content Features
         • Post length, referral count, time in day: surface features of the post
         • Complexity: cumulative entropy of terms in the post
         • Readability: Gunning Fog index of the post
         • Informativeness: TF-IDF measure of terms within the post
         • Polarity: average sentiment of terms in the post
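The complexity feature above is described only as the "cumulative entropy of terms in the post". A minimal sketch of one plausible reading, the Shannon entropy of the post's term-frequency distribution, is given below. The function name `term_entropy`, the whitespace tokenisation and the base-2 logarithm are assumptions made for illustration; the slides do not state the exact formulation used.

```python
import math
from collections import Counter


def term_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the term distribution in one post.

    Assumption: terms are lower-cased whitespace tokens; the slides do not
    say how posts were tokenised.
    """
    terms = text.lower().split()
    if not terms:
        return 0.0
    counts = Counter(terms)
    total = len(terms)
    # Sum -p * log2(p) over the relative frequency p of each distinct term.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this reading, a post that repeats one word has entropy 0, and the value grows as the vocabulary of the post becomes more varied, which matches the later remark that greater complexity means more diverse language.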
    8. Features (2)
       F3: Focus Features
       Topic entropy: the concentration of the author across community forums
         • Higher entropy indicates a wider spread of forum activity
         • More random distribution, less concentrated
       Topic likelihood: the likelihood that a user posts in a specific forum given his post history
         • Measures the affinity that a user has with a given forum
         • Lower likelihood indicates a user posting on an unfamiliar topic
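The two focus features can be sketched from a user's posting history as follows. This is an illustrative sketch, not the authors' implementation: it assumes the history is a plain list of forum names (one entry per past post), uses base-2 entropy, and estimates likelihood as a simple relative frequency. The names `forum_entropy` and `forum_likelihood` are hypothetical.

```python
import math
from collections import Counter


def forum_entropy(history: list[str]) -> float:
    """Entropy (in bits) of a user's post distribution across forums.

    Higher values indicate activity spread over many forums; 0 means all
    past posts were made in a single forum.
    """
    if not history:
        return 0.0
    total = len(history)
    counts = Counter(history)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def forum_likelihood(history: list[str], forum: str) -> float:
    """Fraction of the user's past posts that were made in `forum`."""
    if not history:
        return 0.0
    return history.count(forum) / len(history)


# Example: a user who mostly posts in one forum.
history = ["Rugby", "Rugby", "Rugby", "World of Warcraft"]
rugby_affinity = forum_likelihood(history, "Rugby")  # 3 of 4 posts
spread = forum_entropy(history)
```

A user posting in a forum that never appears in the history gets a likelihood of 0, which corresponds to the "unfamiliar topic" case on the slide.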
    9. Dataset: Boards.ie
       Irish community message board that was established in 1998
       Covers a wide array of topics and themes in forums
         • E.g. World of Warcraft, Japanese Culture, Rugby
       We were provided with the complete dataset spanning 1998-2008 of all posts and forum information
         • Focussed on 2006 due to the scale of the entire dataset
       No explicit social connections exist in the dataset
         • Social network features were built from the reply-to graph
         • 6-month window prior to the post date was used to build the user and focus features
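Deriving in-degree and out-degree from a reply-to graph over a trailing window might look like the sketch below. Several details are assumptions, since the slides do not specify them: replies are represented as `(replier, original_author, timestamp)` tuples, degree counts distinct users, and the 6-month window is approximated as 182 days. The function name `reply_degrees` is hypothetical.

```python
from datetime import datetime, timedelta


def reply_degrees(replies, user, post_date, window_days=182):
    """Return (in_degree, out_degree) for `user` in the window before `post_date`.

    in_degree:  number of distinct users who replied to `user`
    out_degree: number of distinct users that `user` replied to
    """
    start = post_date - timedelta(days=window_days)
    repliers, replied_to = set(), set()
    for src, dst, when in replies:
        # Keep only replies inside the window [start, post_date).
        if not (start <= when < post_date):
            continue
        if dst == user:
            repliers.add(src)
        if src == user:
            replied_to.add(dst)
    return len(repliers), len(replied_to)


# Example with made-up users and dates.
replies = [
    ("bob", "alice", datetime(2006, 3, 1)),
    ("carol", "alice", datetime(2006, 5, 1)),
    ("alice", "bob", datetime(2006, 5, 2)),
    ("dave", "alice", datetime(2005, 1, 1)),  # falls outside the window
]
alice_degrees = reply_degrees(replies, "alice", datetime(2006, 6, 1))
```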
    10. 1. Identifying Seed Posts
        Will a given post start a discussion?
        What are the properties that seed posts exhibit?
        Experiment Setup:
          • Used all thread starter posts from Boards.ie in 2006
          • Training/validation/testing sets using a 70/20/10% random split
          • Binary classification task: Is this a seed post or not?
          • Measures: precision, recall, f-measure, area under ROC curve
        Performed 2 experiments:
        a) Model Selection
          • Tested individual feature sets (user, content, focus) and combinations
        b) Feature Assessment
          • Dropping 1 feature at a time, record reduction in f-measure
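The experiment setup above can be sketched with the standard library alone: a seeded 70/20/10 random split, plus precision, recall and f-measure for the binary seed/non-seed task. The function names, the fixed seed and the label encoding (1 = seed, 0 = non-seed) are assumptions for illustration; the slides do not describe how the split was generated.

```python
import random


def split_70_20_10(items, seed=0):
    """Randomly split items into training/validation/testing lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = n * 7 // 10  # integer arithmetic avoids float rounding
    valid_end = n * 9 // 10
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


train, valid, test = split_70_20_10(range(100))
scores = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

For the feature assessment, the same f-measure would be recomputed after retraining with one feature removed, and the drop relative to the full model recorded.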
    11. 1.a) Model Selection [slide content not transcribed]
    12. 1.b) Feature Assessment [slide content not transcribed]
    13. 1.b) Feature Assessment [slide content not transcribed]
    14. 2. Predicting Discussion Activity
        What is the level of discussion that a seed post will generate?
        What features correlate with heightened discussion activity?
        Experiment Setup:
          • Train: seed posts in 70% training split
          • Test: seed posts in 20% validation split
          • Measure: Normalised Discounted Cumulative Gain (nDCG)
          • Look at varying rank positions: nDCG@k, k=1,2,5,10,20,50,100
        Performed 2 experiments:
        a) Model Selection
          • Regression models: Linear, Isotonic, Support Vector Regression
          • Tested individual feature sets (user, content, focus) and combinations
        b) Feature Contributions
          • Assess the features in the best performing model from a)
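The nDCG@k measure used above can be sketched as follows: seed posts are ranked by predicted activity, and the actual activity (for example, reply counts) serves as the gain. This sketch assumes the common formulation with a log2(rank + 1) discount and raw gains; the slides do not state which DCG variant was used, and the function names are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k items of a ranked gain list."""
    # enumerate starts at 0, so rank = i + 1 and the discount is log2(rank + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(predicted, actual, k):
    """nDCG@k when items are ranked by `predicted` and scored by `actual`."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    ranked_gains = [actual[i] for i in order]
    ideal_gains = sorted(actual, reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


# Example: predicted scores from a regression model vs. made-up reply counts.
predicted = [0.9, 0.1, 0.5]
actual = [10, 0, 3]
perfect = ndcg_at_k(predicted, actual, k=2)
```

A model that orders the posts exactly as their true activity would scores 1.0 at every k, so the measure rewards getting the most active threads near the top of the ranking.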
    15. 2.a) Model Selection [slide content not transcribed]
    16. 2.a) Model Selection [panels: Support Vector Regression, Isotonic, Linear; plot content not transcribed]
    17. 2.b) Feature Contributions
        What features correlate with heightened discussion activity?
    18. Findings
        RQ1: Which features are key to stimulating discussions?
          • Having many URLs in a post can negatively impact discussion activity
              • Could associate the post with spam content
          • Seed posts are associated with greater forum likelihood
          • Lower informativeness is associated with seed posts
              • i.e. seeds use language that is familiar to the community
        RQ2: How do these features influence discussion length?
          • Lower forum entropy = heightened discussion activity
          • Greater complexity = heightened discussion activity
              • i.e. include more diverse language in the post
          • Increased activity can be expected from an increase in forum likelihood coupled with a decrease in forum entropy
          • Negative sentiment posts generate more activity
    19. Conclusions and Future Work
        The two-stage approach is able to:
          • Identify seed posts to a high degree of accuracy
              • F-measure: 0.792
          • Predict discussion activity levels
              • nDCG@1: 0.89 (linear regression model)
        Content and focus features yield the best performing model
          • Average nDCG@k: 0.756
        Findings inform:
          • Market analysts to track high activity posts from the outset
          • Content creators to shape content in order to maximise impact
        Currently applying the approach over different platforms:
          • How can we predict activity on a given social web system?
          • How do social web systems differ in generating activity?
    20. Questions?
        Web: http://people.kmi.open.ac.uk/rowe
        Email: m.c.rowe@open.ac.uk
        Twitter: @mattroweshow
