Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities
 


Presented at the ASE/IEEE International Conference on Social Computing 2012 in Amsterdam

  • We randomly selected 20 forums that did not have low activity levels. The selected set is very diverse: it includes communities around very specific topics such as Golf or Astronomy & Space, communities around geographical locations such as Ripp of Ireland, and communities around very general topics such as Work & Jobs.
  • Since we were interested in exploring different factors, we had to develop feature groups which represent the factors that may impact users' reply behavior. We created 5 groups of features, each of which tries to capture one group of factors that may impact users' communication behavior in certain community forums. For example, if user features are important in a forum for predicting which posts will get replies, then in this forum it is more important who says something than what is said. That means discussion would be driven by social factors rather than topical factors. On the other hand, if content features are most important in a forum, then posts need to show certain content characteristics in order to get replies. Focus features are in a sense also user features, but they describe the topical and forum focus of a user. For some forums it might be necessary that a user has a strong topical focus (i.e. is likely to be an expert) in order to stimulate discussions, while in other forums novices might be more likely to get replies. Community features describe relations between a post or its author and the community: e.g. a post might only get replies if it fits the interests of the community, or a user might be more likely to get replies if he has contributed a lot to the community (inequity theory).
  • We computed those features for every thread starter post published in 2006, using a 6-month window previous to when the post was published.
  • MCC is a balanced measure of the quality of binary classification and can be used even if the classes are of very different sizes. The MCC measure returns a value between -1 and +1; a value of 0 is no better than random prediction. The F1 score is frequently used by the IR community, while the MCC is used by ML people.
  • For 11 forums our classifier did not outperform (but only matched) the performance of the baseline. We assume that this happens because most of these 11 forums are rather inactive forums. Another potential explanation is that the discussion behaviour of these communities is in part rather random and/or driven by other, external factors which we could not take into account in our study. For example, the discussion behaviour of the communities around specific locations or regions might be impacted by spatial properties of users, while the discussion behaviour of the community around the forum Television seems to be mainly driven by external events (e.g. the start of a new series). In most cases a combination of all features achieves the highest performance.
  • Beside the overall classification performance we were also interested in analyzing the impact of individual features.
  • When analyzing the individual features we made a couple of interesting observations such as
  • nDCG would be 1 if we predict the real ranking position of a post. The measure penalizes elements that appear lower down although they should be higher up.
  • Best results for the Spanish forum. Worst results for 544 (Banking & Insurance & Pensions).
  • This indicates that it is important that a post's content has certain characteristics (e.g. contains only few links) and fits the topical interests of the community in order to start a discussion. But afterwards it is important that the author of a post has a certain topical and/or forum focus in order to stimulate a lengthy discussion in this forum.
  • This indicates that for starting lengthy discussions in this forum it is important that the author of a post has topical and/or forum focus.
  • This indicates that in this forum posts which fit the topical interests of the community have the potential to start lengthy discussions.
  • To summarize, our second experiment shows that
  • So let me start concluding my talk. What we learned from our empirical study was that...
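
The note on MCC versus F1 above can be made concrete with a minimal sketch. It uses the standard textbook definitions of both measures computed from confusion-matrix counts. The function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, +1].

    Returns 0.0 when the denominator is zero (a common convention),
    which happens when a whole row or column of the confusion matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical, made-up counts: a forum with 90 seed posts and 10 non-seed
# posts, and a trivial classifier that labels every post as "seed".
always_seed_f1 = f1_score(tp=90, fp=10, fn=0)
always_seed_mcc = mcc(tp=90, tn=0, fp=10, fn=0)
print(round(always_seed_f1, 2), always_seed_mcc)  # F1 is about 0.95, MCC is 0.0
```

With these invented counts the trivial classifier looks strong under F1 but scores 0 under MCC, which is why a measure that stays meaningful under very different class sizes is a useful complement.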

Presentation Transcript

  • Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities. Claudia Wagner, Matthew Rowe, Markus Strohmaier and Harith Alani. Amsterdam, 16.4.2012
  • with… Matthew Rowe, Markus Strohmaier, Harith Alani
  • Motivation: Which factors impact how much attention a post gets? We use the number of replies as a proxy measurement of attention.
  • Research Questions: Which factors impact the attention level a post gets in certain community forums? How do these factors differ between individual community forums?
  • Methodology: Empirical study of attention patterns in 20 randomly selected forums. Two-stage approach: (1) differentiate between thread starter posts that got at least one reply (seed posts) and thread starter posts which got no replies at all (non-seed posts); (2) predict the level of attention that seed posts will generate, i.e. the number of replies.
  • Dataset: Most popular Irish message boards, Boards.ie. 725 forums. Years 2005 and 2006.
  • Feature Engineering: Aim: identify the features that impact upon seeding a discussion, and identify features associated with seed posts that generate the most attention. Five feature groups.
  • Five Feature Groups: User features (user account age, post count, in-degree, out-degree, post rate); Content features (post length, complexity, readability, link count, time in day, informativeness, polarity); Title features (length, question marks, linguistic dimensions (LIWC)); Focus features (forum entropy, forum likelihood, topic entropy, topic likelihood, topic distance); Community features (topical community fit, topical community distance, evolution score, inequity score).
  • Feature Computation: For each thread starter post published in one of the 20 randomly selected forums in 2006 we computed our 28 features, using the 6-month window before the post was published. Fit LDA model with standard parameters T=50, beta=0.01, alpha=50/T.
  • Seed Post Identification: Experiment. Identify posts which got replies (binary classification task). Split the data of each forum into train and test data (80/20). Train a logistic regression classifier with each feature group in isolation and with all features combined. Compare performance using the F1 score and the Matthews correlation coefficient (MCC).
  • Seed Post Identification: Results. For these 9 forums our classifiers outperform the random baseline. Astronomy & Space: a classifier trained with content features alone performs best. Spanish: a classifier trained with title features alone performs best.
  • Seed Post Identification: Feature Impact. Analyze the impact of individual features rather than groups. Interpret statistically significant coefficients of the best performing feature group learned by the logistic regression model. Rank the features of the best performing feature group using the Information Gain Ratio (IGR) as a ranking criterion.
  • Seed Post Identification: Observations. In the Spanish community the title length is the most important feature (IGR=0.558, coef=-0.326): posts with long titles are less likely to get replies. In the Bank & Insurance forum short but complex posts which are authored by newbies are most likely to get replies: content length coef=-0.017, p<0.05; topic distance coef=2.890, p<0.01; complexity has the highest IGR (IGR=0.354).
  • Seed Post Identification: Observations. The number of links has a negative impact in the forums Work & Jobs and Golf, but a positive impact in the Astronomy & Space forum. Purpose of community: links have a positive impact in content and information driven communities; links have a negative impact in other communities.
  • Seed Post Identification: Observations. Some communities require posts to fit the topics they usually discuss (e.g., Golf) while others are more open to diverse topics (e.g., Work & Jobs). Specificity of community's subject: the subject of the Work & Jobs forum is very general, so high topical community distance has a positive impact; the subject of the Golf forum is very specific, so high community distance has a negative impact.
  • Activity Level Prediction: Experiment. Identify the features that were correlated with lengthy discussions. Rank posts according to their attention level. Evaluate our predicted ranking using normalized Discounted Cumulative Gain (nDCG) at varying rank positions, i.e. top-k where k={1, 5, 10, 20, 50, 100}. nDCG = DCG of the predicted ranking divided by DCG of the actual ranking.
  • Activity Level Prediction: Results. Averaged normalised Discounted Cumulative Gain. A value of 1 indicates that the predicted ranking of posts perfectly matches their real ranking.
  • Activity Level Prediction: Results. For the Astronomy & Space community, content features were best for identifying seed posts and are also best for ranking posts according to the attention level they will generate.
  • Activity Level Prediction: Results. Golf forum (343): a combination of all features worked best for identifying seed posts. Focus features alone are best for ranking posts.
  • Activity Level Prediction: Results. Bank & Insurance forum (544): a combination of all features worked best for identifying seed posts. Community features alone are best for ranking posts.
  • Activity Level Prediction: Summary. Factors that impact discussion initiation often differ from the factors that impact discussion length, e.g. for the Golf community: seed posts = all features, activity level = focus features.
  • Activity Level Prediction: Summary. Factors that are associated with lengthy discussions tend to be different for different communities. The title length is the only feature which has a small but significant positive impact across several communities on the number of replies a post gets: Work & Jobs forum title length coef=0.034 and p<0.01; Satellite forum title length coef=0.030 and p<0.05.
  • Conclusions (1): Different community forums exhibit interesting differences in terms of how attention is generated. Most attention patterns which we identified are local and community-specific. "Global" patterns may highly depend on the composition of the dataset.
  • Conclusions (2): The same features that have a positive impact on the start of discussions in one community can have a negative impact in another community. Example: number of links. Negative impact in most communities; positive impact in information and content driven communities.
  • Conclusions (3): The purpose of a community and the specificity of the community's subject may impact their reply behavior. Communities which have a supportive purpose are most likely driven by different factors than communities with an informational purpose. Communities around very specific topics require posts to fit the topical focus. Communities around more general topics do not have this requirement.
  • Limitations & Future Work: Correlation versus causality: we cannot answer the "what would have happened if" question with our approach; this would need controlled experiments where the platform is manipulated. Most attention patterns are local. But how local? Can we automatically identify the context in which attention patterns may hold?
  • Attention patterns tend to be local and community-specific. Ignoring communities' idiosyncrasies isn't bliss. THANK YOU. claudia.wagner@joanneum.at, http://claudiawagner.info, src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
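
The nDCG evaluation described in the Activity Level Prediction slides (DCG of the predicted ranking divided by DCG of the actual ranking, cut off at top-k) can be sketched as follows. This is a minimal illustration: the slides do not state which DCG variant was used, so the common log2 position discount is an assumption here, as are the function names and the toy reply counts.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the first k items.

    Assumption: the common gain / log2(position + 1) discount,
    with positions starting at 1.
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(replies_in_predicted_order, k):
    """nDCG@k, where each gain is a post's true number of replies.

    `replies_in_predicted_order` lists the true reply counts of the posts
    in the order the model ranked them (most attention first).
    """
    ideal_order = sorted(replies_in_predicted_order, reverse=True)
    ideal_dcg = dcg(ideal_order, k)
    if ideal_dcg == 0:
        return 0.0  # no post received replies, so the ranking is undefined
    return dcg(replies_in_predicted_order, k) / ideal_dcg


# Toy example with made-up reply counts.
print(ndcg_at_k([3, 2, 1], k=3))  # predicted order equals the real order
print(ndcg_at_k([1, 2, 3], k=3))  # most-replied post ranked last, so lower
```

A value of 1 is reached only when the predicted order matches the real order within the top k. Posts with many replies that are ranked too low reduce the score.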