Decomposing discussion forums using user roles
Jeffrey Chan & Conor Hayes
Friday seminar, 8/20/2010
Presenter: Asta Zelenkauskaite
Feature-based profiling
  User roles are identified from features (indicators) that profile user behavior
  Visualization techniques
    Downside: used only for small-scale studies
  Proposed solution: social network analysis
    Ego-network analysis and the out-degree distribution
Data
  Boards.ie: the largest discussion board in Ireland
  596 forums
  75,400 users
  244,850 threads
  4.3 million posts
Features
  Initially 50 features; redundant ones eliminated
  Structural features (communication between users)
    Unweighted directed graphs
  From interaction with their neighbors
    Reciprocity features
    Persistence features
    Popularity features
    Initialization features
Structural features (operationalization)
  From interaction with their neighbors
  Reciprocity features
    % of bi-directional neighbors: the share of a user's neighbors with both in- and out-edges (they have replied to each other)
  Persistence features
    The length of the conversations a user typically engages in (mean and SD of posts per thread)
  Popularity features
    Ratio of a user's in-neighbors (% of in-degree)
    # of replies: % of the posts where there is at least one reply to the user
  Initialization features
    % of messages initiated by a user
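To make the operationalization on this slide concrete, here is a minimal Python sketch that computes the four feature families for one user from a list of posts. The post layout, the toy data, and the function name `user_features` are assumptions made for illustration, not the authors' implementation; in particular, the initialization feature is read here as the share of a user's posts that start a thread, and self-replies are ignored.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy data: (post_id, thread_id, author, parent_post_id or None).
# A post with parent None starts a thread.
POSTS = [
    (1, "t1", "ann", None),
    (2, "t1", "bob", 1),
    (3, "t1", "ann", 2),
    (4, "t2", "cat", None),
    (5, "t2", "ann", 4),
]


def user_features(posts, user):
    """Compute reciprocity, persistence, popularity and initialization features."""
    author_of = {pid: author for pid, _, author, _ in posts}
    out_nb, in_nb = set(), set()   # users replied to / users who replied
    replied_posts = set()          # the user's posts that got a reply

    for _, _, author, parent in posts:
        if parent is None:
            continue
        target = author_of[parent]
        if target == author:       # assumption: self-replies carry no edge
            continue
        if author == user:
            out_nb.add(target)
        if target == user:
            in_nb.add(author)
            replied_posts.add(parent)

    own = [p for p in posts if p[2] == user]
    neighbours = out_nb | in_nb

    per_thread = defaultdict(int)  # posts by this user in each thread
    for _, thread_id, _, _ in own:
        per_thread[thread_id] += 1
    counts = list(per_thread.values())

    def ratio(part, whole):
        return part / whole if whole else 0.0

    return {
        "reciprocity": ratio(len(out_nb & in_nb), len(neighbours)),
        "in_neighbour_ratio": ratio(len(in_nb), len(neighbours)),
        "replied_post_ratio": ratio(len(replied_posts), len(own)),
        "initiated_ratio": ratio(sum(1 for p in own if p[3] is None), len(own)),
        "posts_per_thread_mean": mean(counts) if counts else 0.0,
        "posts_per_thread_sd": pstdev(counts) if counts else 0.0,
    }


features = user_features(POSTS, "ann")
```

In the toy data, ann has two neighbors (bob and cat) but only bob has replied back, so half of her neighbors are bi-directional.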
User role discovery approach
  Data cleaning
    Filtering out low-degree, low-posting users
  User grouping
    Via number of neighbors
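The two steps on this slide could be sketched as follows. The slides do not give the cut-offs or say how users are grouped by neighbor count, so the thresholds `MIN_NEIGHBOURS` and `MIN_POSTS`, the bin edges, and the function name `clean_and_group` are all hypothetical values chosen only to show the shape of the procedure.

```python
from bisect import bisect_right

# Hypothetical thresholds; the slides do not state the actual values.
MIN_NEIGHBOURS = 2
MIN_POSTS = 5
# Hypothetical bin edges on neighbor count: <5, 5..19, >=20.
BIN_EDGES = [5, 20]


def clean_and_group(users):
    """users maps a name to (number_of_neighbours, number_of_posts).

    Returns a dict mapping a group index to the names of users in that group,
    after dropping low-degree and low-posting users.
    """
    groups = {}
    for name, (n_neighbours, n_posts) in users.items():
        # Data cleaning: skip users with too few neighbors or posts.
        if n_neighbours < MIN_NEIGHBOURS or n_posts < MIN_POSTS:
            continue
        # User grouping: bucket the remaining users by neighbor count.
        group = bisect_right(BIN_EDGES, n_neighbours)
        groups.setdefault(group, []).append(name)
    return groups
```

Grouping before role discovery keeps users with very different numbers of neighbors from being compared directly against each other.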
User roles
  Joining conversationalists
    Do not initiate threads but post replies
  Taciturns
    Low reciprocity (rarely get involved in two-way communication)
  Elitists
    Low % of neighbors with two-way communication
  Supporters
    Middle range of the statistics for all features
  Popular participants
    Do not initiate many threads but get involved with a large percentage of a forum's users
  Grunts
    Similar to taciturns, but with relatively high levels of reciprocity
  Ignored
    Extremely low % of posts being replied to (not very popular)
Results: Forum composition
  Some forums are distinctly different from the others (e.g., personal issues)
  Difference in grouping by conversationalists vs. taciturns
  Some topics determine a certain composition
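One way to picture a forum's composition is as the share of its users in each role, which can then be compared across forums. The sketch below assumes each user has already been assigned a role; the function names and the use of total variation distance as the comparison measure are illustrative choices, not something stated in the slides.

```python
from collections import Counter


def composition(role_by_user):
    """Return the fraction of a forum's users that hold each role."""
    counts = Counter(role_by_user.values())
    total = sum(counts.values())
    return {role: n / total for role, n in counts.items()}


def composition_distance(a, b):
    """Total variation distance between two compositions (0 = same, 1 = disjoint)."""
    roles = set(a) | set(b)
    return 0.5 * sum(abs(a.get(r, 0.0) - b.get(r, 0.0)) for r in roles)


# Hypothetical role assignments for two small forums.
forum_a = {"u1": "taciturn", "u2": "taciturn", "u3": "supporter", "u4": "ignored"}
forum_b = {
    "u5": "joining conversationalist",
    "u6": "joining conversationalist",
    "u7": "supporter",
    "u8": "taciturn",
}

distance = composition_distance(composition(forum_a), composition(forum_b))
```

A forum dominated by taciturns and one dominated by joining conversationalists end up far apart under this measure, which matches the kind of contrast the slide describes.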
Discussion
  Is it impossible to assess the ‘success of functioning’ from the composition of the group?