Anticipating Discussion Activity on Community Forums
Matthew Rowe, Sofia Angeletou and Harith Alani
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
The Third IEEE International Conference on Social Computing. MIT, Boston, USA. 2011
Community Content
- Online communities are now used to:
  - Ask questions
  - Post opinions and ideas
  - Discuss events and current issues
- Content analysis in online communities is attractive for:
  - Market analysis
  - Brand consensus and product opinion
- Social network analytics in the US is predicted to reach $1 billion by 2014 (Forrester 2009)
- Masses of data are now being published in online communities:
  - Facebook has more than 60 million status updates per day (Facebook statistics 2010)
The Need for Analysis
- Analysts need to know which piece of content will generate the most activity, i.e. the most auspicious or influential
  - Helps focus the attention of human and computerised analysts: what to track?
- Need to understand the effect that features (community and content) have on attention to content
  - Enables content creators to shape their content in order to maximise impact
  - E.g. promoters, government policy makers
- RQ1: Which features are key to stimulating discussions?
- RQ2: How do these features influence discussion length?
Outline
- Anticipating Discussion Activity: Approach Overview
  - Identifying Seed Posts
  - Predicting Discussion Activity
- Features
- Dataset
  - Community Message Board: Boards.ie
- 1. Identifying Seed Posts
- 2. Predicting Discussion Activity
- Findings
- Conclusions
Approach Overview
Two-stage approach to predict discussion activity in online communities:
1. Identify seed posts, i.e. thread starters that yield a reply
  - Will a given post start a discussion?
  - What are the properties that seed posts exhibit?
  - What parameters tend to trigger a discussion?
2. Predict discussion activity levels from the identified seed posts
  - What is the level of discussion that a seed post will generate?
  - What features correlate with heightened discussion activity?
Features
For each post, model: a) the author, b) the content and c) the topical concentration of the author
F1: User Features
- In-degree, out-degree: social network properties of the author
- Post count, age, post rate: participation information of the author
F2: Content Features
- Post length, referral count, time in day: surface features of the post
- Complexity: cumulative entropy of terms in the post
- Readability: Gunning Fog index of the post
- Informativeness: TF-IDF measure of terms within the post
- Polarity: average sentiment of terms in the post
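Two of the content features above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: it assumes complexity is the Shannon entropy of the post's term distribution, and it approximates syllables by counting vowel groups for the Gunning Fog index. The function names and the tokenizer are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def term_entropy(text):
    """Shannon entropy (in bits) of the term distribution in a post."""
    terms = tokenize(text)
    if not terms:
        return 0.0
    total = len(terms)
    counts = Counter(terms)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def gunning_fog(text):
    """Gunning Fog index: 0.4 * (words per sentence + 100 * complex-word ratio).

    A word counts as complex here if the heuristic gives it 3+ syllables.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))
```

A post that repeats a single term has entropy 0, while a post with more varied vocabulary scores higher, which matches the reading of complexity as language diversity.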
Features (2)
F3: Focus Features
- Topic entropy: the concentration of the author across community forums
  - Higher entropy indicates a wider spread of forum activity
  - More random distribution, less concentrated
- Topic likelihood: the likelihood that a user posts in a specific forum given their post history
  - Measures the affinity that a user has with a given forum
  - Lower likelihood indicates a user posting on an unfamiliar topic
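The two focus features can be illustrated as follows. This is a sketch under stated assumptions: the user's post history is represented as a plain list of forum names (one entry per post), and both features are estimated from relative posting frequencies. The function names are hypothetical, and the exact estimator used in the study may differ.

```python
import math
from collections import Counter


def forum_distribution(history):
    """Relative frequency of each forum in a user's post history."""
    counts = Counter(history)
    total = len(history)
    return {forum: c / total for forum, c in counts.items()}


def topic_entropy(history):
    """Entropy (bits) of the user's forum distribution; higher = wider spread."""
    dist = forum_distribution(history)
    return -sum(p * math.log2(p) for p in dist.values())


def topic_likelihood(history, forum):
    """Estimated probability that the user posts in `forum`, given the history."""
    if not history:
        return 0.0
    return forum_distribution(history).get(forum, 0.0)
```

For a history of three posts in one forum and one post in another, the likelihood of the first forum is 0.75, and a forum the user has never posted in gets 0.0, which corresponds to the "unfamiliar topic" case.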
Dataset: Boards.ie
- Irish community message board that was established in 1998
- Covers a wide array of topics and themes in forums
  - E.g. World of Warcraft, Japanese Culture, Rugby
- We were provided with the complete dataset spanning 1998-2008 of all posts and forum information
  - Focussed on 2006 due to the scale of the entire dataset
- No explicit social connections exist in the dataset
  - Social network features were built from the reply-to graph
- A 6-month window prior to the post date was used to build the user and focus features
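As a rough sketch of how in-degree and out-degree could be derived from a reply-to graph, the snippet below assumes replies are available as (replier, original_author) pairs within the 6-month window, counts distinct users rather than individual replies, and ignores self-replies. These choices, and the function name, are assumptions made for illustration only.

```python
from collections import defaultdict


def degree_features(replies):
    """Return {user: (in_degree, out_degree)} from (replier, author) pairs.

    in_degree  = number of distinct users who replied to the user
    out_degree = number of distinct users the user replied to
    """
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for replier, author in replies:
        if replier == author:
            continue  # assumed: self-replies carry no social signal
        outgoing[replier].add(author)
        incoming[author].add(replier)
    users = set(incoming) | set(outgoing)
    return {u: (len(incoming[u]), len(outgoing[u])) for u in users}
```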
1. Identifying Seed Posts
- Will a given post start a discussion?
- What are the properties that seed posts exhibit?
Experiment Setup:
- Used all thread starter posts from Boards.ie in 2006
- Training/validation/testing sets using a 70/20/10% random split
- Binary classification task: is this a seed post or not?
- Measures: precision, recall, f-measure, area under ROC curve
Performed 2 experiments:
a) Model Selection
  - Tested individual feature sets (user, content, focus) and combinations
b) Feature Assessment
  - Dropping 1 feature at a time, record reduction in f-measure
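The experiment setup above can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the seed value, the function names, and the `score_fn` callback (which stands in for "train a classifier on these features and return its f-measure") are all hypothetical.

```python
import random


def split_70_20_10(items, seed=0):
    """Randomly split items into training/validation/testing (70/20/10%)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_valid = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and f-measure for the positive (seed post) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f1_drop_per_feature(features, score_fn):
    """Drop one feature at a time and record the reduction in f-measure.

    score_fn(feature_list) is assumed to train and evaluate a model on
    the given features and return its f-measure.
    """
    baseline = score_fn(features)
    return {f: baseline - score_fn([g for g in features if g != f])
            for f in features}
```

A larger reduction for a feature suggests the classifier relies on it more heavily; a negative value means the model improved once the feature was removed.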
1.a) Model Selection
1.b) Feature Assessment
2. Predicting Discussion Activity
- What is the level of discussion that a seed post will generate?
- What features correlate with heightened discussion activity?
Experiment Setup:
- Train: seed posts in 70% training split
- Test: seed posts in 20% validation split
- Measure: Normalised Discounted Cumulative Gain (nDCG)
  - Look at varying rank positions: nDCG@k, k=1,2,5,10,20,50,100
Performed 2 experiments:
a) Model Selection
  - Regression models: Linear, Isotonic, Support Vector Regression
  - Tested individual feature sets (user, content, focus) and combinations
b) Feature Contributions
  - Assess the features in the best performing model from a)
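To make the evaluation measure concrete, here is a small sketch of nDCG@k. It assumes the common linear-gain form, DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1), with the relevance of a seed post taken to be its actual discussion length. Other variants of the measure exist, and the slides do not say which one was used; the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(predicted_scores, true_relevances, k):
    """nDCG@k: DCG of the predicted ranking divided by DCG of the ideal one."""
    # Rank posts by predicted activity, highest first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal = sorted(true_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg else 0.0
```

A model that ranks seed posts in exactly the order of their true discussion lengths scores 1.0 at every k, and the score falls as high-activity posts are pushed down the predicted ranking.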
2.a) Model Selection: Support Vector Regression, Isotonic, Linear
2.b) Feature Contributions
- What features correlate with heightened discussion activity?
Findings
RQ1: Which features are key to stimulating discussions?
- Having many URLs in a post can negatively impact discussion activity
  - Could associate the post with spam content
- Seed posts are associated with greater forum likelihood
- Lower informativeness is associated with seed posts, i.e. seeds use language that is familiar to the community
RQ2: How do these features influence discussion length?
- Lower forum entropy = heightened discussion activity
- Greater complexity = heightened discussion activity, i.e. include more diverse language in the post
- Negative sentiment posts generate more activity
Conclusions and Future Work
The two-stage approach is able to:
- Identify seed posts to a high degree of accuracy
  - F-measure: 0.792
- Predict discussion activity levels
  - nDCG@1: 0.89 (linear regression model)
  - Content and focus features yield the best performing model
  - Average nDCG@k: 0.756
Findings inform:
- Market analysts to track high activity posts from the outset
- Content creators to shape content in order to maximise impact
Currently applying the approach over different platforms:
- How can we predict activity on a given social web system?
- How do social web systems differ in generating activity?

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 

Anticipating Discussion Activity on Community Forums

  • 1. Anticipating Discussion Activity on Community Forums. Matthew Rowe, Sofia Angeletou and Harith Alani. Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom. The Third IEEE International Conference on Social Computing, MIT, Boston, USA, 2011.
  • 2. Community Content. Online communities are now used to: ask questions; post opinions and ideas; discuss events and current issues. Content analysis in online communities is attractive for: market analysis; brand consensus and product opinion. Social network analytics in the US is predicted to reach $1 billion by 2014 (Forrester 2009). Masses of data are now being published in online communities: Facebook has more than 60 million status updates per day (Facebook statistics 2010).
  • 3.
  • 4. The Need for Analysis. Analysts need to know which piece of content will generate the most activity, i.e. the most auspicious or influential. This helps focus the attention of human and computerised analysts: what to track? Need to understand the effect that features (community and content) have on attention to content. Enable content creators to shape their content in order to maximise impact, e.g. promoters, government policy makers. RQ1: Which features are key to stimulating discussions? RQ2: How do these features influence discussion length?
  • 5. Outline. Anticipating Discussion Activity: Approach Overview; Identifying Seed Posts; Predicting Discussion Activity; Features. Dataset: Community Message Board, Boards.ie. 1. Identifying Seed Posts. 2. Predicting Discussion Activity. Findings. Conclusions.
  • 6. Approach Overview. Two-stage approach to predict discussion activity in online communities. 1. Identify seed posts, i.e. thread starters that yield a reply: Will a given post start a discussion? What are the properties that seed posts exhibit? What parameters tend to trigger a discussion? 2. Predict discussion activity levels from the identified seed posts: What is the level of discussion that a seed post will generate? What features correlate with heightened discussion activity?
  • 7. Features. For each post, model: a) the author, b) the content and c) the topical concentration of the author. F1: User Features. In-degree, out-degree: social network properties of the author. Post count, age, post rate: participation information of the author. F2: Content Features. Post length, referral count, time in day: surface features of the post. Complexity: cumulative entropy of terms in the post. Readability: Gunning Fog index of the post. Informativeness: TF-IDF measure of terms within the post. Polarity: average sentiment of terms in the post.
  • 8. Features (2). F3: Focus Features. Topic entropy: the concentration of the author across community forums; higher entropy indicates a wider spread of forum activity (more random distribution, less concentrated). Topic likelihood: the likelihood that a user posts in a specific forum given his post history; measures the affinity that a user has with a given forum; lower likelihood indicates a user posting on an unfamiliar topic.
  • 9. Dataset: Boards.ie. Irish community message board that was established in 1998. Covers a wide array of topics and themes in forums, e.g. World of Warcraft, Japanese Culture, Rugby. We were provided with the complete dataset spanning 1998-2008 of all posts and forum information. Focussed on 2006 due to the scale of the entire dataset. No explicit social connections exist in the dataset: social network features were built from the reply-to graph. A 6-month window prior to the post date was used to build the user and focus features.
  • 10. 1. Identifying Seed Posts. Will a given post start a discussion? What are the properties that seed posts exhibit? Experiment setup: used all thread starter posts from Boards.ie in 2006; training/validation/testing sets using a 70/20/10% random split; binary classification task: is this a seed post or not? Measures: precision, recall, f-measure, area under ROC curve. Performed 2 experiments. a) Model Selection: tested individual feature sets (user, content, focus) and combinations. b) Feature Assessment: dropping 1 feature at a time, record reduction in f-measure.
  • 11. 1.a) Model Selection
  • 12. 1.b) Feature Assessment
  • 13. 1.b) Feature Assessment
  • 14. 2. Predicting Discussion Activity. What is the level of discussion that a seed post will generate? What features correlate with heightened discussion activity? Experiment setup: train on seed posts in the 70% training split; test on seed posts in the 20% validation split. Measure: Normalised Discounted Cumulative Gain (nDCG), looking at varying rank positions: nDCG@k, k=1,2,5,10,20,50,100. Performed 2 experiments. a) Model Selection: regression models (Linear, Isotonic, Support Vector Regression); tested individual feature sets (user, content, focus) and combinations. b) Feature Contributions: assess the features in the best performing model from a).
  • 15. 2.a) Model Selection
  • 16. 2.a) Model Selection (figure panels: Support Vector Regression, Isotonic, Linear)
  • 17. 2.b) Feature Contributions. What features correlate with heightened discussion activity?
  • 18.
  • 19. Negative sentiment posts generate more activity
  • 20. Conclusions and Future Work. The two-stage approach is able to: identify seed posts to a high degree of accuracy (F-measure: 0.792); predict discussion activity levels (nDCG@1: 0.89, linear regression model). Content and focus features yield the best performing model (average nDCG@k: 0.756). Findings inform: market analysts, to track high activity posts from the outset; content creators, to shape content in order to maximise impact. Currently applying the approach over different platforms: How can we predict activity on a given social web system? How do social web systems differ in generating activity?
  • 21. Questions? Web: http://people.kmi.open.ac.uk/rowe Email: m.c.rowe@open.ac.uk Twitter: @mattroweshow
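
The nDCG@k measure named on slide 14 can be sketched as follows. This is a minimal illustration that assumes the common formulation (gain equal to the true reply count, discount of log2(rank + 1)); the slides do not state which variant the authors used, and the function names and sample values below are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranking.

    Assumed formulation: gain = relevance, discount = log2(rank + 1),
    with ranks starting at 1.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores, true_reply_counts, k):
    """Rank seed posts by predicted score and compare with the ideal ranking.

    predicted_scores: model output per seed post (higher = more discussion).
    true_reply_counts: observed number of replies per seed post.
    """
    # Order posts by the model's prediction, best first.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked = [true_reply_counts[i] for i in order]
    # The ideal ranking sorts posts by their true reply counts.
    ideal = sorted(true_reply_counts, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.5]   # hypothetical regression outputs
    replies = [10, 1, 3]       # hypothetical observed reply counts
    for k in (1, 2, 3):
        print(k, ndcg_at_k(scores, replies, k))
```

A value of 1.0 at a given k means the predicted ordering of the top k seed posts matches the ordering by actual discussion length; lower values mean busier threads were ranked too low.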

Editor's Notes

  1. 80% to 20% skew towards seeds over non-seeds
  2. Content features outperform user features. Content and focus outperform other feature combinations. All features together work best. Differs from Twitter analysis, where user features were better predictors than content features.
  3. Trained J48 with all features using the training split. Tested it on the held-out 10%. Dropped 1 feature at a time from the model and classified the test split. Looking for the features that cause the greatest reduction in accuracy.
  4. Boxplots show: Higher referral counts correlate with non-seeds (spam). Higher forum likelihood correlates with seeds: users who concentrate their discussions within select forums will start a discussion, as they're known to the community. Higher informativeness correlates with non-seeds.
  5. Solitary features: User features perform best as the solitary feature set for Linear regression and SVR. Focus features are best for Isotonic regression. Combined: Content and focus perform best for Linear and Isotonic.
  6. Smallest SD for content and focus features
  7. A user can expect increased discussion activity if he/she: has low forum entropy; has high forum likelihood; is negative in his/her posts; uses complex language (wide vocab, i.e. articulate).
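
The two focus features referred to in note 7 (and defined on slide 8) can be illustrated with a short sketch. It assumes that a user's forum distribution is estimated as plain relative frequencies over the posts in the history window, with entropy in bits; the slides do not give the exact estimator, and the function names and forum labels are hypothetical.

```python
import math
from collections import Counter


def forum_distribution(forum_history):
    """Relative frequency of each forum in a user's post history.

    forum_history: one forum name per past post by the user.
    """
    counts = Counter(forum_history)
    total = sum(counts.values())
    return {forum: n / total for forum, n in counts.items()}


def topic_entropy(forum_history):
    """Shannon entropy (in bits) of the user's forum distribution.

    Higher values mean activity is spread over more forums;
    0 means all posts were made in a single forum.
    """
    dist = forum_distribution(forum_history)
    return -sum(p * math.log2(p) for p in dist.values())


def topic_likelihood(forum_history, forum):
    """Estimated probability that the user posts in `forum`.

    Unsmoothed estimate (an assumption): a forum never used before gets 0.
    """
    return forum_distribution(forum_history).get(forum, 0.0)


if __name__ == "__main__":
    history = ["rugby", "rugby", "rugby", "wow"]  # hypothetical history
    print(topic_entropy(history))
    print(topic_likelihood(history, "rugby"))
    print(topic_likelihood(history, "japanese-culture"))
```

Under these assumptions, a user with low entropy who posts in a forum with high likelihood is posting where they are already concentrated, which is the profile note 7 associates with increased discussion activity.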