Predicting Discussions on the Social Semantic Web

Published in: Education, Technology
  • Web Science talk!!!
  • Provision of a range of applications. Large-scale uptake of smartphones.
  • How can I track discussions?
  • What features are correlated with discussions?
  • User features are more important than content features. Significant at alpha = 0.01. User reputation is therefore more important than the quality of content!
  • Over the training split! Pearson correlation: 0.674 = fair agreement.
  • Ran experiments using J48 with all features, testing on the test split.
  • Fitted Kolmogorov-Smirnov goodness-of-fit test.
  • For Haiti: content features are more important than user features at higher ranks; user features show greater performance as ranks are increased. Union: user features outperform content features for all ranks apart from the top. This indicates the differences in the dynamics of the data.
  • Attention Economics!!
  • Predicting Discussions on the Social Semantic Web
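
The notes above mention evaluating a classifier on a held-out test split, and the seed-post experiments in the deck use a 70/20/10 training/validation/testing split with precision, recall and F1. A minimal Python sketch of those two pieces follows. The function names, the random shuffling and the seed are assumptions for illustration; the deck does not say how the split was drawn or how the measures were implemented.

```python
import random


def split_70_20_10(items, seed=0):
    """Shuffle items and cut them into 70% / 20% / 10% parts.

    Random shuffling with a fixed seed is an assumption; the slides only
    state the split proportions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10  # integer arithmetic avoids float rounding
    n_valid = n * 2 // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for a binary task (1 = seed post, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

With labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 1, 0, 1]` there are two true positives, one false positive and one false negative, so all three measures come out at 2/3.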

    1. Predicting Discussions on the Social Semantic Web
       Matthew Rowe, Sofia Angeletou and Harith Alani
       Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
    2. Mass of Social Data
       Social content is now published at a staggering rate...
    3. Social Data Publication Rates
       ~600 Tweets per second [1]
       ~700 Facebook status updates per second [1]
       Spinn3r dataset collected from Jan – Feb 2011 [2]
       133 million blog posts
       5.7 million forum posts
       231 million social media posts
       [1] http://searchengineland.com/by-the-numbers-twitter-vs-facebook-vs-google-buzz-36709
       [2] http://icwsm.org/data/index.php
    4. The New Information Era
    5. But... Analysis is Limited
       Market Analysts
       What are people saying about my products?
       Opinion Mining
       How are people perceiving a given subject or topic?
       eGovernment Policy Makers
       How is a policy or law received by the public?
       How can I maximise feedback to my content?
    6. Attention Economics
       Given all this data...
       How do we decide on what information to focus on?
       How do we know what posts will evolve into discussions?
       Attention Economics (Goldhaber, 1997)
       Need to understand key indicators of high-attention discussions
    7. Discussions on Twitter
       Twitter is used as a medium to:
       Share opinions and ideas
       Engage in discussions
       Discuss events
       Debate topics
       Identifying online discussions enables:
       Up-to-date public opinion
       Observation of topics of interest
       Gauging the popularity of government policies
       Fine-grained customer support
    8. Predicting Discussions
       Pre-empt discussions on the Social Web:
       Identifying seed posts
       i.e. posts that start a discussion
       Will a given post start a discussion?
       What are the key features of seed posts?
       Predicting discussion activity levels
       What is the level of discussion that a seed post will generate?
       What are the key factors of lengthy discussions?
    9. The Need for Semantics
       For predictions we require statistical features
       User features
       Content features
       Features provided using differing schemas by different platforms
       How to overcome heterogeneity?
       Currently, no ontologies capture such features
    10. Behaviour Ontology
       www.purl.org/NET/oubo/0.23/
    11. Features
    12. Identifying Seed Posts
       Experiments
       Haiti and Union Address datasets
       Divided each dataset up using a 70/20/10 split for training/validation/testing
       Evaluated a binary classification task
       Is this post a seed post or not?
       Precision, Recall, F1 and Area under ROC
       Tested: user, content, user+content features
       Tested Perceptron, SVM, Naïve Bayes and J48
    13. Identifying Seed Posts
    14. Identifying Seed Posts
       What are the most important features?
    15. Identifying Seed Posts
       What is the correlation between seed posts and features?
    16. Identifying Seed Posts
       Can we identify seed posts using the top-k features?
    17. Predicting Discussion Activity
       From identified seed posts:
       Can we predict the level of discussion activity?
       How much activity will a post generate?
       [Wang & Groth, 2010] learns a regression model, and reports on coefficients
       Identifying relationship between features
       We do something different:
       Predict the volume of the discussion
    18. Predicting Discussion Activity
    19. Predicting Discussion Activity
       Compare rankings
       Ground truth vs predicted
       Experiments
       Using Haiti and Union Address datasets
       Evaluation measure: Normalised Discounted Cumulative Gain
       Assessing nDCG@k where k = {1, 5, 10, 20, 50, 100}
       Tested Support Vector Regression with:
       user, content, user+content features
    20. Predicting Discussion Activity
    21. Findings
       User reputation and standing is crucial
       eliciting a response
       starting a discussion
       Greater broadcast capability = greater likelihood of response
       More listeners = more discussion
       Activity levels influenced by out-degree
       Allow the poster to see response from ‘respected’ peers
    22. Conclusions
       Pre-empt discussions to empower
       Market analysts
       Opinion mining
       eGovernment policy makers
       Behaviour ontology
       Captures impact across platforms
       Approach accurately predicts:
       Which posts will yield a reply, and;
       The level of discussion activity
    23. Current and Future Work
       Experiments over a forum dataset
       Content features >> user features
       Different platform dynamics
       Extend experiments to a random Twitter dataset
       Extension to behaviour ontology
       Captures concentration
       i.e. focus of a user on specific topics
       Categorising users by role
       Based on observed behaviour
    24. Questions?
       people.kmi.open.ac.uk/rowe
       m.c.rowe@open.ac.uk
       @mattroweshow
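
The discussion-activity experiments compare a predicted ranking of seed posts against the ground-truth ranking using nDCG@k. The following Python sketch shows one common way to compute that measure. It assumes linear gain (the observed discussion volume itself) and a log2 rank discount; the slides do not state which DCG variant was used, and the function names and toy numbers are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list.

    Assumed variant: gain = relevance, discount = log2(rank + 1).
    """
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(true_volumes, predicted_scores, k):
    """nDCG@k of the ranking induced by predicted_scores.

    true_volumes[i] is the observed discussion volume of post i and
    predicted_scores[i] is the model's predicted volume for the same post.
    """
    # Rank posts by predicted score, highest first.
    order = sorted(range(len(true_volumes)),
                   key=lambda i: predicted_scores[i], reverse=True)
    predicted_ranking = [true_volumes[i] for i in order]
    # The ideal ranking sorts posts by their true volume.
    ideal_ranking = sorted(true_volumes, reverse=True)
    ideal = dcg_at_k(ideal_ranking, k)
    return dcg_at_k(predicted_ranking, k) / ideal if ideal > 0 else 0.0


# Toy example: four seed posts with made-up reply counts.
true_volumes = [5, 0, 3, 1]
good_scores = [0.9, 0.1, 0.5, 0.3]  # orders posts exactly as the truth does
print(ndcg_at_k(true_volumes, good_scores, k=4))
```

A prediction that orders the posts exactly as the ground truth does scores 1; placing low-volume posts near the top pulls the score towards 0. In the deck the predicted scores would come from Support Vector Regression, which is not reproduced here.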