Social Media Analytics with a pinch of semantics
Talk given at SSSW'13, the 10th Summer School on Ontology Engineering and the Semantic Web, 7 - 13 July 2013, Cercedilla, Spain. http://sssw.org/2013/

    Presentation Transcript

    • Social Media Analytics with a pinch of semantics Harith Alani http://people.kmi.open.ac.uk/harith/ @halani harith-alani @halani
    • Outline of my talk § I’ll start talking § Then I’ll finish talking § You’ll wonder what you’ve learned! § You will clap regardless § You’ll be convinced you learned nothing § You could be right! § But you’re wrong of course § We go to the bar tonight and forget all about the talk!
    • Social Media for Businesses •  Why social media analytics? –  It’s where everyone is! –  Real time information –  Low cost –  Much of it (Figure: survey of 3800 marketers on how they use social media to grow their business)
    • §  “they can't be forced to use social apps, they must opt-in” §  “need a detailed understanding of social networks: how people are currently working, who they work with and what their needs are”
    • Measuring Social Media
    • Tools for monitoring social networks
    • LinkedIn Group Analytics
    • Facebook Insights •  Provides measurements on FB Page performance •  Provides demographic data about visitors, and their engagement with posts •  “Experiment with different types of posts to see what your audience responds to best.”
    • Social Media Challenges •  Integration –  How to represent and connect this data? •  Behaviour –  How can we measure and predict behaviour? –  Which behaviours are good/ bad in which community type? •  Change –  Can we influence behaviour change? •  Community Health –  What health signs should we look for? –  How to predict them? •  Engagement –  How can we maximise engagement? •  Sentiment –  How to measure it? track it? –  Can we predict sentiment towards entities (brands, people, events)?
    • Forum on a celebrity Forum on transport
    • June 25, 2013
    • In-house Social Platforms Jan 29, 2013
    • Semantically-Interlinked Online Communities (SIOC) •  SIOC aims to enable the integration of online community information. •  SIOC provides a Semantic Web ontology for representing rich data from the Social Web in RDF sioc-project.org
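To make the SIOC slide concrete, here is a minimal sketch of how one forum post might be written out as RDF (Turtle syntax) with SIOC Core terms. The `example.org` URIs, the post text and the helper name `post_to_turtle` are hypothetical; only the `sioc:` class and property names come from the SIOC vocabulary. The sketch assumes single-line post content.

```python
# Minimal sketch: serialise one forum post as Turtle using SIOC Core terms.
# All example.org URIs and the post text are hypothetical illustration data.

SIOC_NS = "http://rdfs.org/sioc/ns#"


def post_to_turtle(post_uri, author_uri, forum_uri, content):
    """Return a small Turtle document describing a single sioc:Post."""
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    literal = content.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"@prefix sioc: <{SIOC_NS}> .",
        "",
        f"<{post_uri}> a sioc:Post ;",
        f"    sioc:has_creator <{author_uri}> ;",
        f"    sioc:has_container <{forum_uri}> ;",
        f'    sioc:content "{literal}" .',
        "",
        f"<{author_uri}> a sioc:UserAccount .",
        f"<{forum_uri}> a sioc:Forum .",
    ]
    return "\n".join(lines)


turtle = post_to_turtle(
    "http://example.org/forum/transport/post/1",
    "http://example.org/user/42",
    "http://example.org/forum/transport",
    'Is the "night bus" running?',
)
print(turtle)
```

Posts from different platforms described this way share the same classes and properties, which is what makes the integration step on the slide possible.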
    • Semantics in FB Open Graph
    • Behaviour Analysis
    • Why monitor behaviour? §  Understand impact of behaviour on community evolution §  Forecast community future §  Learn when intervention might be needed §  Learn which behaviour should be encouraged or discouraged §  Find what could trigger certain behaviours §  What is the best mix of behaviour to increase engagement in the community §  To see which users need more support, which ones should be confined, and which ones should be promoted
    • Behaviour analysis in Social Media §  Bottom Up analysis §  Every community member is classified into a “role” §  Unknown roles might be identified §  Copes with role changes over time §  Example roles: initiators, lurkers, followers, leaders §  Feature types: structural, social network, reciprocity, persistence, participation §  Feature levels change with the dynamics of the community §  Roles are associated with a collection of feature-to-level mappings, e.g. in-degree -> high, out-degree -> high §  Run rules over each user’s features and derive the community role composition
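The rule-based step above can be sketched roughly as follows. This is an illustration only: the rule table, the user data and the tertile-based cut-offs are assumptions made for the example, not the rules used in the talk (which were encoded with SPIN over an ontology). Levels are derived from the current community's own distribution, so they shift as the community changes.

```python
from collections import Counter

FEATURES = ("in_degree", "out_degree")

# Hypothetical rules: each role is a set of feature -> level requirements.
ROLE_RULES = {
    "leader": {"in_degree": "high", "out_degree": "high"},
    "follower": {"in_degree": "low", "out_degree": "mid"},
    "lurker": {"in_degree": "low", "out_degree": "low"},
}

# Hypothetical per-user feature values for one time window.
users = {
    "alice": {"in_degree": 12, "out_degree": 10},
    "bob": {"in_degree": 0, "out_degree": 0},
    "carol": {"in_degree": 1, "out_degree": 9},
    "dave": {"in_degree": 9, "out_degree": 12},
    "erin": {"in_degree": 2, "out_degree": 1},
    "frank": {"in_degree": 5, "out_degree": 5},
}


def level_cutoffs(values):
    """Cut points at roughly the 1/3 and 2/3 positions of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def to_level(value, cutoffs):
    """Map a raw feature value to low / mid / high for this community."""
    low_cut, high_cut = cutoffs
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "mid"
    return "high"


def classify(levels):
    """Return the first role whose requirements all match, else 'unclassified'."""
    for role, rule in ROLE_RULES.items():
        if all(levels.get(feature) == level for feature, level in rule.items()):
            return role
    return "unclassified"


cutoffs = {f: level_cutoffs([u[f] for u in users.values()]) for f in FEATURES}
roles = {
    name: classify({f: to_level(feats[f], cutoffs[f]) for f in FEATURES})
    for name, feats in users.items()
}
composition = Counter(roles.values())
print(roles)
print(composition)
```

Users who match no rule end up as "unclassified", which is one place where previously unknown roles could surface.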
    • Modelling user features and interactions
    • Encoding Rules in Ontologies with SPIN
    • Clustering for identifying emerging roles –  Map the distribution of each feature in each cluster to a level (i.e. low, mid, high) –  Align the mapping patterns with role labels
      Table 2: Mapping of cluster dimensions to levels
      Cluster  Dispersion  Initiation  Quality  Popularity
      0        L           M           H        L
      1        L           L           L        L
      2        M           H           L        H
      3        H           H           H        H
      4        L           H           H        M
      5,7      H           H           L        H
      6        L           H           M        M
      8,9      M           H           H        H
      10       L           H           M        H
      Role labels:
      •  1 - Focussed Novice: focussed within a few select forums but does not provide good quality content.
      •  2 - Mixed Novice: a novice across a medium range of topics
      •  3 - Distributed Expert: an expert on a variety of topics and participates across many different forums
      •  4 - Focussed Expert Initiator: similar to cluster 0 in that this type of user is focussed on certain topics and is an expert on those, but to a large extent starts discussions and threads, indicating that his/her shared content is useful to the community
      •  5,7 - Distributed Novice: participates across a range of forums but is not knowledgeable on any
      •  ….
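As a rough illustration of the two steps on this slide, the sketch below maps a cluster's average feature values to L/M/H levels and then looks the resulting pattern up among the labelled rows of Table 2. The centroid values and the fixed 1/3 and 2/3 cut-offs on features normalised to [0, 1] are assumptions for the example; the slide does not say how the levels were derived.

```python
# Level patterns and labels taken from Table 2 and the role descriptions
# (clusters 1, 2, 3, 4 and 5,7). Order: dispersion, initiation, quality, popularity.
ROLE_PATTERNS = {
    ("L", "L", "L", "L"): "Focussed Novice",
    ("M", "H", "L", "H"): "Mixed Novice",
    ("H", "H", "H", "H"): "Distributed Expert",
    ("L", "H", "H", "M"): "Focussed Expert Initiator",
    ("H", "H", "L", "H"): "Distributed Novice",
}
DIMENSIONS = ("dispersion", "initiation", "quality", "popularity")


def to_level(value, low_cut=1 / 3, high_cut=2 / 3):
    """Map a normalised feature value in [0, 1] to L, M or H (assumed cut-offs)."""
    if value < low_cut:
        return "L"
    if value < high_cut:
        return "M"
    return "H"


def label_cluster(centroid):
    """Return the level pattern of a cluster centroid and its role label, if any."""
    pattern = tuple(to_level(centroid[d]) for d in DIMENSIONS)
    return pattern, ROLE_PATTERNS.get(pattern, "unlabelled")


# Hypothetical cluster centroids (mean normalised feature values).
centroids = {
    "A": {"dispersion": 0.9, "initiation": 0.8, "quality": 0.2, "popularity": 0.7},
    "B": {"dispersion": 0.1, "initiation": 0.2, "quality": 0.1, "popularity": 0.3},
    "C": {"dispersion": 0.5, "initiation": 0.5, "quality": 0.5, "popularity": 0.5},
}
labels = {name: label_cluster(c) for name, c in centroids.items()}
for name, (pattern, role) in labels.items():
    print(name, "".join(pattern), role)
```

A pattern with no matching row stays "unlabelled", which flags a cluster that still needs a human-assigned role name.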
    • Correlation of behaviour with community activity §  How does the existence of certain behaviour roles impact activity in an online community?
    • Online Community Health Analytics •  Machine learning models to predict community health based on compositions and evolution of user behaviour •  Health categories: –  Churn rate: proportion of community leavers in a given time segment. –  User count: number of users who posted at least once. –  Seeds to Non-seeds ratio: proportion of posts that get responses to those that don’t –  Clustering coefficient: extent to which the community forms a clique. (Figure: ROC curves, False Positive Rate against True Positive Rate, one panel each for Churn Rate, User Count, Seeds / Non-seeds Prop and Clustering Coefficient) •  The fewer Focused Experts in the community, the more posts will receive a reply! •  There is no “one size fits all” model!
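The four health indicators defined on this slide could be computed along these lines. This is a sketch under stated assumptions: the post records are made up, a "leaver" is taken to be a user who was active in the previous time segment but not in the current one, and the clustering coefficient is the average local clustering coefficient over the undirected reply graph. None of these choices is confirmed by the slides.

```python
from itertools import combinations

# Hypothetical posts from one time segment; reply_to is None for thread starters.
posts = [
    {"id": 1, "author": "ann", "reply_to": None},
    {"id": 2, "author": "bob", "reply_to": 1},
    {"id": 3, "author": "cat", "reply_to": 1},
    {"id": 4, "author": "ann", "reply_to": 2},
    {"id": 5, "author": "dan", "reply_to": None},
    {"id": 6, "author": "cat", "reply_to": 2},
]


def churn_rate(prev_active, curr_active):
    """Share of previously active users who no longer appear (assumed definition)."""
    if not prev_active:
        return 0.0
    return len(prev_active - curr_active) / len(prev_active)


def user_count(posts):
    """Number of distinct users who posted at least once."""
    return len({p["author"] for p in posts})


def seed_ratio(posts):
    """Posts that received a reply (seeds) divided by posts that did not."""
    replied_to = {p["reply_to"] for p in posts if p["reply_to"] is not None}
    seeds = sum(1 for p in posts if p["id"] in replied_to)
    non_seeds = len(posts) - seeds
    return seeds / non_seeds if non_seeds else float("inf")


def clustering_coefficient(edges):
    """Average local clustering coefficient of an undirected graph."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-replies
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = []
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            scores.append(0.0)
            continue
        links = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in neighbours[u])
        scores.append(2 * links / (k * (k - 1)))
    return sum(scores) / len(scores) if scores else 0.0


author_of = {p["id"]: p["author"] for p in posts}
reply_edges = [(p["author"], author_of[p["reply_to"]]) for p in posts if p["reply_to"] is not None]

churn = churn_rate({"ann", "bob", "cat", "dan"}, {"ann", "cat", "eve"})
count = user_count(posts)
ratio = seed_ratio(posts)
clustering = clustering_coefficient(reply_edges)
print(churn, count, ratio, clustering)
```

Values like these, tracked per time segment, would be the targets that the machine learning models on the slide try to predict from role compositions.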
    • Community Types
    • Community types §  Do communities of different types behave differently? §  Analysed IBM Connections communities to study participation, activity, and behaviour of users §  Helps us to know what is normal and healthy in a community, and what is not! §  Compare exhibited community behaviour with what users say they use the community for §  Does macro behaviour match micro needs?
    • Community types §  Data consists of non-private info on IBM Connections Intranet deployment §  Communities: –  ID –  Creation date –  Members –  Used applications (blogs, Wikis, forums) §  Forums: –  Discussion threads –  Comments –  Dates –  Authors and responders (Diagram labels: Community, Wiki Page, Blog Post, Forum Thread, Wiki Edit, Blog Comment, Forum Reply, Bookmark Tag, File)
    • Community types §  Muller, M. (CHI 2012) identified five distinct community types in IBM Connections: –  Communities of Practice (CoP): for sharing information and networking –  Teams: shared goal for a particular project or client –  Technical Support: support for a specific technology –  Idea Labs Communities: for focused brainstorming –  Recreation Communities: recreational activities unrelated to work. §  Our data consisted of the 186 most active communities: –  100 CoPs, 72 Teams, and 14 Tech communities –  No Idea Labs or Recreation communities
    • Behaviour in different community types •  Members of Team communities are more engaged, popular, and initiate more discussions •  Tech users are mostly active in a few communities, and don’t initiate or contribute much •  CoP users disperse their activity across many communities, and contribute more (Table: mean and standard deviation (in brackets) of the distribution of micro features within the different community types) •  Need an ontology and inference engine of community types. Matthew Rowe, Miriam Fernandez, Harith Alani, Inbal Ronen, Conor Hayes and Marcel Karnstedt: Behaviour Analysis across different types of Enterprise Online Communities. ACM WebSci 2012
    • User needs and value
    • Measurements of value and needs satisfaction •  Assessing user engagement and needs satisfaction •  Measuring value of individual users to their communities •  Measuring value of communities to their members (Charts: Community Value, Community Member Value, Value of community features; survey response shares per item, in slide order, response scale not shown) –  Quality of content: 41%, 47%, 8%, 3%, 1% –  Number of members: 18%, 46%, 26%, 8%, 2% –  Diversity of expertise: 31%, 53%, 13%, 2%, 1% –  Level of entertainment: 2%, 15%, 30%, 30%, 23% –  Provides accurate answers to questions: 44%, 50%, 4%, 2% –  Contributes good quality and well presented content: 38%, 55%, 5%, 2% –  Provides quick answers to questions: 21%, 60%, 14%, 5% –  Has good expertise in a domain: 38%, 49%, 8%, 5% –  Contributes content frequently: 11%, 58%, 25%, 6% –  Has many contacts (e.g. Facebook friends): 1%, 17%, 34%, 30%, 18% –  Has many fans (e.g. Twitter followers, positive replies to posts): 2%, 14%, 32%, 31%, 21%
    • Monitoring Online Communities
    • Maslow’s Hierarchy of Needs
    • Mapping Maslow’s hierarchy of needs to social media communities §  Self-actualisation: Altruistic behaviour: helping others, replying to queries, giving rates §  Self-Esteem: Need to be rated and ranked higher in the community, promotion of roles from novice to active member to expert and moderator §  Social Belongingness: Need to be part of the community, groups, need for interaction and engagement §  Security: Need for privacy, security from identity theft, security from online abuse, trolling and bullying §  Physical: Need for Hardware, Software, Information, Internet access.
    • User groups based on ‘needs’ §  High Helping Need •  Reply a lot •  Last 17% longer in system •  Contribute to many forums •  High and consistent engagement •  (Self-actualisation) §  High Information Need •  Contribute 70% less •  Don’t care about ‘points’ and ‘reputation’ •  Don’t stay for long •  Engage with very few users •  (Basic needs) §  High Social Need •  High level of social interaction •  Moderate reputation scores •  High contribution level •  Low information needs •  (Social belongingness) §  Recognition Need •  High ‘reputation’ •  Moderate contribution level •  High engagement •  (Self-esteem) §  ~90% of users are happily staying at the lower levels of the ‘needs hierarchy’
    • Behaviour evolution patterns §  Can we predict future behaviour role? §  Who’s on the path to become a leader? an expert? a churner? §  Which users do we want to encourage to stay or to leave? (Figure labels: experts-to-be, about to churn, on right path to leadership) (Fig. 6. Progression patterns where users progress from a novice to an expert role over time)
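One simple way to approach the "can we predict the future role?" question is to count how often one role follows another in users' role histories. The sketch below uses a first-order transition count, which is a stand-in technique chosen for illustration and not the pattern-mining method behind the figure; the role names and sequences are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical role histories, one list per user, one entry per time window.
sequences = [
    ["novice", "contributor", "expert"],
    ["novice", "contributor", "expert", "leader"],
    ["novice", "novice", "churned"],
    ["novice", "contributor", "contributor", "expert", "leader"],
]


def transition_counts(sequences):
    """Count how often each role is directly followed by each other role."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def transition_probability(counts, current, following):
    """Estimated probability of moving from `current` to `following`."""
    if current not in counts:
        return 0.0
    total = sum(counts[current].values())
    return counts[current][following] / total


def most_likely_next(counts, role):
    """Most frequent successor of `role`, or None if the role was never left."""
    if role not in counts:
        return None
    return counts[role].most_common(1)[0][0]


counts = transition_counts(sequences)
next_for_novice = most_likely_next(counts, "novice")
p_novice_to_contributor = transition_probability(counts, "novice", "contributor")
print(next_for_novice, p_novice_to_contributor)
```

With real data, users whose recent roles match a path that usually ends in "churned" would be the ones to support, while those on a path towards "expert" or "leader" could be encouraged.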
    • Engagement Analysis
    • Tweet recipe for generating engagement §  Identifying seed posts –  Top features (one list per dataset): Time in Day, Readability, Out-Degree, Polarity, Informativeness; Referral Count, Topic Likelihood, Informativeness, Readability, User Age –  For both datasets: •  Content features play a greater role than user features •  The combination of all features provides the best results §  Predicting discussion activity –  Top features (one list per dataset): Referral Count(-), Complexity(-); URLs(-), Polarity(-), Topic Likelihood(+), Complexity(+) –  For both, a decrease in URLs is associated with max activity. –  Language and terminology are more significant for Boards.ie.
    • Engagement in different communities §  How the results differ: §  from one community type to another §  from random datasets to topic- based ones §  from related experiments in the literature §  Experimented with 7 datasets, from: §  Boards.ie §  Twitter §  SAP §  Server Fault §  Facebook
    • Impact of features on engagement (Figure: β coefficients per feature, one panel each for Boards.ie, Twitter Random, Twitter Haiti, Twitter Union, Server Fault, SAP and Facebook. Features: In-degree, Out-degree, Post Count, Age, Post Rate, Post Length, Referrals Count, Polarity, Complexity, Readability, Readability Fog, Informativeness, EF-IPF, CF-IPF, Entity Entropy, Concept Entropy, Entity Degree Centrality, Concept Degree Centrality, Entity Network Entropy, Concept Network Entropy) Effects of individual social, content, and semantic features on the response variable (i.e. whether the post seeds engagement or not).
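To show how per-feature β coefficients for a binary "seeds engagement or not" response can be read, here is a small sketch that fits a logistic regression by plain gradient ascent. The slide does not name the model, so logistic regression is an assumption, and the two features and six data points are invented toy values rather than results from any of the datasets.

```python
import math

FEATURE_NAMES = ("referral_count", "informativeness")

# Hypothetical posts: (referral_count, informativeness) and whether each seeded a discussion.
rows = [(0, 0.9), (1, 0.8), (0, 0.7), (3, 0.2), (4, 0.3), (2, 0.1)]
labels = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(bias, weights, x):
    """Probability that a post with features x seeds engagement."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit bias and weights by full-batch gradient ascent on the log-likelihood."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = y - predict_proba(bias, weights, x)
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias += lr * grad_b / n
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


bias, weights = fit_logistic(rows, labels)
beta = dict(zip(FEATURE_NAMES, weights))
print(beta)
```

A negative β means that higher values of the feature lower the odds of the post seeding a discussion, and a positive β raises them. Coefficients are only comparable across features if the features are on similar scales, so real analyses would usually standardise them first.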
    • Semantic Sentiment Analysis
    • Semantic sentiment analysis on social media §  Offers fast and cheap access to the public’s feelings towards brands, business, people, etc. §  A range of features and statistical classifiers have been used in recent years §  Semantics are often neglected §  We add semantics as additional features into the training set for sentiment analysis §  Measure the correlation of the representative concept with negative/positive sentiment
    • Sentiment Analysis §  Lexical-Based Approach –  Sentiment Lexicon: hate = negative, honest = positive, inefficient = negative, love = positive, … –  Examples: “I hate the iPhone”, “I really love the iPhone” §  Machine Learning Approach –  Training Set -> Learn Model -> Model -> Apply Model to Test Set –  Naïve Bayes, SVM, MaxEnt, etc.
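The lexical-based approach on this slide can be sketched in a few lines. The lexicon entries and the two example sentences are taken from the slide; the scoring scheme (count positive words minus negative words) and the "neutral" fallback are assumptions made for the example.

```python
# Sentiment lexicon as shown on the slide.
LEXICON = {
    "hate": "negative",
    "honest": "positive",
    "inefficient": "negative",
    "love": "positive",
}


def lexicon_sentiment(text):
    """Label text by counting positive and negative lexicon words (assumed scheme)."""
    score = 0
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")  # drop surrounding punctuation
        label = LEXICON.get(word)
        if label == "positive":
            score += 1
        elif label == "negative":
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = ["I hate the iPhone", "I really love the iPhone", "The iPhone is out"]
results = {text: lexicon_sentiment(text) for text in examples}
print(results)
```

The third sentence contains no lexicon word and comes out as "neutral", which is exactly the gap that the semantic features discussed next are meant to help with.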
    • Semantic Concept Extraction §  Extract semantic concepts from tweets data and incorporate them into the supervised classifier training. §  … OpenCalais and Zemanta. Their experimental results showed that AlchemyAPI performs best for entity extraction and semantic concept mapping. Our datasets consist of informal tweets, and hence are intrinsically different from those used in [10]. Therefore we conducted our own evaluation, and randomly selected 500 tweets from the STS corpus and asked 3 evaluators to evaluate the semantic concept extraction outputs generated from AlchemyAPI, OpenCalais and Zemanta. The assessment of the outputs was based on (1) the correctness of the extracted entities; and (2) the correctness of the entity-concept mappings. The evaluation results presented in Table 2 show that AlchemyAPI extracted the largest number of concepts and it also has the highest entity-concept mapping accuracy compared to OpenCalais and Zemanta. As such, we chose AlchemyAPI to extract the semantic concepts for our three datasets. Table 3 lists the total number of entities extracted and the number of semantic concepts mapped against them for each dataset.
      Table 2. Evaluation results of AlchemyAPI, Zemanta and OpenCalais.
      Extraction Tool  No. of Concepts Extracted  Entity-Concept Mapping Accuracy (%)
                                                  Evaluator 1  Evaluator 2  Evaluator 3
      AlchemyAPI       108                        73.97        73.8         72.8
      Zemanta          70                         71           71.8         70.4
      OpenCalais       65                         68           69.1         68.7
      Table 3. Entity/concept extraction statistics of STS, OMD and HCR using AlchemyAPI.
                       STS    HCR  OMD
      No. of Entities  15139  723  1194
      No. of Concepts  29     17   14
    • Likely sentiment for a concept §  Semantic concepts can help determining sentiment even when no good lexical clues are present
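A small sketch of the idea: if entities are mapped to semantic concepts, the sentiment seen for a concept in training data can hint at the sentiment of a tweet about an entity that never appeared in training. The entity-to-concept table, the tweets and their labels are all hypothetical, and the majority-vote rule is a simplification; the talk itself adds concepts as extra features to a statistical classifier.

```python
from collections import Counter, defaultdict

# Hypothetical entity -> concept mapping (in the talk this came from an extraction tool).
ENTITY_CONCEPTS = {
    "iphone": "Product",
    "ipad": "Product",
    "obama": "Person",
    "merkel": "Person",
}

# Hypothetical labelled training tweets with no obvious sentiment words.
training = [
    ("Just got my iPhone", "positive"),
    ("Using the iPhone all day", "positive"),
    ("iPhone battery again", "negative"),
    ("Obama speech tonight", "negative"),
    ("Watching Obama", "negative"),
]


def concepts_in(text):
    """Return the concepts of all known entities mentioned in the text."""
    tokens = (t.strip(".,!?;:\"'") for t in text.lower().split())
    return [ENTITY_CONCEPTS[t] for t in tokens if t in ENTITY_CONCEPTS]


def concept_sentiment_counts(tweets):
    """Count sentiment labels per concept over the training tweets."""
    counts = defaultdict(Counter)
    for text, label in tweets:
        for concept in concepts_in(text):
            counts[concept][label] += 1
    return counts


def likely_sentiment(text, counts):
    """Majority sentiment of the concepts found in the text, or None if there are none."""
    votes = Counter()
    for concept in concepts_in(text):
        votes.update(counts.get(concept, Counter()))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


counts = concept_sentiment_counts(training)
ipad_guess = likely_sentiment("Unboxing the iPad", counts)
merkel_guess = likely_sentiment("Merkel on TV", counts)
print(ipad_guess, merkel_guess)
```

Neither "iPad" nor "Merkel" occurs in the training tweets, yet both get a guess through their concept, which is the effect the slide describes.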
    • Impact of adding semantic features §  Incorporating semantics increases accuracy by 6.5% for negative sentiment, and 4.8% for positive sentiment §  F = 75.95%, with 77.18% Precision and 75.33% Recall §  Using baselines of unigrams and part-of-speech features §  More to-dos: –  Semantic Concepts Extraction: Explore a more fine-grained approach for the entity extraction and the entity-concept mapping –  Selective Method: Interpolate semantic concepts based on their contribution to the classification performance. Saif, Hassan; He, Yulan and Alani, Harith (2012). Semantic sentiment analysis of twitter. In: The 11th International Semantic Web Conference (ISWC 2012), 11-15 November 2012, Boston, MA, USA
    • OK, and now what?!
    • OUSocials §  Many FB groups exist for students of OU courses §  Created and used by students to discuss and share opinions on courses and get support (Diagram labels: Behaviour Analysis, Sentiment Analysis, Topic Analysis, Course tutors, Real time monitoring) •  How are opinion and sentiment towards a course evolving? •  Who’s providing positive/negative support? •  What topics are emerging? How do they change over time? •  Do students get the answers and support they need?
    • Analytics over FB groups §  Compare findings to course performance, and student performance
    • Reel Lives
    • Problem Summary •  Fragmented digital selves don’t support social learning and individual empowerment •  Need to enable: –  Digital empowerment –  Improved understanding and social cohesion –  Informed decision making (for individuals) –  Informed policy making (for organisations) –  Facilitating creative participation –  Co-curating of digital personhoods
    • Creating the ‘reels’
    • Changing energy consumption behaviour A Decarbonisation Platform for Citizen Empowerment and Translating Collective Awareness into Behavioural Change August 2012
    • Energy Monitors www.efergy.com greenenergyoptions.co.uk fastcompany.com tdevice.net powerp.co.uk www.energycircle.com indiegogo.com greentechadvocates.com •  Do they change how we consume energy in our homes? •  Are they enough? •  Why? How? What if? Where?
    • Social Eco Feedback Technology
    • Thanks to .. Matthew Rowe (now at Uni Lancaster), Sofia Angeletou (now at BBC), Gregoire Burel, Miriam Fernandez, Smitashree Choudhury, Hassan Saif
    • Papers http://oro.open.ac.uk/view/person/ha2294.html §  Rowe, Matthew; Fernandez, Miriam; Angeletou, Sofia and Alani, Harith (2012). Community analysis through semantic rules and role composition derivation. Journal of Web Semantics, 18(1) §  Rowe, Matthew; Fernandez, Miriam; Alani, Harith; Ronen, Inbal; Hayes, Conor and Karnstedt, Marcel (2012). Behaviour analysis across different types of Enterprise Online Communities. In: ACM Web Science Conference 2012 (WebSci12), 22-24 June 2012, Evanston, U.S.A. §  Rowe, Matthew; Stankovic, Milan and Alani, Harith (2012). Who will follow whom? Exploiting semantics for link prediction in attention-information networks. In: 11th International Semantic Web Conference (ISWC 2012), 11-15 November 2012, Boston, USA §  Rowe, Matthew and Alani, Harith (2012). What makes communities tick? Community health analysis using role compositions. In: 4th IEEE International Conference on Social Computing, 3-6 September 2012, Amsterdam, The Netherlands §  Wagner, Claudia; Rowe, Matthew; Strohmaier, Markus and Alani, Harith (2012). Ignorance isn't bliss: an empirical analysis of attention patterns in online communities. In: 4th IEEE International Conference on Social Computing, 3-6 September 2012, Amsterdam, The Netherlands §  Saif, Hassan; He, Yulan and Alani, Harith (2012). Semantic sentiment analysis of twitter. In: The 11th International Semantic Web Conference (ISWC 2012), 11-15 November 2012, Boston, MA, USA. §  Rowe, Matthew; Angeletou, Sofia and Alani, Harith (2011). Predicting discussions on the social semantic web. In: 8th Extended Semantic Web Conference (ESWC 2011), 29 May - 2 June 2011, Heraklion, Greece. §  Rowe, Matthew; Angeletou, Sofia and Alani, Harith (2011). Anticipating discussion activity on community forums. In: Third IEEE International Conference on Social Computing (SocialCom2011), 9-11 October 2011, Boston, MA, USA. §  Angeletou, Sofia; Rowe, Matthew and Alani, Harith (2011). Modelling and analysis of user behaviour in online communities. In: 10th International Semantic Web Conference (ISWC 2011), 23 - 27 Oct 2011, Bonn, Germany. §  Karnstedt, Marcel; Rowe, Matthew; Chan, Jeff; Alani, Harith and Hayes, Conor (2011). The Effect of User Features on Churn in Social Networks. In: ACM Web Science Conference 2011 (WebSci2011), 14 - 17 June 2011, Koblenz, Germany.