Harith Alani's presentation at SSSW 2011
Published in: Technology, Business
  1. 1. Live Social Semantics & online community monitoring. Harith Alani, Knowledge Media Institute, The Open University, UK. http://twitter.com/halani http://delicious.com/halani http://www.linkedin.com/pub/harith-alani/9/739/534 Semantic Web Summer School, Cercedilla, Spain, 2011
  2. 2. Market value of Web Analytics
  3. 3. Location, Sensors, & Social Networking. Tag-Along Marketing, The New York Times, November 6, 2010: “Everything is in place for location-based social networking to be the next big thing. Tech companies are building the platforms, venture capitalists are providing the cash and marketers are eager to develop advertising.”
  4. 4. Location, Sensors, & Social Networking. The Canine Twitterer: “Having my daily workout. Already did 15 leg lifts!”
  5. 5. Monitoring online/offline social activity. Where is everybody?
  6. 6. Monitoring online/offline social activity
      •  Generating opportunities for F2F networking
  7. 7. Tracking of F2F contact networks
      Sociometer, MIT, 2002
      -  F2F and productivity
      -  F2F dynamics
      -  Who are key players?
      -  F2F and office distance
      TraceEncounters, 2004
  8. 8. SocioPatterns platform http://www.sociopatterns.org/
  9. 9. Convergence with online social networks
  10. 10. Online vs. offline social networking
      •  Digital social networking increases physical social isolation
      •  Causes
         –  Genetic alterations
         –  Weakened immune system
         –  Less resistant to cancer
         –  Higher risk of heart disease
         –  Higher blood pressure
         –  Faster dementia
         –  Narrower arteries
      Aric Sigman, “Well Connected? The Biological Implications of ‘Social Networking’”, Biologist, 56(1), 2009
      •  Digital networking increases social interaction
         –  Creates more opportunities to network
         –  Supports and increases F2F contact
         –  Stronger offline social ties → more online communication
         –  Stronger offline social ties → more diverse online communications
         –  F2F is medium of choice in weaker social ties
      Barry Wellman, The Glocal Village: Internet and Community, Idea’s - The Arts & Science Review, University of Toronto, 1(1), 2004
  11. 11. Offline + online social networking (ESWC2010)
      •  Who should I talk to?
      •  Anyone I know here?
      •  Where have I met this guy?
      •  Where should I go?
  12. 12. Live Social Semantics (LSS): RFIDs + Social Web + Semantic Web
      •  Integration of physical presence and online information
      •  Semantic user profile generation
      •  Logging of face-to-face contact
      •  Social network browsing
      •  Analysis of online vs offline social networks
      RDF/XML shown on the slide (truncated):
      <?xml version="1.0"?>
      <rdf:RDF
        xmlns="http://tagora.ecs.soton.ac.uk/schemas/tagging#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xml:base="http://tagora.ecs.soton.ac.uk/schemas/tagging">
        <owl:Ontology rdf:about=""/>
        <owl:Class rdf:ID="Post"/>
        <owl:Class rdf:ID="TagInfo"/>
        <owl:Class rdf:ID="GlobalCooccurrenceInfo"/>
        <owl:Class rdf:ID="DomainCooccurrenceInfo"/>
        <owl:Class rdf:ID="UserTag"/>
        <owl:Class rdf:ID="UserCooccurrenceInfo"/>
        <owl:Class rdf:ID="Resource"/>
        <owl:Class rdf:ID="GlobalTag"/>
        <owl:Class rdf:ID="Tagger"/>
        <owl:Class rdf:ID="DomainTag"/>
        <owl:ObjectProperty rdf:ID="hasPostTag">
          <rdfs:domain rdf:resource="#TagInfo"/>
        </owl:ObjectProperty>
        <owl:ObjectProperty rdf:ID="hasDomainTag">
          <rdfs:domain rdf:resource="#UserTag"/>
        </owl:ObjectProperty>
        <owl:ObjectProperty rdf:ID="isFilteredTo">
          <rdfs:range rdf:resource="#GlobalTag"/>
          <rdfs:domain rdf:resource="#GlobalTag"/>
        </owl:ObjectProperty>
        <owl:ObjectProperty rdf:ID="hasResource">
          <rdfs:domain rdf:resource="#Post"/>
          <rdfs:range =…
  13. 13. Live Social Semantics: architecture
      [Architecture diagram; labels recoverable from the slide:]
      •  Social tagging and social networks: Delicious, Flickr, LastFM, Facebook (extractor daemons, Connect API)
      •  Linked data sources: dbpedia.org, dbtune.org, semanticweb.org, data.semanticweb.org, rkbexplorer.com (publications, co-authorship networks, communities of practice)
      •  TAGora Sense Repository: tag -> dbpedia uri, mbid -> dbpedia uri, tag disambiguation service, tag to URI service
      •  Profile Builder: profile of interests
      •  Real world: RFID badges, RFID readers, local server, real-world contact data
      •  Storage and integration: aggregator, JXT triple store, social semantics triple store, RDF cache
      •  Output: visualization, web interface, linked data
  14. 14. SW resources: http://data.semanticweb.org/ and www.rkbexplorer.com/ (conference chair, proceedings chair, author, CoP)
  15. 15. Social and information networks
  16. 16. Merging social networks: FOAF
  17. 17. Distinct, Separated Identity Management
      Harith Alani: http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/foaf/2
      •  Delicious Tagging and Network: http://tagora.ecs.soton.ac.uk/delicious/halani
      •  RFID Contact Data: http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/1139
      •  Flickr Tagging and Contacts: http://tagora.ecs.soton.ac.uk/flickr/69749885@N00
      •  Conference Publication Data: http://data.semanticweb.org/person/harith-alani/
      •  Lastfm favourite artists and friends: http://tagora.ecs.soton.ac.uk/lastfm/halani
      •  Past Publications, Projects, Communities of Practice: http://southampton.rkbexplorer.com/id/person-05877
      •  Facebook contacts: http://tagora.ecs.soton.ac.uk/facebook/568493878
  18. 18. Tag Filtering Service
      [Diagram labels: semantic modeling, semantic analysis, collective intelligence, statistical analysis, syntactical analysis]
  19. 19. Tag Filtering Service
  20. 20. From Tags to Semantics
  21. 21. Tags to User Interests
  22. 22. From raw tags and social relations to Structured Data
      [Diagram labels: user raw data, collective intelligence, semantic data, ontologies, structured data]
  23. 23. RFIDs for tracking social contact
  24. 24. People contact → RFID → RDF Triples
      [Diagram labels: foaf#Person1, foaf#Person2, Place, F2FContact; properties: hasContact, contactWith, contactPlace, contactDate (XMLSchema#date), contactDuration (XMLSchema#time)]
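A minimal sketch of how one detected contact could be written out as triples using the property names visible on the slide. The `http://example.org/` URIs, the `LSS` namespace, the direction of each property and the integer-seconds duration are assumptions made for illustration; the slide gives only local names and XML Schema datatypes.

```python
from datetime import date

# Hypothetical namespace: the slide shows only local names such as F2FContact.
LSS = "http://example.org/lss#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def contact_to_ntriples(contact_uri, person_a, person_b, place_uri, day, duration_s):
    """Serialise one face-to-face contact as a list of N-Triples lines."""

    def uri(value):
        return f"<{value}>"

    def literal(value, datatype):
        return f'"{value}"^^<{XSD}{datatype}>'

    triples = [
        # Assumed direction: person -> hasContact -> contact event.
        (uri(person_a), uri(LSS + "hasContact"), uri(contact_uri)),
        (uri(contact_uri), uri(RDF_TYPE), uri(LSS + "F2FContact")),
        (uri(contact_uri), uri(LSS + "contactWith"), uri(person_b)),
        (uri(contact_uri), uri(LSS + "contactPlace"), uri(place_uri)),
        (uri(contact_uri), uri(LSS + "contactDate"), literal(day.isoformat(), "date")),
        # Assumption: duration stored as whole seconds.
        (uri(contact_uri), uri(LSS + "contactDuration"), literal(duration_s, "integer")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = contact_to_ntriples(
    "http://example.org/contact/1",
    "http://example.org/person/1",
    "http://example.org/person/2",
    "http://example.org/place/room-a",
    date(2009, 6, 2),
    46,
)
print("\n".join(lines))
```

Modelling the contact as its own resource, rather than a direct person-to-person link, is what lets place, date and duration be attached to it.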
  25. 25. 25
  26. 26. Real-time F2F networks with SNS links: http://www.vimeo.com/6590604
  27. 27. Live Social Semantics Deployed at:
      Data analysis
      •  Face-to-face interactions across scientific conferences
      •  Networking behaviour of frequent users
      •  Correlations between scientific seniority and social networking
      •  Comparison of F2F contact network with Twitter and Facebook
      •  Social networking with online and offline friends
  28. 28. Characteristics of F2F contact network
      Network characteristics   ESWC 2009   HT 2009   ESWC 2010
      Number of users           175         113       158
      Average degree            54          39        55
      Avg. strength (mn)        143         123       130
      Avg. weight (mn)          2.65        3.15      2.35
      Weights ≤ 1 mn            70%         67%       74%
      Weights ≤ 5 mn            90%         89%       93%
      Weights ≤ 10 mn           95%         94%       96%
      •  Degree is the number of people with whom the person had at least one F2F contact
      •  Strength is the total time the person spent in F2F contact
      •  Edge weight is the total time spent by a pair of users in F2F contact
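The three measures defined on this slide can be sketched from a list of contact events as follows. The event format `(person_a, person_b, duration_seconds)` and the function name are assumptions for illustration, not the project's actual data model.

```python
from collections import defaultdict


def contact_network_stats(events):
    """Compute degree, strength and edge weight from contact events.

    events: iterable of (person_a, person_b, duration_seconds).
    """
    # Edge weight: total time a pair spent in contact, regardless of order.
    weight = defaultdict(float)
    for a, b, seconds in events:
        weight[tuple(sorted((a, b)))] += seconds

    neighbours = defaultdict(set)
    strength = defaultdict(float)
    for (a, b), w in weight.items():
        neighbours[a].add(b)
        neighbours[b].add(a)
        # Strength: a person's total contact time, summed over all their edges.
        strength[a] += w
        strength[b] += w

    # Degree: number of distinct people met at least once.
    degree = {person: len(met) for person, met in neighbours.items()}
    return degree, dict(strength), dict(weight)


events = [("a", "b", 20), ("b", "a", 40), ("a", "c", 60)]
degree, strength, weight = contact_network_stats(events)
print(degree, strength, weight)
```

Under these definitions a person's strength equals their degree times their average edge weight, which agrees with the table (for ESWC 2009, 54 × 2.65 ≈ 143).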
  29. 29. Characteristics of F2F contact events
      Contact characteristics      ESWC 2009   HT 2009   ESWC 2010
      Number of contact events     16258       9875      14671
      Average contact length (s)   46          42        42
      Contacts ≤ 1 mn              87%         89%       88%
      Contacts ≤ 2 mn              94%         96%       95%
      Contacts ≤ 5 mn              99%         99%       99%
      Contacts ≤ 10 mn             99.8%       99.8%     99.8%
      F2F contact pattern is very similar for all three conferences
  30. 30. F2F contacts of returning users
      •  Degree: number of other participants with whom an attendee has interacted
      •  Total time: total time spent in interaction by an attendee
      •  Link weight: total time spent in F2F interaction by a pair of returning attendees in 2010, versus the same quantity measured in 2009
      [Figure: log-log scatter plots of degree, total interaction time and links' weights; x-axis: ESWC2009, y-axis: ESWC2010]
      ESWC 2009 & ESWC 2010        Pearson correlation
      Degree                       0.37
      Total F2F interaction time   0.76
      Link weight                  0.75
      Time spent on F2F networking by frequent users is stable, even when the list of people they networked with changed
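For reference, the Pearson coefficient reported in the table can be computed as in this small sketch. The input lists stand for one measure (for example degree) of the same returning attendees in two years; the sample values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented example: total interaction time (s) of four returning attendees.
time_2009 = [1200, 5400, 800, 9600]
time_2010 = [1500, 4700, 1100, 8800]
print(round(pearson(time_2009, time_2010), 3))
```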
  31. 31. Average seniority of neighbours in F2F networks
      •  No clear pattern is observed if the unweighted average over all neighbours in the aggregated network is considered
      •  A correlation is observed when each neighbour is weighted by the time spent with the main person
      •  The correlation becomes much stronger when considering for each individual only the neighbour with whom the most time was spent
      [Figure: average seniority of neighbours vs seniority (number of papers); series: sen_nn (avg seniority of the neighbours), sen_nn,w (weighted averages), sen_nn,max (seniority of user with strongest link)]
      Conference attendees tend to network with others of similar levels of scientific seniority
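The three neighbour-seniority measures compared on this slide can be sketched for a single attendee as below. The dictionaries and the function name are illustrative assumptions; seniority is taken as the number of papers, as on the plot axis.

```python
def neighbour_seniority(links, seniority):
    """Return (unweighted mean, time-weighted mean, strongest-link seniority).

    links: dict mapping neighbour -> seconds spent with the main person.
    seniority: dict mapping person -> number of papers.
    """
    total_time = sum(links.values())
    # Unweighted: every neighbour counts the same.
    plain = sum(seniority[n] for n in links) / len(links)
    # Weighted: each neighbour counts in proportion to time spent together.
    weighted = sum(seniority[n] * t for n, t in links.items()) / total_time
    # Max: only the neighbour with whom the most time was spent.
    strongest = max(links, key=links.get)
    return plain, weighted, seniority[strongest]


links = {"b": 100, "c": 300}
papers = {"b": 2, "c": 10}
print(neighbour_seniority(links, papers))
```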
  32. 32. Presence of Attendees HT2009
  33. 33. Offline networking vs online networking
      Twitterers                       Spearman correlation (ρ)
      Tweets – F2F Degree              -0.15
      Tweets – F2F Strength            -0.15
      Twitter Following – F2F Degree   -0.21
      [Figure: users with Facebook and Twitter accounts in ESWC 2010; x-axis: users]
      •  People who have a large number of friends on Twitter and/or Facebook don’t seem to be the most socially active in the offline world in comparison to other SNS users
      No strong correlation between amount of F2F contact activity and size of online social networks
  34. 34. Scientific seniority vs Twitter followers
      Twitter users                 Correlation
      H-index – Twitter Followers   0.32
      H-index – Tweets              -0.13
      [Figure: per-user plot; x-axis: users]
      •  Comparison between people’s scientific seniority and the number of people following them on Twitter
      People who have the highest number of Twitter followers are not necessarily the most scientifically senior, although they do have high visibility and experience
  35. 35. Conference Chairs
                                          all participants 2009   chairs 2009   all participants 2010   chairs 2010
      average degree                      55                      77.7          54                      77.6
      average strength                    8590                    19590         7807                    22520
      average weight                      159                     500           141                     674
      average number of events per edge   3.44                    8             3.37                    12
      •  Conf chairs interact with more distinct people (larger average degree)
      •  Conf chairs spend more time in F2F interaction (almost three times as much as a random participant)
  36. 36. Networking with online and offline ‘friends’
      Characteristics                     all users   coauthors   Facebook friends   Twitter followers
      average contact duration (s)        42          75          63                 72
      average edge weight (s)             141         4470        830                1010
      average number of events per edge   3.37        60          13                 14
      •  Individuals sharing an online or professional social link meet much more often than other individuals
      •  Average number of encounters, and total time spent in interaction, is highest for co-authors
      F2F contacts with Facebook & Twitter friends were respectively 50% and 71% longer, and 286% and 315% more frequent, than with others.
      They spent 79% more time in F2F contacts with their co-authors, and they met them 1680% more times than they met non co-authors.
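The percentages quoted on this slide can be reproduced from the table if the "all users" column is taken as the baseline; that choice of baseline is inferred from the numbers rather than stated on the slide. A short check:

```python
def percent_more(value, baseline):
    """How much larger value is than baseline, in percent."""
    return (value / baseline - 1) * 100


# Average contact duration (s): all users 42, Facebook 63, Twitter 72, coauthors 75.
print(round(percent_more(63, 42)), round(percent_more(72, 42)), round(percent_more(75, 42)))

# Average number of events per edge: all users 3.37, Facebook 13, Twitter 14, coauthors 60.
print(round(percent_more(13, 3.37)), round(percent_more(14, 3.37)), round(percent_more(60, 3.37)))
```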
  37. 37. Twitterers vs Non-Twitterers
      •  Time spent in conference rooms
         –  Twitter users spent on average 11.4% more time in the conf rooms than non-twitter users (mean is 26% higher)
      •  Number of people met F2F during the conference
         –  Twitter users met on average 9% more people F2F (mean 8% higher)
      •  Duration of F2F contacts
         –  Twitter users spent on average 63% more time in F2F contact than non twitter users (mean is 20% higher)
  38. 38. Behaviour of individuals – micro level analysis
      [Figure: per-user plot with annotations]
  39. 39. Behaviour analysis Jeffrey Chan, Conor Hayes, and Elizabeth Daly. Decomposing discussion forums using common user roles. In Proc. Web Science Conf. (WebSci10), Raleigh, NC: US, 2010
  40. 40. Role Skeleton
  41. 41. Encoding Rules in Ontologies with SPIN
  42. 42. Approach for inferring User Roles
      •  Features: structural, social network, reciprocity, persistence, participation
      •  Feature levels change with the dynamics of the community
      •  Associate Roles with a collection of feature-to-level mappings, e.g. in-degree -> high, out-degree -> high
      •  Run our rules over each user’s features and derive the role composition
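The rule-based approach outlined on this slide can be illustrated with a small sketch. The role names, the three-level scale and the way cut points are derived from the current community are all hypothetical; the talk encodes its rules with SPIN, not Python.

```python
# Hypothetical roles, each defined as a feature-to-level mapping.
ROLE_RULES = {
    "hub": {"in_degree": "high", "out_degree": "high"},
    "lurker": {"in_degree": "low", "out_degree": "low"},
}


def community_cuts(values):
    """Derive low/high cut points from the community's current distribution."""
    ordered = sorted(values)
    third = (len(ordered) - 1) // 3
    return ordered[third], ordered[len(ordered) - 1 - third]


def to_level(value, low_cut, high_cut):
    if value <= low_cut:
        return "low"
    if value >= high_cut:
        return "high"
    return "mid"


def infer_roles(users):
    """users: dict mapping user -> dict of feature values."""
    features = {f for rule in ROLE_RULES.values() for f in rule}
    # Cut points are recomputed per community, so levels track its dynamics.
    cuts = {f: community_cuts([u[f] for u in users.values()]) for f in features}
    roles = {}
    for name, feats in users.items():
        levels = {f: to_level(feats[f], *cuts[f]) for f in features}
        roles[name] = [
            role
            for role, rule in ROLE_RULES.items()
            if all(levels[f] == level for f, level in rule.items())
        ]
    return roles


users = {
    "u1": {"in_degree": 1, "out_degree": 1},
    "u2": {"in_degree": 5, "out_degree": 5},
    "u3": {"in_degree": 50, "out_degree": 40},
}
print(infer_roles(users))
```

Counting how many users fall into each role then gives a role composition for the community at a given time window.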
  43. 43. Data from Boards.ie
      •  Forum 246 (Commuting and Transport): Demonstrates a clear increase in activity over time.
      •  Forum 388 (Rugby): Exhibits periodic increase and decrease in activity and hence it provides good examples of healthy/unhealthy evolutions.
      •  Forum 411 (Mobile Phones and PDAs): Increase in activity over time with some fluctuation - i.e. reduction and increase over various time windows.
      •  Time span: 2004-01 to 2006-12
  44. 44. Results
      [Figure panels: Commuting and Transport; Rugby; Mobile Phones and PDAs]
      •  Correlation of individual features in each of the three forums
  45. 45. Results
      [Figure panels: (a) Forum 246: Commuting and Transport; (b) Forum 388: Rugby; (c) Forum 411: Mobile Phones and PDAs]
      •  Variation in behaviour composition & activity
      •  Behaviour composition in/stability influences forum activity
  46. 46. Prediction analysis – preliminary results!
      •  Predicting rise/fall in post submission numbers
      •  Binary classification
      •  Features: Community composition, roles and percentages of users associated with each
      Forum   P       R       F1      ROC
      246     0.799   0.769   0.780   0.800
      388     0.603   0.615   0.605   0.775
      411     0.765   0.692   0.714   0.617
      All     0.583   0.667   0.607   0.466
      •  Cross-community predictions are less reliable than individual community analysis due to the idiosyncratic behaviour observed in each individual community
  47. 47. Rise and fall of social networks
  48. 48. Predicting engagement
      •  Which posts will receive a reply?
         –  What are the most influential features here?
      •  How much discussion will it generate?
         –  What are the key factors of lengthy discussions?
  49. 49. Common online community features
      [Paper excerpt shown on the slide] … user attributes - describing the reputation of the user - and attributes of a post’s content - generally referred to as content features. In Table 1 we define user and content features and study their influence on the discussion “continuation”.
      Table 1. User and Content Features (# marks a plain count)
      User Features
      •  In Degree: Number of followers of U (#)
      •  Out Degree: Number of users U follows (#)
      •  List Degree: Number of lists U appears on. Lists group users by topic (#)
      •  Post Count: Total number of posts the user has ever posted (#)
      •  User Age: Number of minutes from user join date (#)
      •  Post Rate: Posting frequency of the user: $\frac{PostCount}{UserAge}$
      Content Features
      •  Post length: Length of the post in characters (#)
      •  Complexity: Cumulative entropy of the unique words in post p, with total word length $\lambda$, $n$ unique words and $p_i$ the frequency of each word: $\frac{1}{\lambda}\sum_{i\in[1,n]} p_i(\log\lambda - \log p_i)$
      •  Uppercase count: Number of uppercase words (#)
      •  Readability: Gunning fog index using average sentence length (ASL) [7] and the percentage of complex words (PCW): $0.4\,(ASL + PCW)$
      •  Verb Count: Number of verbs (#)
      •  Noun Count: Number of nouns (#)
      •  Adjective Count: Number of adjectives (#)
      •  Referral Count: Number of @user (#)
      •  Time in the day: Normalised time in the day measured in minutes (#)
      •  Informativeness: Terminological novelty of the post wrt other posts. The cumulative tfIdf value of each term t in post p: $\sum_{t\in p} tfidf(t,p)$
      •  Polarity: Cumulation of polar term weights in p (using Sentiwordnet3 lexicon) normalised by polar terms count: $\frac{Po + Ne}{|terms|}$
      •  How do all these features influence activity generation in an online community?
         –  Such knowledge leads to better use and management of the community
      [Paper excerpt shown on the slide] 4.2 Experiments. Experiments are intended to test the performance of different classification models in identifying seed posts. Therefore we used four classifiers: discriminative classifiers Perceptron and SVM, the generative classifier Naive Bayes and the …
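Three of the formula-based features in Table 1 can be sketched directly from their definitions. The tokenisation rule, and the reading of the complexity formula in which λ is the total number of words and p_i is the count of the i-th unique word, are assumptions; the slide text is too garbled to confirm them.

```python
import math
import re
from collections import Counter


def post_rate(post_count, user_age_minutes):
    """Post Rate: PostCount / UserAge."""
    return post_count / user_age_minutes


def complexity(text):
    """Cumulative entropy of the unique words: (1/λ) Σ p_i (log λ − log p_i)."""
    words = re.findall(r"[a-z']+", text.lower())  # assumed tokenisation
    lam = len(words)
    if lam == 0:
        return 0.0
    counts = Counter(words)
    return sum(p * (math.log(lam) - math.log(p)) for p in counts.values()) / lam


def gunning_fog(avg_sentence_length, pct_complex_words):
    """Readability: 0.4 (ASL + PCW)."""
    return 0.4 * (avg_sentence_length + pct_complex_words)


print(post_rate(120, 60))
print(complexity("a a a a"), complexity("a b c d"))
print(gunning_fog(10, 5))
```

A post that repeats one word scores 0, while a post of all-distinct words scores log λ, so the measure rises with lexical variety.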
  50. 50. Experiment for identifying Twitter seed posts
      •  Twitter data on the Haiti earthquake, and the Union Address
      Dataset         Users    Tweets   Seeds   Non-seeds   Replies
      Haiti           44,497   65,022   1,405   60,686      2,931
      Union Address   66,300   80,272   7,228   55,169      17,875
      •  Evaluated a binary classification task
         –  Is this post a seed post or not?
  51. 51. Identifying seeds with different type of features
      [Paper excerpt shown on the slide] … first report on the results obtained from our model selection phase, before moving onto our results from using the best model with the top-k features.
      Table 3. Results from the classification of seed posts using varying feature sets and classification models
                          (a) Haiti Dataset               (b) Union Address Dataset
      Features   Model    P       R       F1      ROC     P       R       F1      ROC
      User       Perc     0.794   0.528   0.634   0.727   0.658   0.697   0.677   0.673
                 SVM      0.843   0.159   0.267   0.566   0.510   0.946   0.663   0.512
                 NB       0.948   0.269   0.420   0.785   0.844   0.086   0.157   0.707
                 J48      0.906   0.679   0.776   0.822   0.851   0.722   0.782   0.830
      Content    Perc     0.875   0.077   0.142   0.606   0.467   0.698   0.560   0.457
                 SVM      0.552   0.727   0.627   0.589   0.650   0.589   0.618   0.638
                 NB       0.721   0.638   0.677   0.769   0.762   0.212   0.332   0.649
                 J48      0.685   0.705   0.695   0.711   0.740   0.533   0.619   0.736
      All        Perc     0.794   0.528   0.634   0.726   0.630   0.762   0.690   0.672
                 SVM      0.483   0.996   0.651   0.502   0.499   0.990   0.664   0.506
                 NB       0.962   0.280   0.434   0.852   0.874   0.212   0.341   0.737
                 J48      0.824   0.775   0.798   0.836   0.890   0.810   0.848   0.877
      [Paper excerpt shown on the slide] 4.3 Results. Our findings from Table 3 demonstrate the effectiveness of using solely user features for identifying seed posts. In both the Haiti and Union Address datasets training a classification model using user features shows improved performance over the same models trained using content features. In the case of the Union …
      •  User features are most important in Twitter
      •  But combining user & content features gives best results
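As a reading aid for Table 3, F1 is the harmonic mean of precision and recall. The sketch below recomputes it for two rows of the Haiti results; the recomputed values can differ from the table in the last digit because the published P and R are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Haiti, user features: Perceptron and J48 rows of Table 3.
print(round(f1_score(0.794, 0.528), 3))
print(round(f1_score(0.906, 0.679), 3))
```

The formula also explains rows such as SVM with all features on Haiti: a recall of 0.996 cannot lift F1 far when precision is only 0.483.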
  52. 52. Impact of different features in Twitter
      •  What features have the highest impact on identification of seed posts?
      •  Rank features by information gain ratio wrt seed post class label
      [Paper excerpt shown on the slide] … which we found to be 0.674 indicating a good correlation between the two lists and their respective ranks.
      Table 4. Features ranked by Information Gain Ratio wrt Seed Post class label. The feature name is paired within its IG in brackets.
      Rank   Haiti                             Union Address
      1      user-list-degree (0.275)          user-list-degree (0.319)
      2      user-in-degree (0.221)            content-time-in-day (0.152)
      3      content-informativeness (0.154)   user-in-degree (0.133)
      4      user-num-posts (0.111)            user-num-posts (0.104)
      5      content-time-in-day (0.089)       user-post-rate (0.075)
      6      user-post-rate (0.075)            user-out-degree (0.056)
      7      content-polarity (0.064)          content-referral-count (0.030)
      8      user-out-degree (0.040)           user-age (0.015)
      9      content-referral-count (0.038)    content-polarity (0.015)
      10     content-length (0.020)            content-length (0.010)
      11     content-readability (0.018)       content-complexity (0.004)
      12     user-age (0.015)                  content-noun-count (0.002)
      13     content-uppercase-count (0.012)   content-readability (0.001)
      14     content-noun-count (0.010)        content-verb-count (0.001)
      15     content-adj-count (0.005)         content-adj-count (0.0)
      16     content-complexity (0.0)          content-informativeness (0.0)
      17     content-verb-count (0.0)          content-uppercase-count (0.0)
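A minimal sketch of information gain ratio for one feature against the seed/non-seed label, assuming the feature has already been discretised into levels. The slides do not say how the numeric features were binned or which tool produced the ranking, so this shows only the standard definition.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["seed", "seed", "non-seed", "non-seed"]
print(gain_ratio(["high", "high", "low", "low"], labels))  # feature separates the classes
print(gain_ratio(["high", "low", "high", "low"], labels))  # feature carries no information
```

Sorting features by this score in descending order gives a ranking of the same form as Table 4.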
  53. 53. Positive/negative impact of features
      •  What is the correlation between seed posts and features?
      [Figure panels: Haiti; Union Address]
      Fig. 3. Contributions of top-5 features to identifying Non-seeds (N) and Seeds (S). Upper plots are for the Haiti dataset and the lower plots are for the Union Address dataset.
  54. 54. Predicting discussion activity on Twitter
      •  Reply rates:
         –  Haiti 1-74 responses, Union Address 1-75 responses
      •  Compare rankings
         –  Ground truth vs predicted
      •  Experiments
         –  Using Haiti and Union Address datasets
         –  Evaluate predicted rank k where k = {1, 5, 10, 20, 50, 100}
         –  Support Vector Regression with user, content, user+content features
      Dataset         Training size   Test size   Test Vol Mean   Test Vol SD
      Haiti           980             210         1.664           3.017
      Union Address   5,067           1,161       1.761           2.342
  55. 55. Predicting discussion activity on Twitter
      [Figure panels: Haiti dataset; Union Address dataset]
      •  Content features are key for top ranks
      •  User features are more important for higher ranks
  56. 56. Identifying seed posts in Boards.ie
      •  Used the same features as before
         –  User features
            •  In-degree, out-degree, post count, user age, post rate
         –  Content features
            •  Post length, complexity, readability, referral count, time in day, informativeness, polarity
      •  New features designed to capture user affinity
         –  Forum Entropy
            •  Concentration of forum activity
            •  Higher entropy = large forum spread
         –  Forum Likelihood
            •  Likelihood of forum post given user history
            •  Combines post history with incoming data
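Forum entropy as described above can be sketched as the Shannon entropy of a user's posts across forums. The base-2 logarithm and the input format (one forum id per past post) are assumptions; the slide states only that higher entropy means a larger forum spread.

```python
import math
from collections import Counter


def forum_entropy(forum_ids):
    """Entropy of a user's posting distribution over forums (0 = one forum only)."""
    total = len(forum_ids)
    counts = Counter(forum_ids)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


focused_user = [246, 246, 246, 246]   # all posts in one forum
spread_user = [246, 388, 411, 7]      # posts spread evenly over four forums
print(forum_entropy(focused_user), forum_entropy(spread_user))
```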
  57. 57. Experiment for identifying seed posts
      •  Used all posts from Boards.ie in 2006
      •  Built features using a 6-month window prior to seed post date
      Posts       Seeds    Non-Seeds   Replies     Users
      1,942,030   90,765   21,800      1,829,465   29,908
      •  Evaluated a binary classification task
         –  Is this post a seed post or not?
         –  Precision, Recall, F1 and Accuracy
         –  Tested: user, content, focus features, and their combinations
  58. 58. Identifying seeds with different type of features
      TABLE II. Results from the classification of seed posts using varying feature sets and classification models
      Features          Model         P       R       F1      ROC
      User              SVM           0.775   0.810   0.774   0.581
                        Naive Bayes   0.691   0.767   0.719   0.540
                        Max Ent       0.776   0.806   0.722   0.556
                        J48           0.778   0.809   0.734   0.582
      Content           SVM           0.739   0.804   0.729   0.511
                        Naive Bayes   0.730   0.794   0.740   0.616
                        Max Ent       0.758   0.806   0.730   0.678
                        J48           0.795   0.822   0.783   0.617
      Focus             SVM           0.649   0.805   0.719   0.500
                        Naive Bayes   0.710   0.737   0.722   0.588
                        Max Ent       0.649   0.805   0.719   0.586
                        J48           0.649   0.805   0.719   0.500
      User + Content    SVM           0.790   0.808   0.727   0.509
                        Naive Bayes   0.712   0.772   0.732   0.593
                        Max Ent       0.767   0.807   0.734   0.671
                        J48           0.795   0.821   0.779   0.675
      User + Focus      SVM           0.776   0.810   0.776   0.583
                        Naive Bayes   0.699   0.778   0.724   0.585
                        Max Ent       0.771   0.806   0.722   0.607
                        J48           0.777   0.810   0.742   0.617
      Content + Focus   SVM           0.750   0.805   0.729   0.511
                        Naive Bayes   0.732   0.787   0.746   0.658
                        Max Ent       0.762   0.807   0.731   0.692
                        J48           0.798   0.823   0.787   0.662
      All               SVM           0.791   0.808   0.727   0.510
                        Naive Bayes   0.724   0.780   0.740   0.637
                        Max Ent       0.768   0.808   0.733   0.688
                        J48           0.798   0.824   0.792   0.692
  59. 59. Positive/negative impact of features on Boards.ie
      •  What are the most important features for predicting seed posts?
      TABLE III. Reduction in F1 levels as individual features are dropped from the J48 classifier
      Feature Dropped    F1
      -                  0.815
      Post Count         0.815
      In-Degree          0.811*
      Out-Degree         0.811*
      User Age           0.807***
      Post Rate          0.815
      Forum Entropy      0.815
      Forum Likelihood   0.798***
      Post Length        0.810**
      Complexity         0.811**
      Readability        0.802***
      Referral Count     0.793***
      Time in Day        0.810**
      Informativeness    0.801***
      Polarity           0.808***
      Signif. codes: p-value < 0.001 ***, 0.01 **, 0.05 *, 0.1 .
      •  Correlations:
         –  Referral counts (non-seeds)
         –  Forum likelihood (seeds)
         –  Informativeness (non-seeds)
         –  Readability (seeds)
         –  User age (non-seeds)
      [Paper excerpt shown on the slide] … hyperlinks (e.g., ads and spams). This contrasts with work in Twitter which found that tweets containing many links were …
  60. 60. Predicting Discussion Activity in Boards.ie
      •  What impact do features have on discussion length?
         –  Assessed Linear Regression model with focus and content features
         –  Forum Likelihood (pos)
         –  Content Length (+/neutral)
         –  Complexity (pos)
         –  Readability (+/neutral)
         –  Referral Count (neg)
         –  Time in Day (+/neutral)
         –  Informativeness (-/neutral)
         –  Polarity (neg)
  61. 61. Stay tuned
      •  More communities
         –  SAP, IBM, StackOverflow, Reddit
         –  Compare impact of features on their dynamics
      •  Better behaviour analysis
         –  Fewer features, more forums/communities, more graphs!
         –  Healthy? posts, reciprocation, discussions, sentiment mixture
      •  Churn analysis
         –  Correlation of features/behaviour to ‘bounce rate’ (WebSci11 best paper)
      •  Intervention!
         –  Opportunities and mechanisms to influence behaviour
  62. 62. Upcoming events
      •  Social Object Networks, IEEE Social Computing, 2011, October 9-10, Boston, USA. http://ir.ii.uam.es/socialobjects2011/ Deadline: August 5, 2011
      •  Intelligent Web Services Meet Social Computing, AAAI Spring Symposium 2012, March 26-28, Stanford, California. http://vitvar.com/events/aaai-ss12 Deadline: October 7, 2011
  63. 63. Acknowledgement
      My social semantics team
      •  Sofia Angeletou, Research Associate
      •  Matthew Rowe, Research Associate
      Live Social Semantics team
      •  Ciro Cattuto, ISI, Turin
      •  Wouter van Den Broeck, ISI, Turin
      •  Alain Barrat, CPT Marseille & ISI
      •  Martin Szomszor, CeRC, City University, UK
      •  Gianluca Correndo, Uni Southampton
      •  Ivan Cantador, UAM, Madrid
      STI International; ESWC09/10 & HT09 chairs and organisers; all LSS participants