Characterising the Emergent Semantics in Twitter Lists

  1. Characterising the Emergent Semantics in Twitter Lists
     Andrés García-Silva†, Jeon-Hyung Kang*, Kristina Lerman*, Oscar Corcho†
     † {hgarcia, ocorcho}@fi.upm.es, Facultad de Informática, Universidad Politécnica de Madrid, Spain
     * {jeonhyuk, lerman}@isi.edu, Information Sciences Institute, University of Southern California, USA
  2. Introduction: Twitter Lists
  3. Introduction: Curators and List Names
  4. Introduction: Members and List Names
  5. Introduction: Subscribers and List Names
  6. Introduction
     • The previous examples showed individual uses of lists.
     • Some list names were related to one another.
     • What if we group the lists?
  7. Introduction
     • Lists in which the Yahoo!Finance user is a member, grouped by frequency of membership
     • Lists in which the NASDAQ user is a member, grouped by number of subscriptions
  8. Introduction: Research questions
     • Is it possible to identify related keywords in list names according to how the different user roles use them?
       • Are two list names related if they have been used by a similar set of curators?
       • Are two list names related if a similar set of users have subscribed to the corresponding lists?
       • Are two list names related if their corresponding lists have a similar set of members?
     • Which user roles generate more related keywords?
     • What types of relations between keywords can we obtain (synonyms, is-a, siblings, ...)?
     [Figure: lists named Stocks, Investment, PersonalBanking and Banks, created by Curator 1 and Curator 2, with their list members and Subscriber 1]
  9. Approach
     • Elicit related keywords from Twitter lists
       • Schema representation of keywords: based on curators, based on subscribers, or based on members
       • Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
       • Output: pairs of related keywords per schema representation and model
     • Characterise the semantics of the relations
  10. Approach
      • Elicit related keywords from Twitter lists (pairs of related keywords per schema representation and model)
      • Characterise the semantics of the relations
        • Similarity based on WordNet
          • Path length: synonyms, is-a, siblings, indirect is-a
          • Wu & Palmer (hierarchical information) and Jiang & Conrath (distributional information): specificity of relations
        • SPARQL queries over general knowledge bases published as Linked Data (DBpedia, OpenCyc and UMBEL)
          • Synonyms (sameAs)
          • Binary relations (TypeOf, BT)
          • Object properties (Occupation)
  11. Experiment: Setup
      • Data set
        • Total: 297,521 lists, 2,171,140 members, 215,599 curators and 616,662 subscribers
        • We extracted 5,932 unique keywords from list names; 55% of them were found in WordNet.
      • We use approximate matching of the list names against dictionary entries.
        • The dictionary was created from Wikipedia article titles.
  12. Experiment: Execution
      • Data set: elicit related keywords from Twitter lists
        • Schema representation of keywords: based on curators, subscribers or members
        • Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
        • Output: pairs of related keywords per schema representation and model
      • Each keyword paired with its 5 most related keywords
      • Characterise the semantics of the relations: similarity based on WordNet
        • Path length
        • Wu & Palmer (hierarchical information)
        • Jiang & Conrath (distributional information)
  13. Experiment: Data Analysis
      • Pearson's correlation coefficients (values from -1 to 1)
      • Average J&C distance and W&P similarity
  14. Experiment: Data Analysis: Path Length in WordNet
      Percentage of relations found by each schema representation and model, by path length:

      Path length                              Members       Subscribers          Curators
                                          VSM      LDA      VSM      LDA      VSM      LDA
      1 (synonyms)                      8.58%   10.87%    3.97%    3.24%    1.24%    0.50%
      2 (is-a)                          3.42%    3.08%    1.93%    0.47%    0.70%    0.00%
      3 (siblings, indirect is-a)       2.37%    3.77%    2.96%    2.06%    2.38%    4.03%
      >3                               67.61%    65.5%    67.2%    67.5%    77.8%    75.8%

      On average, 97.65% of the relations with a path length greater than 3 involve a common subsumer.
  15. Experiment: Data Analysis: Depth of the LCS and path length as indicators of specificity
      [Figure: relations in WordNet by depth of the least common subsumer (LCS) and by length of the path setting up the relation; relations with depth(LCS) >= 5]
  16. Experiment: Findings Summary
      • Similarity models based on members
        • produce the results most correlated with those of the WordNet-based similarity measures;
        • find more synonyms and direct is-a relations than the other models (according to path length).
      • The majority of relations found by any model have a path length >= 3 and involve a common subsumer.
      • Depth of LCS
        • VSM based on subscribers produces the highest number of specific relations (depth of LCS >= 5 or 6).
        • Similarity models based on curators produce fewer relations.
  17. Experiment: Execution
      • Data set: elicit related keywords from Twitter lists
        • Schema representation of keywords: based on curators, subscribers or members
        • Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
        • Output: pairs of related keywords per schema representation and model
      • Each keyword paired with its 5 most related keywords
      • Characterise the semantics of the relations: ontological relations between related keywords, found through SPARQL queries over general knowledge bases published as Linked Data (DBpedia, OpenCyc and UMBEL)
  18. Experiment
      • We anchor 63.77% of the keywords extracted from Twitter lists to DBpedia resources.
  19. Experiment
      Vector-space model based on members (direct relations):

      Relation type      Share   Example keywords
      Broader Term         26%   life-science, biotech
      subClassOf           26%   writers, authors
      developer            11%   google, google_apps
      genre                11%   funland, comedy
      largest city          6%   houston, texas
      Others               20%   -

      Vector-space model based on subscribers (relations of length 3):

      Linked data pattern x -> object <- y (54.73%):
      Relations                       Share   Object            Keywords
      type, type                     67.35%   company           nokia, intel
      subClassOf, subClassOf         30.61%   activities        philanthropy, fundraising

      Linked data pattern x <- object -> y (43.49%):
      Relations                       Share   Object            Keywords
      genre, genre                   12.43%   Aesthetica        theater, film
      occupation, genre              10.27%   Adam Maxwell      fiction, writer
      occupation, occupation          8.11%   Alina Tugend      poet, writer
      product, product                7.57%   ChenOne           clothes, fashion
      industry, product               9.73%   UserLand Softw.   blogs, internet
      known for, occupation           5.41%   Adeline Yen Mah   author, writing
      known for, known for            3.78%   Rebecca Watson    skeptics, atheist
      main interest, main interest    3.24%   Aristotle         politics, government
  20. Conclusions
      • We proposed different models to elicit related keywords from Twitter lists, based on curators, subscribers and members, using VSM and LDA.
      • We characterised the semantics of the relations with WordNet-based similarity measures and SPARQL queries over linked data sets.
  21. Conclusions
      • Vector-space and LDA models based on members produce the results most correlated with those of WordNet-based metrics.
        • Shortest J&C distance and highest W&P similarity
      • According to the path length in WordNet
        • Models based on members produce more synonyms and direct is-a relations.
        • Most of the relations have a path length ≥ 3 and have a common subsumer.
      • Depth of LCS
        • The vector-space model based on subscribers finds the highest number of specific relations (depth LCS ≥ 5 and 4 ≤ path length ≤ 0).
      • We confirm these results using linked data sets.
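Slides 9 and 12 describe representing each keyword by the curators, subscribers or members of the lists whose names contain it, comparing keywords with a Vector Space Model, and pairing each keyword with its 5 most related keywords. The following Python sketch shows one possible VSM setup. The data layout (the hypothetical `lists` dictionaries with `keywords`, `curators`, `members` and `subscribers` fields), the raw-count weighting and the cosine similarity are assumptions, since the slides specify neither the weighting scheme nor the similarity function. The LDA variant is not covered.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each list has name keywords plus the three user roles.
lists = [
    {"keywords": ["stocks"], "curators": ["c1"],
     "members": ["yahoofinance", "nasdaq"], "subscribers": ["s1"]},
    {"keywords": ["investment"], "curators": ["c1"],
     "members": ["yahoofinance", "nasdaq", "wsj"], "subscribers": ["s1", "s2"]},
    {"keywords": ["banks"], "curators": ["c2"],
     "members": ["wsj", "hsbc"], "subscribers": ["s2"]},
]

def keyword_vectors(lists, role):
    """One sparse vector per keyword; each dimension is a user in `role`,
    weighted by how many of the keyword's lists that user appears in."""
    vectors = defaultdict(Counter)
    for lst in lists:
        for keyword in lst["keywords"]:
            for user in lst[role]:
                vectors[keyword][user] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(weight * v[user] for user, weight in u.items())  # missing users count as 0
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_related(vectors, keyword, k=5):
    """The k keywords most similar to `keyword` (ties broken alphabetically)."""
    target = vectors.get(keyword, Counter())
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != keyword]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]

member_vectors = keyword_vectors(lists, "members")
print(most_related(member_vectors, "stocks"))
```

With the toy data, `investment` (which shares two members with `stocks`) ranks first and `banks` scores 0. Passing `"curators"` or `"subscribers"` instead of `"members"` gives the other two schema representations.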
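Slide 11 mentions approximate matching of list names against a dictionary built from Wikipedia article titles. A minimal sketch with Python's standard `difflib`: the slug normalisation, the 0.8 cutoff and the tiny `TITLES` list are hypothetical stand-ins, and the slides do not say which matching algorithm was actually used.

```python
import difflib
import re

# Hypothetical dictionary: a handful of lower-cased Wikipedia article titles.
TITLES = ["stock", "investment", "personal banking", "bank",
          "biotechnology", "life science", "fundraising", "philanthropy"]

def normalise(list_name):
    """Turn a list slug such as 'Personal-Banking' into 'personal banking'."""
    return re.sub(r"[-_\s]+", " ", list_name).strip().lower()

def match_keyword(list_name, titles=TITLES, cutoff=0.8):
    """Closest dictionary entry by difflib's similarity ratio, or None."""
    matches = difflib.get_close_matches(normalise(list_name), titles,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

for name in ["Personal-Banking", "PersonalBanking", "stocks", "banks", "xyzzy"]:
    print(name, "->", match_keyword(name))
```

Raising `cutoff` trades recall for precision; plural forms such as `stocks` still reach `stock` at 0.8.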
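Slides 10 and 12 characterise relations with three WordNet-based measures: path length, Wu & Palmer similarity (driven by the depth of the least common subsumer, LCS) and Jiang & Conrath distance (driven by information content). The sketch computes all three on a hypothetical toy is-a tree with made-up frequencies instead of on WordNet itself. Path length is counted in nodes so that, as on slide 14, 1 means synonyms, 2 a direct is-a and 3 siblings or an indirect is-a; that convention is an assumption. For real WordNet data, NLTK's WordNet interface provides `wup_similarity` and `jcn_similarity` on synsets, with its own depth conventions; `jcn_similarity` returns the inverse of the distance and needs an information-content corpus.

```python
import math

# Hypothetical toy is-a tree (child -> parent), standing in for WordNet nouns.
PARENT = {
    "organization": "entity",
    "person": "entity",
    "company": "organization",
    "bank": "company",
    "investment_firm": "company",
    "writer": "person",
    "poet": "writer",
    "novelist": "writer",
}

# Hypothetical corpus frequencies, used only to derive information content.
FREQ = {
    "entity": 0, "organization": 2, "person": 2, "company": 4, "bank": 6,
    "investment_firm": 4, "writer": 4, "poet": 3, "novelist": 5,
}

def ancestors(concept):
    """[concept, parent, grandparent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(concept):
    """Number of nodes from the root down to `concept` (the root has depth 1)."""
    return len(ancestors(concept))

def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    subsumers_of_b = set(ancestors(b))
    for candidate in ancestors(a):  # walks upwards, so the first hit is the deepest
        if candidate in subsumers_of_b:
            return candidate
    raise ValueError(f"{a!r} and {b!r} have no common subsumer")

def path_length(a, b):
    """Nodes on the shortest path between a and b through their LCS."""
    shared = depth(lcs(a, b))
    return (depth(a) - shared) + (depth(b) - shared) + 1

def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

def subtree_frequency(concept):
    """Frequency of a concept plus that of every concept below it."""
    children = [child for child, parent in PARENT.items() if parent == concept]
    return FREQ.get(concept, 0) + sum(subtree_frequency(c) for c in children)

TOTAL = subtree_frequency("entity")

def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from subtree frequencies."""
    return -math.log(subtree_frequency(concept) / TOTAL)

def jiang_conrath(a, b):
    """Jiang & Conrath distance: IC(a) + IC(b) - 2 * IC(LCS); 0 for identical concepts."""
    return (information_content(a) + information_content(b)
            - 2 * information_content(lcs(a, b)))
```

Here `path_length("poet", "novelist")` is 3 (siblings) and `wu_palmer("poet", "novelist")` is 0.75, while `poet` and `bank` meet only at the root and score 0.25. Because Wu & Palmer grows with the depth of the LCS, a deep LCS marks a more specific relation, which is the signal slide 15 uses.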
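Slide 13 reports Pearson's correlation coefficients between the model-based results and the WordNet measures. A plain-Python version of the coefficient follows, applied to invented scores for four keyword pairs (`model_sim`, `wup` and `jcn` are hypothetical numbers, not data from the slides). Because Jiang & Conrath is a distance, agreement with a similarity model shows up as a negative coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson's r = cov(x, y) / (std(x) * std(y)); always between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for four related keyword pairs.
model_sim = [0.9, 0.7, 0.4, 0.1]   # similarity from one VSM/LDA model
wup = [0.8, 0.75, 0.5, 0.2]        # Wu & Palmer similarity
jcn = [1.5, 2.0, 3.1, 4.0]         # Jiang & Conrath distance

print(pearson(model_sim, wup), pearson(model_sim, jcn))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient.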
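Slides 14 and 15 summarise the relations by WordNet path length and by the depth of the LCS. A sketch of that bookkeeping, assuming path lengths counted in nodes (1 = synonyms) and using a hypothetical `None` for pairs with no WordNet path. The extra "no WordNet path" bucket, the `min_depth` parameter and the sample numbers are illustrations, not figures from the slides.

```python
from collections import Counter

# Rows of the slide-14 table (path length counted in nodes, so 1 = same synset).
BUCKETS = {1: "1 (synonyms)", 2: "2 (is-a)", 3: "3 (siblings, indirect is-a)"}

def bucket(path_length):
    """Map a path length (>= 1) to a table row; None means no path was found."""
    if path_length is None:
        return "no WordNet path"
    return BUCKETS.get(path_length, ">3")

def bucket_percentages(path_lengths):
    """Percentage of keyword pairs falling into each bucket."""
    counts = Counter(bucket(p) for p in path_lengths)
    return {name: 100 * n / len(path_lengths) for name, n in counts.items()}

def share_specific(lcs_depths, min_depth=5):
    """Percentage of relations whose LCS is at least `min_depth` deep."""
    return 100 * sum(d >= min_depth for d in lcs_depths) / len(lcs_depths)

# Hypothetical path lengths for the related pairs produced by one model.
pairs = [1, 2, 3, 5, 7, None, 4, 1]
print(bucket_percentages(pairs))
print(share_specific([2, 5, 6, 3]))
```

Running this per schema representation and model would fill one column of a table like the one on slide 14.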
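Slides 17 and 19 look for relations between related keywords in Linked Data: direct relations and the two length-3 patterns x -> object <- y and x <- object -> y. The sketch below only builds SPARQL query strings for those patterns. The function names and the IRI check are hypothetical, the DBpedia resources are examples, and the queries would still have to be sent to an endpoint (such as DBpedia's) with a SPARQL client. Very generic objects, for instance broad classes, would also match the first pattern, so the results would probably need filtering.

```python
import re

# Conservative check: reject whitespace and the characters <>"{}|^`\
# which SPARQL does not allow inside an IRI reference.
_BAD_IRI_CHARS = re.compile(r'[<>"{}|^`\\\s]')

def iri(uri):
    """Wrap a URI as a SPARQL IRI reference, refusing unsafe input."""
    if _BAD_IRI_CHARS.search(uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return f"<{uri}>"

def direct_relation_query(x, y):
    """Direct relations: x -> y or y -> x."""
    x, y = iri(x), iri(y)
    return ("SELECT DISTINCT ?relation WHERE {\n"
            f"  {{ {x} ?relation {y} }}\n"
            "  UNION\n"
            f"  {{ {y} ?relation {x} }}\n"
            "}")

def shared_object_query(x, y):
    """Pattern x -> object <- y: both resources link to a common object."""
    x, y = iri(x), iri(y)
    return ("SELECT DISTINCT ?rel_x ?object ?rel_y WHERE {\n"
            f"  {x} ?rel_x ?object .\n"
            f"  {y} ?rel_y ?object .\n"
            "  FILTER(isIRI(?object))\n"
            "}")

def shared_subject_query(x, y):
    """Pattern x <- object -> y: some resource links to both."""
    x, y = iri(x), iri(y)
    return ("SELECT DISTINCT ?rel_x ?object ?rel_y WHERE {\n"
            f"  ?object ?rel_x {x} .\n"
            f"  ?object ?rel_y {y} .\n"
            "}")

NOKIA = "http://dbpedia.org/resource/Nokia"
INTEL = "http://dbpedia.org/resource/Intel"
print(shared_object_query(NOKIA, INTEL))
```

The `FILTER(isIRI(?object))` line keeps only resources as shared objects, so literals such as labels are not counted as a common object.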
