Towards Mining Semantic Maturity in Social Bookmarking Systems


  1. Towards Mining Semantic Maturity in Social Bookmarking Systems
     Martin Atzmueller (1), Dominik Benz (1), Andreas Hotho (2), Gerd Stumme (1)
     (1) Knowledge and Data Engineering Group (KDE), University of Kassel, Germany
     (2) Data Mining and Information Retrieval Group, University of Würzburg, Germany
  2. Let it grow!
     • Evidence for emergent semantics within social applications
     • The meaning of tags can be captured at different stages
     Can we find indicators of "semantic maturity"?
  3. The Story (2011/10/23, Atzmueller et al.: Towards Mining Semantic Maturity, SDOW 2011)
     • Social Bookmarking & Emergent Semantics
     • Maturity Indicators
     • Mining Maturity Profiles
     • Evaluation
  4. Social Tagging
     • Social tagging: a simple and intuitive way to organize all kinds of resources
     • Uncontrolled vocabulary: tags are "just strings"
     • Formal model: folksonomy F = (U, T, R, Y)
       • Users U, tags T, resources R
       • Tag assignments Y ⊆ U × T × R
     Example (figure): users Alice and Bob; resources iswc.org and bonn.de; tags semantics, conference, travel
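
The formal model on this slide can be written down directly as a set of (user, tag, resource) triples. The sketch below is only an illustration: the slide's figure names the users, tags, and resources, but which user assigned which tag to which resource is not recoverable from the transcript, so the concrete triples are hypothetical. Deriving U, T, and R as projections of Y is also a simplification (in the model they are given sets, and Y is a subset of their product).

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y: user u assigned tag t to resource r."""
    user: str
    tag: str
    resource: str


# Hypothetical tag assignments; only the names come from the slide's figure.
Y = {
    TagAssignment("Alice", "semantics", "iswc.org"),
    TagAssignment("Alice", "conference", "iswc.org"),
    TagAssignment("Bob", "conference", "iswc.org"),
    TagAssignment("Bob", "travel", "bonn.de"),
}

# Simplification: take U, T, R to be exactly the values that occur in Y.
U = {y.user for y in Y}
T = {y.tag for y in Y}
R = {y.resource for y in Y}

# The folksonomy F = (U, T, R, Y) as a plain tuple.
F = (U, T, R, Y)
```

Keeping Y as a set of triples makes the later indicators (frequencies, co-occurrence) simple aggregations over one collection.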
  5. Capturing Tag Semantics
     • Co-occurrence distribution → "semantic fingerprint"
     • Capture semantic relatedness / synonyms by cosine similarity
     Cattuto et al.: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems (ISWC 2008)
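
A minimal sketch of the "semantic fingerprint" idea: each tag is described by the counts of tags it co-occurs with, and two tags are compared by the cosine of those count vectors. It assumes that co-occurrence means "assigned by the same user to the same resource"; the slides do not spell out the exact context definition, and the function names and sample data are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt


def cooccurrence_vectors(assignments):
    """Map each tag to a Counter of the tags it co-occurs with.

    Assumption: two tags co-occur if one user assigned both to one resource.
    """
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)
    vectors = defaultdict(Counter)
    for tags in posts.values():
        for a, b in permutations(tags, 2):
            vectors[a][b] += 1
    return vectors


def cosine(v, w):
    """Cosine similarity of two sparse count vectors (dict-like)."""
    dot = sum(count * w.get(key, 0) for key, count in v.items())
    norm_v = sqrt(sum(c * c for c in v.values()))
    norm_w = sqrt(sum(c * c for c in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0
    return dot / (norm_v * norm_w)


# Hypothetical (user, tag, resource) assignments.
assignments = [
    ("u1", "game", "r1"), ("u1", "fun", "r1"), ("u1", "play", "r1"),
    ("u2", "games", "r2"), ("u2", "fun", "r2"), ("u2", "play", "r2"),
    ("u3", "java", "r3"), ("u3", "programming", "r3"),
]
vectors = cooccurrence_vectors(assignments)
sim_game_games = cosine(vectors["game"], vectors["games"])
sim_game_java = cosine(vectors["game"], vectors["java"])
```

In this toy data "game" and "games" never appear together, yet they share the same context tags, which is why the fingerprint comparison rates them as closely related.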
  6. Semantic Grounding
     • Compute folksonomy-based relatedness (via context vectors): Sim( , ) = 0.74
     • Map to the WordNet synset taxonomy
     • Grounded similarity: Sim_True( , ) = 0.59 (we used the Jiang-Conrath distance)
  7. Appendix: Music Genre Taxonomy learned from last.fm
  8. The Story
     • Co-occurrence fingerprints capture tag semantics
     • Maturity Indicators
     • Mining Maturity Profiles
     • Evaluation
  9. Maturity Indicators (1)
     • Frequency
       • Intuition: "the more often used, the more mature"
       • Resource frequency
       • User frequency
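
The two frequency indicators could be computed roughly as follows. This sketch assumes that user frequency is the number of distinct users who applied a tag and resource frequency is the number of distinct resources it was applied to; the slides do not give the exact definition or any normalization, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def frequency_indicators(assignments):
    """Return {tag: {"ufreq": ..., "rfreq": ...}} as raw distinct counts."""
    users = defaultdict(set)
    resources = defaultdict(set)
    for user, tag, resource in assignments:
        users[tag].add(user)
        resources[tag].add(resource)
    return {
        tag: {"ufreq": len(users[tag]), "rfreq": len(resources[tag])}
        for tag in users
    }


# Hypothetical (user, tag, resource) assignments.
assignments = [
    ("alice", "web", "r1"),
    ("bob", "web", "r1"),
    ("alice", "web", "r2"),
    ("bob", "python", "r2"),
]
freq = frequency_indicators(assignments)
```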
  10. Maturity Indicators (2)
      • Centrality
        • "Importance" within the co-occurrence network G = (V, E)
        • Intuition: the more important, the more mature
        • Degree, closeness, betweenness
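
As a rough illustration of two of the three centrality measures, the sketch below computes normalized degree and closeness centrality on an unweighted, undirected graph given as an adjacency dictionary. Betweenness is left out for brevity. Treating the co-occurrence network as unweighted is an assumption, and the small graph is invented.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path distances), via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical co-occurrence graph: an edge means two tags co-occur.
graph = {
    "web": {"java", "python", "semantic"},
    "java": {"web", "python"},
    "python": {"web", "java"},
    "semantic": {"web"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

Here "web" is adjacent to every other tag, so both of its scores are maximal, while "semantic" hangs off a single edge and scores lowest.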
  11. The Story
      • Co-occurrence fingerprints capture tag semantics
      • Frequency / centrality properties as maturity indicators
      • Mining Maturity Profiles
      • Evaluation
  12. Pattern Mining using Subgroup Discovery
      • In a nutshell: "Find descriptions of subsets in the data that differ significantly from the total population with respect to a target concept."
      • Pattern: conjunctive description using tag properties, represented as an indicator rule:
        description → target (with probability)
  13. Finding Maturity Patterns
      → Which patterns maximize the target variable?
      Tag       Ufreq  Rfreq  Deg   Bet   Clos  TARGET
      java      0.13   0.9    0.2   0.1   0.4   0.8
      game      0.2    0.3    0.01  0.04  0.02  1.0
      games     0.4    0.1    0.1   0.12  0.2   1.0
      semantic  0.1    0.2    0.4   0.3   0.35  0.5
      web       0.8    0.7    0.6   0.7   0.43  0.6
      web2.0    0.4    0.4    0.2   0.25  0.3   0.3
      python    0.3    0.2    0.02  0.1   0.05  0.7
  14. Mining Maturity Profiles – Target Variables
      • First: compute the most related tag t_sim for each tag t (resource context)
      • WordNet Synonym Identification (SYN)
        • Binary target variable
        • True if t_sim is a synonym of t, false otherwise
      • Grounded WordNet Maturity (MAT)
        • Binary target variable
        • Based on taxonomic shortest path length
        • True if sim(t_sim, t) > 0.5, false otherwise
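
The two binary targets can be sketched as below. Everything concrete here is a stand-in: the folksonomy similarity table, the synonym lookup (which in the talk comes from WordNet), and the grounded similarity values are hypothetical inputs, and the helper names are invented. Only the rules themselves (most related tag, synonym test, threshold 0.5) come from the slide.

```python
def most_related(tag, similarity):
    """Return t_sim, the tag with the highest similarity to `tag`."""
    candidates = {b: s for (a, b), s in similarity.items() if a == tag}
    return max(candidates, key=candidates.get)


def syn_target(tag, t_sim, synonyms):
    """SYN: True if the most related tag is a synonym of the tag."""
    return t_sim in synonyms.get(tag, set())


def mat_target(tag, t_sim, grounded_sim, threshold=0.5):
    """MAT: True if the grounded similarity of (t_sim, tag) exceeds 0.5."""
    return grounded_sim[(tag, t_sim)] > threshold


# Hypothetical inputs standing in for folksonomy and WordNet data.
similarity = {
    ("game", "games"): 0.9, ("game", "fun"): 0.4,
    ("java", "python"): 0.6, ("java", "coffee"): 0.1,
}
synonyms = {"game": {"games"}}
grounded_sim = {("game", "games"): 1.0, ("java", "python"): 0.3}

targets = {}
for tag in ("game", "java"):
    t_sim = most_related(tag, similarity)
    targets[tag] = {
        "t_sim": t_sim,
        "SYN": syn_target(tag, t_sim, synonyms),
        "MAT": mat_target(tag, t_sim, grounded_sim),
    }
```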
  15. Pattern Mining - Algorithm
      • Patterns are similar to association rules
      • BUT: fixed target concept (of interest), i.e., high maturity
      • Pattern mining: k-best approach
        • Search through the space of descriptions (conjunctions of features)
        • Maximize a quality function, e.g., the increase in target mean/share
        • Several efficient algorithms exist; we apply an exhaustive one
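
A minimal sketch of an exhaustive k-best search over conjunctive descriptions, in the spirit of the bullets above. It is not the authors' implementation: the quality function (plain increase in target mean over the population), the minimum subgroup size, the depth limit, and the discretized toy data are all assumptions chosen to keep the example small.

```python
from itertools import combinations


def subgroup_discovery(rows, features, target, k=3, max_depth=2, min_size=2):
    """Return the k best (quality, description, size) triples.

    A description is a tuple of (feature, value) selectors that must all hold.
    Quality is the subgroup's target mean minus the population's target mean.
    """
    population_mean = sum(r[target] for r in rows) / len(rows)
    selectors = sorted({(f, r[f]) for r in rows for f in features})
    results = []
    for depth in range(1, max_depth + 1):
        for description in combinations(selectors, depth):
            # Skip contradictory descriptions that reuse a feature.
            if len({f for f, _ in description}) < depth:
                continue
            covered = [r for r in rows if all(r[f] == v for f, v in description)]
            if len(covered) < min_size:
                continue
            subgroup_mean = sum(r[target] for r in covered) / len(covered)
            results.append((subgroup_mean - population_mean, description, len(covered)))
    # Best quality first; larger subgroups win ties.
    results.sort(key=lambda item: (-item[0], -item[2]))
    return results[:k]


# Hypothetical tags with discretized indicators and a binary maturity target.
rows = [
    {"tag": "a", "deg": "high", "ufreq": "high", "mature": 1},
    {"tag": "b", "deg": "high", "ufreq": "high", "mature": 1},
    {"tag": "c", "deg": "high", "ufreq": "low", "mature": 0},
    {"tag": "d", "deg": "low", "ufreq": "high", "mature": 0},
    {"tag": "e", "deg": "low", "ufreq": "low", "mature": 0},
    {"tag": "f", "deg": "low", "ufreq": "low", "mature": 0},
]
best = subgroup_discovery(rows, ["deg", "ufreq"], "mature")
```

In this toy data neither indicator alone separates the mature tags, but the conjunction of high degree and high user frequency does, which mirrors the point that combined indicators can yield better patterns than single ones.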
  16. The Story
      • Co-occurrence fingerprints capture tag semantics
      • Frequency / centrality properties as maturity indicators
      • Discover indicator subgroups which maximize the maturity target
      • Evaluation
  17. Social Bookmarking Data
      • Folksonomy crawled from Delicious in 2006, restricted to the top 10,000 tags
        • 476,378 users
        • 10,000 tags
        • 12,660,470 resources
        • 101,491,722 tag assignments
      • Preprocessing & filtering:
        • Filter out tags without a sufficiently similar partner (cos < 0.05)
        • Limit to tags with only one sense in WordNet
      • Number of tags finally considered: 1,944
  18. Direct Correlation to Target Variables
      • No significant correlation of any individual indicator with the maturity targets
      • Possibly higher correlation through a combination of indicators
      • → consider subgroups
      Correlation of each indicator with each target:
      Target  Ufreq  Rfreq  Deg   Clos  Bet
      SYN     0.15   0.15   0.13  0.14  0.12
      MAT     0.12   0.15   0.12  0.14  0.09
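
For readers who want to reproduce this kind of check, the sketch below computes a Pearson correlation between one numeric indicator and a binary target. The slides do not say which correlation coefficient was used, so Pearson is an assumption here, and the sample values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Undefined for constant input; 0.0 is returned as a convention.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical indicator values and binary target for four tags.
indicator = [1, 2, 3, 4]
target = [0, 0, 1, 1]
r = pearson(indicator, target)
```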
  19. Results: Exemplary Patterns (1)
      • Target: Synonym Identification (SYN); mean = 0.13
      • Small groups with highest maturity (measured by the increase in synonym discovery rate)
      • Larger group: degree centrality + user frequency
        • Synonym discovery rate 128% higher than for all tags
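
To put the 128% figure in absolute terms, and assuming it is a relative increase over the overall SYN mean of 0.13 stated on this slide, the synonym discovery rate within the larger subgroup works out to roughly:

```latex
\[
p_{\text{subgroup}} = (1 + 1.28) \cdot p_{\text{all}} = 2.28 \cdot 0.13 \approx 0.296
\]
```

That is, under this reading, the most related tag is a WordNet synonym for about 30% of the tags in that subgroup, compared with 13% of all tags.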
  20. Results: Exemplary Patterns (2)
      • Target: WordNet Maturity (MAT); mean = 0.59
  21. Discussion & Implications
      • In general: centrality and frequency are useful to assess maturity
      • Combined evidence of indicators leads to higher-quality patterns
      • Subgroup discovery is a generally useful technique
      • Open issues:
        • Further maturity indicators?
        • Alternative notions of maturity?
        • Temporal aspects
        • Mining of "immaturity"
        • …
  22. The Story
      • Co-occurrence fingerprints capture tag semantics
      • Frequency / centrality properties as maturity indicators
      • Discover indicator subgroups which maximize the maturity target
      • Combined evidence of indicators leads to higher-quality patterns
      Thanks!
