Towards Mining Semantic Maturity in Social Bookmarking Systems
Towards Mining Semantic Maturity in Social Bookmarking Systems Presentation Transcript

  • 1. Towards Mining Semantic Maturity in Social Bookmarking Systems
    Martin Atzmueller¹, Dominik Benz¹, Andreas Hotho², Gerd Stumme¹
    ¹ Knowledge and Data Engineering Group (KDE), University of Kassel, Germany
    ² Data Mining and Information Retrieval Group, University of Würzburg, Germany
  • 2. Let it grow!
    • Evidence for emergent semantics within social applications
    • Meaning of tags can be captured at different stages
    → Can we find indicators of "semantic maturity"?
  • 3. The Story
    2011/10/23 Atzmueller et al: Towards Mining Semantic Maturity, SDOW 2011
    • Social Bookmarking & Emergent Semantics
    • Maturity Indicators
    • Mining Maturity Profiles
    • Evaluation
  • 4. Social Tagging
    • Social tagging: Simple and intuitive way to organize all kinds of resources
    • Uncontrolled vocabulary: tags are "just strings"
    • Formal model: Folksonomy F = (U, T, R, Y)
      • Users U, Tags T, Resources R
      • Tag assignments Y ⊆ U × T × R
    [Figure: example folksonomy with users Alice and Bob, resources iswc.org and bonn.de, and tags semantics, conference, travel]
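A minimal sketch of the formal model F = (U, T, R, Y), assuming tag assignments are stored as a Python set of (user, tag, resource) triples. The user, tag, and resource names come from the slide's example figure, but which user attached which tag to which resource is a hypothetical choice made only for illustration.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y, i.e. a triple (u, t, r) in U x T x R."""
    user: str
    tag: str
    resource: str


# Hypothetical assignments: the pairings below are illustrative, not taken from the slide.
Y = {
    TagAssignment("Alice", "semantics", "iswc.org"),
    TagAssignment("Alice", "conference", "iswc.org"),
    TagAssignment("Bob", "conference", "iswc.org"),
    TagAssignment("Bob", "travel", "bonn.de"),
}

# Simplification: U, T and R are taken as the projections of Y, so every
# user, tag and resource appears in at least one assignment.
U = {y.user for y in Y}
T = {y.tag for y in Y}
R = {y.resource for y in Y}

F = (U, T, R, Y)

# Every tag assignment is an element of U x T x R.
assert all(y.user in U and y.tag in T and y.resource in R for y in Y)
```

Storing Y as a set of triples keeps the ternary relation explicit; the later indicators (frequencies, co-occurrence) are all projections or aggregations of this one set.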
  • 5. Capturing Tag Semantics
    • Co-occurrence distribution → "semantic fingerprint"
    • Capture semantic relatedness / synonyms by cosine similarity
    (with Cattuto et al.: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems, ISWC 2008)
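The fingerprint idea can be sketched as follows, assuming a "post" is a (user, resource) pair and two tags co-occur once for every post carrying both; the exact weighting used in the paper is not given on the slide. The toy data is hypothetical and shows why the measure can find synonyms: "python" and "py" never share a post, yet their co-occurrence vectors coincide.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(Y):
    """Map each tag to a Counter of the tags it co-occurs with.

    Assumption: a post is a (user, resource) pair, and two tags co-occur
    once for every post that carries both of them.
    """
    posts = defaultdict(set)
    for user, tag, resource in Y:
        posts[(user, resource)].add(tag)
    vectors = defaultdict(Counter)
    for tags in posts.values():
        for t in tags:
            for other in tags:
                if other != t:
                    vectors[t][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data: "python" and "py" never appear in the same post,
# but both co-occur with "programming", so their fingerprints match.
example_Y = {
    ("alice", "python", "r1"), ("alice", "programming", "r1"),
    ("bob", "py", "r2"), ("bob", "programming", "r2"),
    ("carol", "travel", "r3"), ("carol", "holiday", "r3"),
}
vectors = cooccurrence_vectors(example_Y)
print(cosine(vectors["python"], vectors["py"]))      # 1.0
print(cosine(vectors["python"], vectors["travel"]))  # 0.0
```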
  • 6. Semantic Grounding
    • Compute folksonomy-based relatedness (via context vectors)
      • Sim( , ) = 0.74
    • Map tags onto the WordNet synset taxonomy → grounded similarity
      • Sim_True( , ) = 0.59 (we used the Jiang-Conrath distance)
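The Jiang-Conrath distance combines the information content IC(c) = -log p(c) of two concepts with that of their lowest common subsumer: dist(c1, c2) = IC(c1) + IC(c2) - 2 · IC(lcs(c1, c2)). The sketch below computes it on a hypothetical toy taxonomy with made-up concept counts standing in for WordNet and a reference corpus. How the slides convert this distance into a similarity such as 0.59 is not stated, so the sketch stops at the distance.

```python
import math

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}
# Hypothetical corpus counts for each concept itself.
COUNT = {"entity": 0, "animal": 2, "dog": 4, "cat": 4, "vehicle": 2, "car": 8}


def ancestors(concept):
    """Path from a concept up to the root, starting with the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def information_content():
    """IC(c) = -log p(c), where p(c) counts c and every concept below it."""
    subsumed = dict.fromkeys(PARENT, 0)
    for concept, n in COUNT.items():
        for a in ancestors(concept):
            subsumed[a] += n
    total = subsumed["entity"]  # the root subsumes every occurrence
    return {c: -math.log(n / total) for c, n in subsumed.items() if n > 0}


def lowest_common_subsumer(c1, c2):
    """Deepest concept that is an ancestor of both c1 and c2."""
    above_c2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in above_c2:
            return a
    raise ValueError("concepts share no ancestor")


def jiang_conrath_distance(c1, c2, ic):
    lcs = lowest_common_subsumer(c1, c2)
    return ic[c1] + ic[c2] - 2 * ic[lcs]


ic = information_content()
print(round(jiang_conrath_distance("dog", "cat", ic), 3))  # 1.833
print(round(jiang_conrath_distance("dog", "car", ic), 3))  # 2.526
```

Siblings under a specific subsumer ("animal") end up closer than concepts that only meet at the root, which is the property that makes the measure usable as a ground truth for tag relatedness.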
  • 7. Appendix: Music Genre Taxonomy learned from last.fm
  • 8. The Story
    • Co-occurrence fingerprints capture tag semantics
    • Maturity Indicators
    • Mining Maturity Profiles
    • Evaluation
  • 9. Maturity Indicators (1)
    • Frequency
      • Intuition: "the more often used, the more mature"
      • Resource frequency
      • User frequency
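Both frequency indicators are simple aggregations over the tag assignments. A minimal sketch, assuming resource (user) frequency means the number of distinct resources (users) a tag is attached to (by); the slide does not say whether raw assignment counts were used instead, and the toy data is hypothetical.

```python
from collections import defaultdict


def tag_frequencies(Y):
    """Return (resource frequency, user frequency) dictionaries keyed by tag.

    Assumption: a tag's resource (user) frequency is the number of distinct
    resources (users) it has been assigned to (by).
    """
    resources = defaultdict(set)
    users = defaultdict(set)
    for user, tag, resource in Y:
        resources[tag].add(resource)
        users[tag].add(user)
    rfreq = {t: len(rs) for t, rs in resources.items()}
    ufreq = {t: len(us) for t, us in users.items()}
    return rfreq, ufreq


# Hypothetical toy assignments (user, tag, resource).
example_Y = {
    ("alice", "web", "r1"),
    ("bob", "web", "r1"),
    ("bob", "web", "r2"),
    ("carol", "web", "r1"),
    ("carol", "java", "r3"),
}
rfreq, ufreq = tag_frequencies(example_Y)
# Here "web" sits on 2 distinct resources but was used by 3 distinct users.
```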
  • 10. Maturity Indicators (2)
    • Centrality
      • "Importance" within the co-occurrence network G = (V, E)
      • Intuition: the more important, the more mature
      • Degree, closeness, betweenness
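The three centrality measures can be sketched on a small co-occurrence network, assuming the graph is unweighted, undirected, and connected and is stored as an adjacency dict; the slides do not say whether edge weights or a particular normalization were used. Betweenness follows Brandes' algorithm, and the star-shaped toy network is hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    """Number of neighbours, normalized by n - 1."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(n - 1) / sum of distances to all other nodes; assumes a connected graph."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) is counted from both ends.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical co-occurrence network: "web" co-occurs with three other tags.
star = {
    "web": {"java", "python", "semantic"},
    "java": {"web"},
    "python": {"web"},
    "semantic": {"web"},
}
print(degree_centrality(star)["web"])       # 1.0
print(closeness_centrality(star)["java"])   # 0.6
print(betweenness_centrality(star)["web"])  # 3.0
```

The hub tag scores highest on all three measures: it touches every node, is one hop from everything, and lies on all three leaf-to-leaf shortest paths.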
  • 11. The Story
    • Co-occurrence fingerprints capture tag semantics
    • Frequency / centrality properties as maturity indicators
    • Mining Maturity Profiles
    • Evaluation
  • 12. Pattern Mining using Subgroup Discovery
    • In a nutshell: "Find descriptions of subsets of the data that differ significantly from the total population with respect to a target concept."
    • Pattern: a conjunctive description over tag properties, represented as an indicator rule:
      • description → target (with a given probability)
  • 13. Finding Maturity Patterns
    → Which patterns maximize the target variable?
    Tag       Ufreq  Rfreq  Deg   Bet   Clos  TARGET
    python    0.3    0.2    0.02  0.1   0.05  0.7
    web2.0    0.4    0.4    0.2   0.25  0.3   0.3
    web       0.8    0.7    0.6   0.7   0.43  0.6
    semantic  0.1    0.2    0.4   0.3   0.35  0.5
    games     0.4    0.1    0.1   0.12  0.2   1.0
    game      0.2    0.3    0.01  0.04  0.02  1.0
    java      0.13   0.9    0.2   0.1   0.4   0.8
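Using the toy values from slide 13, an indicator rule "description → target" can be read as: take the tags covered by the description and compare their target mean (or share) with that of all tags. A minimal sketch, assuming each selector is a numeric threshold test such as "Clos <= 0.2"; the slides do not say how the indicators were discretized, and the names `covers` and `evaluate` are hypothetical.

```python
import operator

# Toy indicator values and targets transcribed from slide 13.
COLUMNS = ("Ufreq", "Rfreq", "Deg", "Bet", "Clos", "TARGET")
ROWS = {
    "python":   (0.3, 0.2, 0.02, 0.1, 0.05, 0.7),
    "web2.0":   (0.4, 0.4, 0.2, 0.25, 0.3, 0.3),
    "web":      (0.8, 0.7, 0.6, 0.7, 0.43, 0.6),
    "semantic": (0.1, 0.2, 0.4, 0.3, 0.35, 0.5),
    "games":    (0.4, 0.1, 0.1, 0.12, 0.2, 1.0),
    "game":     (0.2, 0.3, 0.01, 0.04, 0.02, 1.0),
    "java":     (0.13, 0.9, 0.2, 0.1, 0.4, 0.8),
}
DATA = {tag: dict(zip(COLUMNS, values)) for tag, values in ROWS.items()}

# Assumption: a selector is a threshold test "attribute <= v" or "attribute >= v".
OPS = {"<=": operator.le, ">=": operator.ge}


def covers(pattern, record):
    """A pattern (description) is a conjunction of selectors."""
    return all(OPS[op](record[attr], v) for attr, op, v in pattern)


def evaluate(pattern, data, target="TARGET"):
    """Return the subgroup size and the mean target value inside it."""
    values = [rec[target] for rec in data.values() if covers(pattern, rec)]
    if not values:
        return 0, float("nan")
    return len(values), sum(values) / len(values)


print(evaluate((), DATA))                      # whole population
print(evaluate((("Clos", "<=", 0.2),), DATA))  # subgroup: low closeness
```

On these toy values the description "Clos <= 0.2" covers python, games, and game, whose target mean is about 0.9 compared with about 0.7 for all seven tags.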
  • 14. Mining Maturity Profiles – Target Variables
    • First: compute the most related tag t_sim for each tag t (resource context)
    • WordNet Synonym Identification (SYN)
      • Binary target variable
      • True if t_sim is a synonym of t, False otherwise
    • Grounded WordNet Maturity (MAT)
      • Binary target variable
      • Based on taxonomic shortest-path length
      • True if sim(t_sim, t) > 0.5, False otherwise
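The two binary targets can be sketched as follows, assuming resource-context vectors are dicts mapping resources to counts. The `synonyms` lookup and the `grounded_sim` function are hypothetical stand-ins for a WordNet synonym check and a WordNet path-based similarity; the tags and numbers are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_related(tag, vectors):
    """t_sim: the other tag whose context vector is most cosine-similar."""
    return max((t for t in vectors if t != tag),
               key=lambda t: cosine(vectors[tag], vectors[t]))


def targets(tag, vectors, synonyms, grounded_sim, threshold=0.5):
    """Return (t_sim, SYN, MAT) for one tag."""
    t_sim = most_related(tag, vectors)
    syn = t_sim in synonyms.get(tag, set())   # SYN: is t_sim a synonym of t?
    mat = grounded_sim(t_sim, tag) > threshold  # MAT: grounded sim above 0.5?
    return t_sim, syn, mat


# Hypothetical resource-context vectors (tag -> {resource: count}).
vectors = {
    "car": {"r1": 3, "r2": 1},
    "auto": {"r1": 2, "r2": 1},
    "bike": {"r3": 2, "r2": 1},
}
# Hypothetical stand-ins for WordNet lookups.
synonyms = {"car": {"auto", "automobile"}}
GROUNDED = {frozenset(("car", "auto")): 0.9}


def grounded_sim(a, b):
    return GROUNDED.get(frozenset((a, b)), 0.0)


print(targets("car", vectors, synonyms, grounded_sim))   # ('auto', True, True)
print(targets("bike", vectors, synonyms, grounded_sim))  # ('auto', False, False)
```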
  • 15. Pattern Mining - Algorithm
    • Patterns are similar to association rules
    • BUT: fixed target concept (of interest), i.e., high maturity
    • Pattern mining: top-k approach
      • Search through the space of descriptions (conjunctions of features)
      • Maximize a quality function, e.g., the increase in target mean/share
      • Several efficient algorithms exist; we apply an exhaustive one.
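A naive exhaustive top-k search can be sketched on the slide-13 toy values. It assumes the quality function q = (n / N)^a · (subgroup target mean - overall target mean), a standard family from the subgroup discovery literature: a = 0 scores only the increase in target mean, while a = 1 weights it fully by coverage (for a binary target this is weighted relative accuracy). The parameter values, threshold selectors, and function names are assumptions; the slides do not say which exhaustive algorithm or quality parameters were used.

```python
import heapq
import itertools
import operator

# Toy indicator values and targets transcribed from slide 13.
COLUMNS = ("Ufreq", "Rfreq", "Deg", "Bet", "Clos", "TARGET")
ROWS = [
    (0.3, 0.2, 0.02, 0.1, 0.05, 0.7),    # python
    (0.4, 0.4, 0.2, 0.25, 0.3, 0.3),     # web2.0
    (0.8, 0.7, 0.6, 0.7, 0.43, 0.6),     # web
    (0.1, 0.2, 0.4, 0.3, 0.35, 0.5),     # semantic
    (0.4, 0.1, 0.1, 0.12, 0.2, 1.0),     # games
    (0.2, 0.3, 0.01, 0.04, 0.02, 1.0),   # game
    (0.13, 0.9, 0.2, 0.1, 0.4, 0.8),     # java
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]
OPS = {"<=": operator.le, ">=": operator.ge}


def candidate_selectors(data, attributes):
    """Threshold selectors 'attr <= v' and 'attr >= v' for every observed value v."""
    selectors = []
    for attr in attributes:
        for v in sorted({rec[attr] for rec in data}):
            selectors.append((attr, "<=", v))
            selectors.append((attr, ">=", v))
    return selectors


def quality(covered, overall_mean, n_total, a=0.5):
    """q = (n / N)^a * (subgroup target mean - overall target mean)."""
    n = len(covered)
    return (n / n_total) ** a * (sum(covered) / n - overall_mean)


def top_k_subgroups(data, attributes, target="TARGET", k=3, max_depth=2, a=0.5, min_size=2):
    """Exhaustively score all conjunctions of up to max_depth selectors; keep the k best."""
    overall = sum(rec[target] for rec in data) / len(data)
    selectors = candidate_selectors(data, attributes)
    best = []  # min-heap of (quality, pattern), so best[0] is the weakest kept pattern
    for depth in range(1, max_depth + 1):
        for pattern in itertools.combinations(selectors, depth):
            if len({attr for attr, _, _ in pattern}) < depth:
                continue  # skip conjunctions that constrain one attribute twice
            covered = [rec[target] for rec in data
                       if all(OPS[op](rec[attr], v) for attr, op, v in pattern)]
            if len(covered) < min_size:
                continue
            q = quality(covered, overall, len(data), a)
            if len(best) < k:
                heapq.heappush(best, (q, pattern))
            elif q > best[0][0]:
                heapq.heapreplace(best, (q, pattern))
    return sorted(best, reverse=True)


for q, pattern in top_k_subgroups(DATA, COLUMNS[:-1]):
    print(round(q, 3), pattern)
```

Keeping a min-heap of size k means each candidate is compared only against the weakest pattern kept so far; real exhaustive miners add pruning (e.g., optimistic estimates) on top of this enumeration.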
  • 16. The Story
    • Co-occurrence fingerprints capture tag semantics
    • Frequency / centrality properties as maturity indicators
    • Discover indicator subgroups that maximize the maturity target
    • Evaluation
  • 17. Social Bookmarking Data
    • Folksonomy crawled from Delicious in 2006, restricted to top 10,000 tags
      • 476,378 users
      • 10,000 tags
      • 12,660,470 resources
      • 101,491,722 tag assignments
    • Preprocessing & Filtering:
      • Filter out tags without a sufficiently similar partner (cos < 0.05)
      • Limit to tags with only one sense in WordNet
    • Number of tags finally considered: 1,944
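The two filtering steps can be sketched as a single pass over the tags, assuming the best-partner cosine and the WordNet sense count of each tag have already been computed (with NLTK, for example, `len(wordnet.synsets(tag))` is one possible way to count senses). The function name and all inputs below are hypothetical.

```python
def preprocess(tags, best_partner_cos, wordnet_sense_count, min_cos=0.05):
    """Keep tags that have a similar enough partner and exactly one WordNet sense.

    best_partner_cos: tag -> cosine similarity to its most similar other tag.
    wordnet_sense_count: tag -> number of WordNet senses (0 if not in WordNet).
    """
    return [
        t for t in tags
        if best_partner_cos.get(t, 0.0) >= min_cos  # drop tags with cos < 0.05
        and wordnet_sense_count.get(t, 0) == 1      # keep unambiguous tags only
    ]


# Hypothetical inputs; real values would come from the Delicious data and WordNet.
tags = ["alpha", "beta", "gamma", "delta", "epsilon"]
best_partner_cos = {"alpha": 0.3, "beta": 0.2, "gamma": 0.01, "delta": 0.4, "epsilon": 0.05}
wordnet_sense_count = {"alpha": 1, "beta": 3, "gamma": 1, "delta": 0, "epsilon": 1}
print(preprocess(tags, best_partner_cos, wordnet_sense_count))  # ['alpha', 'epsilon']
```

Restricting to single-sense tags makes the WordNet lookup unambiguous, which is what lets SYN and MAT be computed without word-sense disambiguation.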
  • 18. Direct Correlation to Target Variables
    • No individual indicator is strongly correlated with the maturity targets (all coefficients between 0.09 and 0.15)
    • Combining indicators may yield higher correlation
    • → consider subgroups
    Target  Ufreq  Rfreq  Deg   Clos  Bet
    SYN     0.15   0.15   0.13  0.14  0.12
    MAT     0.12   0.15   0.12  0.14  0.09
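The slide does not name the correlation coefficient. Assuming Pearson's r, which for a 0/1 target such as SYN or MAT equals the point-biserial correlation, each table entry would be computed as below. The four tag values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 target this is the point-biserial correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical values: user frequency of four tags and their SYN target (1 = True).
ufreq = [1, 2, 3, 4]
syn = [0, 0, 1, 1]
print(round(pearson(ufreq, syn), 3))  # 0.894
```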
  • 19. Results: Exemplary Patterns (1)
    • Target: Synonym identification (SYN); mean = 0.13
    • Small groups with highest maturity (measured by the increase in synonym discovery rate)
    • Larger group: degree centrality + user frequency
      • Synonym discovery rate 128% higher than for all tags
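Reading "128% higher" as a relative increase over the overall SYN mean of 0.13, the synonym discovery rate within this larger group works out to roughly the following (back-of-the-envelope arithmetic from the two numbers on the slide, not a value reported there):

```latex
\text{rate}_{\text{group}} \approx 0.13 \times (1 + 1.28) = 0.13 \times 2.28 \approx 0.30
```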
  • 20. Results: Exemplary Patterns (2)
    • Target: WordNet Maturity (MAT); mean = 0.59
  • 21. Discussion & Implications
    • In general: centrality and frequency are useful for assessing maturity
    • Combined evidence of indicators leads to higher-quality patterns
    • Subgroup discovery is a generally useful technique
    • Open issues:
      • Further maturity indicators?
      • Alternative notions of maturity?
      • Temporal aspects
      • Mining of "immaturity"
  • 22. The Story
    • Co-occurrence fingerprints capture tag semantics
    • Frequency / centrality properties as maturity indicators
    • Discover indicator subgroups that maximize the maturity target
    • Combined evidence of indicators leads to higher-quality patterns
    Thanks! [email_address]