Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity
 

Recent research provides evidence for the presence of emergent semantics in collaborative tagging systems. While several methods have been proposed, little is known about the factors that influence the evolution of semantic structures in these systems. A natural hypothesis is that the quality of the emergent semantics depends on the pragmatics of tagging: users with certain usage patterns might contribute more to the resulting semantics than others. In this work, we propose several measures which enable a pragmatic differentiation of taggers by their degree of contribution to emerging semantic structures. We distinguish between categorizers, who typically use a small set of tags as a replacement for hierarchical classification schemes, and describers, who annotate resources with a wealth of freely associated, descriptive keywords. To study our hypothesis, we apply semantic similarity measures to 64 different partitions of a real-world, large-scale folksonomy containing different ratios of categorizers and describers. Our results not only show that ‘verbose’ taggers are most useful for the emergence of tag semantics, but also that a subset containing only 40% of the most ‘verbose’ taggers can produce results that match and even outperform the semantic precision obtained from the whole dataset. Moreover, the results suggest that there exists a causal link between the pragmatics of tagging and the resulting emergent semantics. This work is relevant for designers and analysts of tagging systems interested (i) in fostering the semantic development of their platforms, (ii) in identifying users introducing “semantic noise”, and (iii) in learning ontologies.

Usage Rights

© All Rights Reserved

  • Here is a paper on 'trend detection in folksonomies' - maybe this can provide some interesting pointers for you:
    http://www.kde.cs.uni-kassel.de/stumme/papers/2006/hotho2006trend.pdf
  • Astute observations and interesting measures.

    What I'm still looking for are measurements for vocabulary drift and jargon evolution over time. Does anyone know who is doing research in this area?
    Presentation Transcript

    • Stop thinking, start tagging: Tag Semantics arise from Collaborative Verbosity
      Christian Körner (1), Dominik Benz (2), Andreas Hotho (3), Markus Strohmaier (1), Gerd Stumme (2)
      (1) Knowledge Management Institute and Know Center, Graz University of Technology, Austria
      (2) Knowledge and Data Engineering Group (KDE), University of Kassel, Germany
      (3) Data Mining and Information Retrieval Group, University of Würzburg, Germany
    • Where do Semantics come from?
      • Semantically annotated content is the „fuel“ of the next generation World Wide Web – but where is the petrol station?
      • Expert-built → expensive
      • Evidence for emergent semantics in Web 2.0 data → built by the crowd!
      → Which factors influence the emergence of semantics? → Do certain users contribute more than others?
    • The Story
      • Emergent Tag Semantics
      • Pragmatics of tagging
      • Semantic Implications of Tagging Pragmatics
      • Conclusions
    • Emergent Tag Semantics
      • tagging is a simple and intuitive way to organize all kinds of resources
      • uncontrolled vocabulary, tags are „just strings“
      • formal model: folksonomy F = (U, T, R, Y)
        • Users U, Tags T, Resources R
        • Tag assignments Y ⊆ U × T × R
      • evidence of emergent semantics
        • Tag similarity measures can identify e.g. synonym tags ( web2.0, web_two )
    • Tag Similarity Measures: Tag Context Similarity
        • Tag Context Similarity is a scalable and precise tag similarity measure [Cattuto2008,Markines2009]:
          • Describe each tag as a context vector
          • Each dimension of the vector space corresponds to another tag; each entry denotes a co-occurrence count
          • Compute similar tags by cosine similarity
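      [Figure: context vector of the tag JAVA over the dimensions design, software, blog, web, programming, …, with example co-occurrence counts 50, 10, 1, 30, 5]
      → Will be used as indicator of emergent semantics!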
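The three steps of Tag Context Similarity listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that two tags co-occur when the same user assigns both to the same resource (one "post"), and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt


def context_vectors(assignments):
    """Build one co-occurrence vector per tag from (user, tag, resource) triples.

    Assumption: tags co-occur when one user attaches them to the same resource.
    """
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)
    vectors = defaultdict(Counter)
    for tags in posts.values():
        for a, b in permutations(tags, 2):
            vectors[a][b] += 1
    return dict(vectors)


def cosine(v, w):
    """Cosine similarity of two sparse vectors stored as tag -> count mappings."""
    dot = sum(count * w.get(tag, 0) for tag, count in v.items())
    norm = sqrt(sum(c * c for c in v.values())) * sqrt(sum(c * c for c in w.values()))
    return dot / norm if norm else 0.0


def most_similar(tag, vectors):
    """Return the tag whose context vector is closest to that of `tag`."""
    target = vectors.get(tag, Counter())
    candidates = [(cosine(target, vec), other)
                  for other, vec in vectors.items() if other != tag]
    return max(candidates)[1] if candidates else None


# Toy folksonomy (hypothetical data): two users express the same idea differently.
Y = [
    ("u1", "web2.0", "r1"), ("u1", "ajax", "r1"),
    ("u2", "web_two", "r2"), ("u2", "ajax", "r2"),
    ("u3", "java", "r3"), ("u3", "programming", "r3"),
]
vectors = context_vectors(Y)
print(most_similar("web2.0", vectors))
```

In this toy data, web2.0 and web_two never appear together, yet both co-occur with ajax, so their context vectors point in the same direction. That is the intuition behind detecting synonym tags from context rather than from direct co-occurrence.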
    • Assessing the Quality of Tag Semantics
      • Mapping: folksonomy tags are mapped to synsets in the WordNet hierarchy
      • Example: TagCont(t, t_sim) = 0.74, JCN(t, t_sim) = 3.68
      • Average JCN(t, t_sim) over all tags t: „Quality of semantics“
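For orientation, the quality score on this slide can be written out as below. The slide itself does not spell out JCN; the first line is the standard Jiang-Conrath distance over a taxonomy, where IC is the information content of a concept and lcs the least common subsumer. The set T' of tags that can be mapped to WordNet is a notation assumed here for illustration.

```latex
\[
  \mathrm{IC}(c) = -\log P(c), \qquad
  \mathrm{JCN}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2)
      - 2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)
\]
\[
  \text{quality of semantics}
      = \frac{1}{|T'|} \sum_{t \in T'} \mathrm{JCN}\bigl(t, t_{\mathrm{sim}}\bigr)
\]
```

Because JCN is a distance, a lower average indicates that the most similar tags found in the folksonomy are semantically closer in WordNet.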
    • The Story
      • Tag Similarity measures can capture emergent tag semantics
      • Pragmatics of tagging
      • Semantic Implications of Tagging Pragmatics
      • Conclusions
    • Tagging motivation
      • Evidence of different ways HOW users tag (Tagging Pragmatics)
      • Broad distinction by tagging motivation [Strohmaier2009]:
      • „Categorizers“ …
        • use a small controlled tag vocabulary
        • goal: „ontology-like“ categorization by tags, for later browsing
        • tags as a replacement for folders
      • „Describers“ …
        • tag „verbosely“ with freely chosen words
        • vocabulary not necessarily consistent (synonyms, spelling variants, …)
        • goal: describe content, ease retrieval
      [Figure: example tags shown on the slide: donuts, duff, marge, beer, bart, barty, Duff-beer, bev, alc, nalc, beer, wine]
    • Tagging Pragmatics: Measures
      • How to distinguish between the two types of taggers?
      • Intuition: Describers use an open set of many tags, Categorizers use a small set of controlled tags:
      • Vocabulary size (vocab)
      • Tag / Resource ratio (trr)
      • Average # tags per post (tpp)
      Describers: high values; Categorizers: low values
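The formulas for these three measures did not survive in the transcript, so the following is only a sketch of how they could be computed from their names. It assumes vocab is the number of distinct tags of a user, trr is that number divided by the number of distinct resources, and tpp is the number of tag assignments divided by the number of resources (one post per user and resource). The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def pragmatic_measures(assignments):
    """Compute vocab, trr and tpp per user from (user, tag, resource) triples."""
    tags = defaultdict(set)            # user -> distinct tags
    resources = defaultdict(set)       # user -> distinct resources (posts)
    n_assignments = defaultdict(int)   # user -> number of tag assignments
    for user, tag, resource in set(assignments):
        tags[user].add(tag)
        resources[user].add(resource)
        n_assignments[user] += 1
    return {
        user: {
            "vocab": len(tags[user]),
            "trr": len(tags[user]) / len(resources[user]),
            "tpp": n_assignments[user] / len(resources[user]),
        }
        for user in tags
    }


# Toy data: "cat" files everything under one tag, "desc" tags freely.
Y = [
    ("cat", "bev", "r1"), ("cat", "bev", "r2"),
    ("cat", "bev", "r3"), ("cat", "bev", "r4"),
    ("desc", "donuts", "r5"), ("desc", "duff", "r5"), ("desc", "beer", "r5"),
    ("desc", "marge", "r6"), ("desc", "bart", "r6"),
]
measures = pragmatic_measures(Y)
print(measures["cat"])   # low values expected for a categorizer
print(measures["desc"])  # high values expected for a describer
```

Under these assumptions the categorizer scores low on all three measures and the describer high, which matches the intuition stated on the slide.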
    • Tagging Pragmatics: Measures
      • Next intuition: Describers don't care about „abandoned“ tags, Categorizers do
      • Orphan ratio (orphan)
      • R(t): set of resources tagged by user u with tag t
      Describers: high values; Categorizers: low values
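The orphan ratio formula is also missing from the transcript. A possible reading, given here as an assumption rather than the authors' exact definition, is the share of a user's tags that are used on very few resources, with "very few" set relative to that user's most-used tag. The 1% threshold below is an illustrative choice.

```python
from collections import defaultdict
from math import ceil


def orphan_ratio(user_assignments, fraction=0.01):
    """Share of a user's tags that count as orphans.

    `user_assignments` holds (tag, resource) pairs of a single user.
    A tag t is treated as an orphan when |R(t)| is at most
    ceil(fraction * |R(t_max)|), where t_max is the user's most-used tag.
    The threshold rule is an assumption made for this sketch.
    """
    resources_per_tag = defaultdict(set)   # t -> R(t)
    for tag, resource in user_assignments:
        resources_per_tag[tag].add(resource)
    if not resources_per_tag:
        return 0.0
    most_used = max(len(r) for r in resources_per_tag.values())
    threshold = ceil(fraction * most_used)
    orphans = [t for t, r in resources_per_tag.items() if len(r) <= threshold]
    return len(orphans) / len(resources_per_tag)


# Toy data: a categorizer reuses two tags heavily and has one stray tag.
categorizer = ([("bev", f"r{i}") for i in range(150)]
               + [("alc", f"r{i}") for i in range(120)]
               + [("misc", "r0")])
# A describer uses every tag exactly once.
describer = [("donuts", "r1"), ("duff", "r2"), ("marge", "r3")]

print(orphan_ratio(categorizer))
print(orphan_ratio(describer))
```

With this reading, the describer's ratio is high because none of the tags is ever reused, while the categorizer's ratio stays low.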
    • Tagging pragmatics: Limitations of measures
      • Real users: no „perfect“ Categorizers / Describers, but „mixed“ behaviour
      • Possibly influenced by user interfaces / recommenders
      • Measures are correlated
      • But: independent of semantics; measures capture usage patterns
    • The Story
      • Tag Similarity measures can capture emergent tag semantics
      • Measures of tagging pragmatics differentiate users by tagging motivation
      • Semantic Implications of Tagging Pragmatics
      • Conclusions
    • Influence of Tagging Pragmatics on Emergent Semantics
      • Idea: Can we learn the same (or even better) semantics from the folksonomy induced by a subset of describers / categorizers?
      [Figure: users of the complete folksonomy arranged between extreme Categorizers and extreme Describers; highlighted example: subset of 30% categorizers]
    • Experimental setup
      • Apply pragmatic measures vocab, trr, tpp, orphan to each user
      • Systematically create „sub-folksonomies“ CF_i / DF_i by successively adding i% of Categorizers / Describers (i = 1,2,…,25,30,…,100)
      • Compute similar tags based on each subset (TagContext Sim.)
      • Assess (semantic) quality of similar tags by avg. JCN distance
      [Figure: example sub-folksonomies CF_5 and DF_20, each evaluated via TagCont(t, t_sim) and JCN(t, t_sim)]
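The construction of the sub-folksonomies can be sketched as follows. This is a minimal illustration under stated assumptions: users are ranked by one pragmatic measure, high scores are taken to indicate describers and low scores categorizers, and the top i% of the ranking are kept. The function name, the rounding rule and the toy scores are hypothetical.

```python
from math import ceil


def sub_folksonomy(assignments, scores, percent, describers=True):
    """Keep the tag assignments of the `percent`% most extreme users.

    `scores` maps each user to the value of one pragmatic measure
    (e.g. trr). With describers=True the highest-scoring users are kept
    (DF_i), otherwise the lowest-scoring ones (CF_i).
    """
    ranked = sorted(scores, key=scores.get, reverse=describers)
    keep = set(ranked[:ceil(len(ranked) * percent / 100)])
    return [(u, t, r) for (u, t, r) in assignments if u in keep]


# Toy example with five users and hypothetical trr scores.
scores = {"a": 0.2, "b": 2.5, "c": 1.0, "d": 0.5, "e": 3.0}
Y = [(user, "tag", "res") for user in scores]

df_40 = sub_folksonomy(Y, scores, 40, describers=True)    # DF_40
cf_20 = sub_folksonomy(Y, scores, 20, describers=False)   # CF_20
print(sorted({u for u, _, _ in df_40}))
print(sorted({u for u, _, _ in cf_20}))
```

Each resulting subset would then be passed to the tag similarity computation and scored by the average JCN distance, as listed in the last two bullets of the slide.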
    • Dataset
      • From the social bookmarking site Delicious in 2006 → ORIGINAL
      • Two filtering steps (to make measures more meaningful):
        • Restrict to top 10,000 tags → FULL
        • Keep only users with > 100 resources → MIN100RES
      dataset      |T|         |U|       |R|          |Y|
      ORIGINAL     2,454,546   667,128   18,782,132   140,333,714
      FULL            10,000   511,348   14,567,465   117,319,016
      MIN100RES        9,944   100,363   12,125,176    96,298,409
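The two filtering steps might look like the sketch below. The slide does not say how "top" tags are ranked or at which stage resources are counted, so this example assumes ranking by number of tag assignments and counting resources after the tag filter. The function name and the tiny parameters in the usage example are hypothetical.

```python
from collections import Counter, defaultdict


def filter_folksonomy(assignments, top_tags=10_000, min_resources=100):
    """Return (FULL, MIN100RES)-style subsets of (user, tag, resource) triples."""
    assignments = list(assignments)
    # Step 1: keep only the most frequently assigned tags (assumed ranking).
    tag_counts = Counter(tag for _, tag, _ in assignments)
    kept_tags = {tag for tag, _ in tag_counts.most_common(top_tags)}
    full = [(u, t, r) for (u, t, r) in assignments if t in kept_tags]
    # Step 2: keep only users with more than `min_resources` resources.
    resources = defaultdict(set)
    for u, _, r in full:
        resources[u].add(r)
    min_res = [(u, t, r) for (u, t, r) in full
               if len(resources[u]) > min_resources]
    return full, min_res


# Toy run with small thresholds so that the effect is visible.
Y = [
    ("u1", "a", "r1"), ("u1", "a", "r2"),
    ("u1", "b", "r1"), ("u1", "b", "r2"),
    ("u1", "c", "r3"),
    ("u2", "a", "r4"),
]
full, min_res = filter_folksonomy(Y, top_tags=2, min_resources=1)
print(len(full), len(min_res))
```

Applying the user filter after the tag filter is at least consistent with the table, where MIN100RES contains slightly fewer than 10,000 tags.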
    • Results – adding Describers (DF_i)
      • Almost all sub-folksonomies are better than randomly picked ones
      • 40% of describers according to trr outperform the complete data!
      • Optimal performance for 70% describers (trr)
      [Figure: semantic quality („better semantics“) plotted against the share of describers („more describers“)]
    • Results – adding Categorizers (CF_i)
      • Almost all sub-folksonomies are worse than randomly picked ones
      • Global optimum for 90% categorizers (tpp) → removing the 10% most extreme describers! (Spammers?)
      [Figure: semantic quality („better semantics“) plotted against the share of categorizers („more categorizers“)]
    • The Story
      • Tag Similarity measures can capture emergent tag semantics
      • Measures of tagging pragmatics differentiate users by tagging motivation
      • Sub-folksonomies introduced by measures of pragmatics show different semantic qualities
      • Conclusions
    • Summary & Conclusions
      • Introduction of measures of users' tagging motivation (Categorizers vs. Describers)
      • Evidence for a causal link between tagging pragmatics (HOW people use tags) and tag semantics (WHAT tags mean)
      • „Mass matters“ for the „wisdom of the crowd“, but the composition of the crowd makes a difference („Verbosity“ of describers is in general better, but with a limitation)
      • Relevant for tag recommendation and ontology learning algorithms
    • Guess who's a Categorizer among the authors
    • Thanks for the attention! Questions? Be verbose!
      • Tag Similarity measures can capture emergent tag semantics
      • Measures of tagging pragmatics differentiate users by tagging motivation
      • Sub-folksonomies introduced by measures of pragmatics show different semantic qualities
      • Evidence of a causal link between pragmatics and semantics of tagging!
      Contact: [email_address], [email_address]
    • References
      • [Cattuto2008] Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In: Proc. 7th Intl. Semantic Web Conference (2008), p. 615-631
      • [Markines2009] Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho, Gerd Stumme: Evaluating Similarity Measures for Emergent Semantics of Social Tagging. In: Proc. 18th Intl. World Wide Web Conference (2009), p. 641-641
      • [Strohmaier2009] Markus Strohmaier, Christian Körner, Roman Kern: Why do users tag? Detecting users' motivation for tagging in social tagging systems. Technical Report, Knowledge Management Institute, Graz University of Technology (2009)