TU Graz – Knowledge Management Institute
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation

Slides to the presentation I gave in the "Tagging" Session at Hypertext 2010 in Toronto


Transcript of "Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation"

  1. TU Graz – Knowledge Management Institute
     Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
     Christian Körner, Roman Kern, Hans-Peter Grahsl, Markus Strohmaier
     Knowledge Management Institute and Know-Center, Graz University of Technology, Austria
     Hypertext 2010, June 15th, 2010
  2. Introduction
     Lots of research on folksonomies, their structure and the resulting dynamics.
     What we do not know are the reasons and motivations users have when they tag.
     Question: Why do users tag?
  3. Motivation
     Knowing why users tag would help to answer a number of current research questions:
     • What are possible improvements for tag recommendation?
     • What are suitable search terms for items in these systems?
     • How can we enhance ontology learning?
     • …
     Models for tagging motivation already exist, such as [Nov2009] and [Heckner2009].
     BUT: These models rely on expert judgements.
     Automatic measures for inferring tagging motivation are important!
  4. Presentation Overview
     • Research questions
     • Two types of tagging motivation
     • Approximating tagging motivation
     • Experiments and results
       – Quantitative Evaluation
       – Qualitative Evaluation
  5. Questions
     • Can tagging motivation be approximated with statistical measures?
     • Which measures allow us to infer whether a given user has a certain motivation?
     • Which of these measures perform best at differentiating between different types of tagging motivation?
     • Does the distinction between the proposed tagging motivation types have an influence on the tagging process?
  6. Types of Tagging Motivations

                             Categorizer          Describer
     Goal                    later browsing       later retrieval
     Change of vocabulary    costly               cheap
     Size of vocabulary      limited              open
     Tags                    subjective           objective
     Tag reuse               frequent             rare
     Tag purpose             mimicking taxonomy   descriptive labels

     In the “real world” users are driven by a combination of both motivations, e.g. using tags as descriptive labels while maintaining a few categories [Körner2009].
  7. Terminology
     Folksonomies are usually represented by tripartite graphs with hyper edges.
     Three different disjoint sets:
     – a set of users u ∈ U
     – a set of tags t ∈ T
     – a set of resources r ∈ R
     A folksonomy is defined as a set of annotations F ⊆ U × T × R.
     A personomy is the reduction of a folksonomy F to a single user u.
     A tag assignment (tas) is one specific triple of one user u, one tag t and one resource r.
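
The terminology above maps onto a very small data structure. The following is a minimal sketch, not the authors' implementation: the class name `TagAssignment`, the function `personomy` and the toy data are assumptions introduced only for illustration.

```python
from typing import NamedTuple, Set


class TagAssignment(NamedTuple):
    """One tag assignment (tas): a (user, tag, resource) triple."""
    user: str
    tag: str
    resource: str


def personomy(folksonomy: Set[TagAssignment], user: str) -> Set[TagAssignment]:
    """Reduce a folksonomy F to the tag assignments of a single user u."""
    return {tas for tas in folksonomy if tas.user == user}


# Hypothetical toy folksonomy F, a subset of U x T x R.
folksonomy = {
    TagAssignment("alice", "python", "http://example.org/a"),
    TagAssignment("alice", "toread", "http://example.org/a"),
    TagAssignment("alice", "python", "http://example.org/b"),
    TagAssignment("bob", "news", "http://example.org/c"),
}

alice = personomy(folksonomy, "alice")
alice_tags = {tas.tag for tas in alice}            # T_u
alice_resources = {tas.resource for tas in alice}  # R_u
print(len(alice), sorted(alice_tags), len(alice_resources))
```

Storing the folksonomy as a set of triples keeps the sketch close to the definition F ⊆ U × T × R; a real dataset would more likely be indexed by user.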
  8. Approximating Tagging Motivation / 1
     Based on different intuitions, various measures for the differentiation were developed:
     • Tag/Resource Ratio (trr)
       – How many tags does a user use?
       – Relates the vocabulary size of a user to the total number of resources annotated by that user: $trr(u) = |T_u| / |R_u|$
       – Describers, who use a variety of different tags, can be expected to score higher than categorizers, who use fewer tags.
     • Orphaned Tag Ratio
       – How many tags of a user's vocabulary are attached to only a few resources?
       – $orphan(u) = |T_u^o| / |T_u|$, with $T_u^o = \{t \mid |R(t)| \le n\}$ and $n = \lceil |R(t_{max})| / 100 \rceil$, where $|R(t)|$ is the number of resources tagged with $t$ and $t_{max}$ is the tag that was used the most.
       – Categorizers are expected to be represented by values closer to 0, because orphaned tags would introduce noise to their personal taxonomy. Describers are expected to be represented by values closer to 1, because they tag resources in a verbose and descriptive way and do not mind the introduction of orphaned tags to their vocabulary.
     • Conditional Tag Entropy (cte)
       – How well does a user “encode” resources with his tags?
       – For categorizers, useful tags should be maximally discriminative with regard to the resources they are assigned to.
       – $H(R|T) = -\sum_{r \in R} \sum_{t \in T} p(r,t) \log_2 p(r|t)$
     [Figure 1: Tag cloud example of a categorizer. Frequency among tags is balanced, a potential indicator for using the tag set as an aid for navigation.]
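
The three measures named on this slide (tag/resource ratio, orphaned tag ratio and conditional entropy H(R|T)) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: a personomy is passed as a non-empty list of hypothetical `(tag, resource)` pairs, and the joint probability p(r, t) is assumed to be uniform over the user's tag assignments, which the slide does not specify.

```python
import math
from collections import defaultdict


def resources_per_tag(pairs):
    """Map each tag t to the set R(t) of resources it was assigned to."""
    by_tag = defaultdict(set)
    for tag, resource in pairs:
        by_tag[tag].add(resource)
    return by_tag


def trr(pairs):
    """Tag/resource ratio: |T_u| / |R_u|."""
    tags = {t for t, _ in pairs}
    resources = {r for _, r in pairs}
    return len(tags) / len(resources)


def orphan_ratio(pairs):
    """Share of tags attached to at most n resources, n = ceil(|R(t_max)| / 100)."""
    by_tag = resources_per_tag(pairs)
    n = math.ceil(max(len(rs) for rs in by_tag.values()) / 100)
    orphans = [t for t, rs in by_tag.items() if len(rs) <= n]
    return len(orphans) / len(by_tag)


def conditional_entropy(pairs):
    """H(R|T), ASSUMING p(r, t) = 1/|TAS_u| for every tag assignment.

    Then p(r|t) = 1/|R(t)|, and the double sum collapses to
    sum over t of (|R(t)| / |TAS_u|) * log2(|R(t)|).
    """
    by_tag = resources_per_tag(pairs)
    total = sum(len(rs) for rs in by_tag.values())
    return sum(len(rs) / total * math.log2(len(rs)) for rs in by_tag.values())


# Hypothetical personomy: one frequently reused tag and two orphaned tags.
pairs = [("python", "a"), ("python", "b"), ("python", "c"),
         ("web", "a"), ("misc", "d")]

print(trr(pairs), orphan_ratio(pairs), conditional_entropy(pairs))
```

The sketch returns the raw conditional entropy only; any normalization against an ideal categorizer is left out because it needs a definition that is not given on this slide.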
  9. Approximating Tagging Motivation / 2
     • Conditional Tag Entropy (cte), continued
       – A categorizer would have a strong incentive to maintain high tag entropy (or information value) in her tag cloud, i.e. a tag frequency that is as equally distributed as possible, so that the tags are useful as a navigational aid. A describer would have little interest in maintaining high tag entropy, as tags are not used for navigation at all.
       – The conditional entropy is put in relation to the conditional entropy of an ideal categorizer: $cte = (H(R|T) - H_{opt}(R|T)) / H_{opt}(R|T)$
     • Overlap Factor
       – Are tags used as discriminative as categories?
       – Relates the number of all resources to the total number of tag assignments of a user: $overlap = 1 - |R_u| / |TAS_u|$
       – Categorizers would be interested in keeping this overlap relatively low in order to produce discriminative categories, i.e. categories that are free from intersections. Describers would not care about a possibly high overlap factor, since they do not use tags for navigation but instead aim to best support later retrieval.
     • Tag/Title Intersection Ratio (ttr)
       – How likely does a user choose words from the title as tags?
       – Titles occurring in a personomy are tokenized to the set of title words $TW_u$, which is filtered using the stop-word list packaged with the Snowball stemmer. The intersection size is normalized by the size of the set of title words: $ttr = |T_u \cap TW_u| / |TW_u|$

                             Categorizer          Describer            Proposed Measure
     Goal                    later browsing       later retrieval
     Change of vocabulary    costly               cheap
     Size of vocabulary      limited              open                 Tag/Resource Ratio
     Tags                    subjective           objective            Tag/Title Intersection Ratio
     Tag reuse               frequent             rare                 Orphaned Tag Ratio / Cond. Tag Entropy
     Tag purpose             mimicking taxonomy   descriptive labels   Overlap Factor
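
The overlap factor and the tag/title intersection ratio from this slide can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names, the regular-expression tokenizer and the tiny `STOP_WORDS` set are assumptions (the slide's source text mentions the stop-word list shipped with the Snowball stemmer, which is not reproduced here).

```python
import re

# ASSUMPTION: a tiny stand-in for a real stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def overlap_factor(assignments):
    """overlap = 1 - |R_u| / |TAS_u| for one user's (tag, resource) pairs."""
    assignments = set(assignments)
    resources = {r for _, r in assignments}
    return 1 - len(resources) / len(assignments)


def title_words(titles):
    """Tokenize titles into the set TW_u of lower-cased non-stop words."""
    words = set()
    for title in titles:
        words.update(re.findall(r"[a-z0-9]+", title.lower()))
    return words - STOP_WORDS


def tag_title_ratio(assignments, titles_by_resource):
    """ttr = |T_u intersect TW_u| / |TW_u|."""
    tags = {t.lower() for t, _ in assignments}
    resources = {r for _, r in assignments}
    tw = title_words(titles_by_resource[r] for r in resources)
    return len(tags & tw) / len(tw)


# Hypothetical personomy and resource titles.
assignments = [("python", "a"), ("tutorial", "a"), ("python", "b")]
titles = {
    "a": "A Python Tutorial for Beginners",
    "b": "The Flask Web Framework",
}

print(overlap_factor(assignments), tag_title_ratio(assignments, titles))
```

In this toy example two tags share resource "a", so the overlap factor is above zero, and two of the six remaining title words are also used as tags.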
  10. Approximating Tagging Motivation / 3
      Properties of the developed measures:
      • Agnostic to the semantics of the language used
      • Evaluate the behavior of a single user (as opposed to the complete folksonomy)
        – No comparison to the complete folksonomy necessary
      • Inspect the usage of tags and NOT their semantic meaning
        – How often are tags used?
        – How many tags are used on average to annotate a resource?
        – How well does a user “encode” her resources with tags?
  11. Experimental Setup
      Delicious dataset
      – Part of a collection of tagging datasets which we crawled from May to June 2009
      – Captured folksonomy consists of:
        • 896 users
        • 184,746 tags
        • 1,089,653 resources
      Requirements for the dataset
      – Holding complete personomies
        • All tags and resources which were publicly available
      – Chronological order of the posts should be conserved
        • To capture changes in tagging behavior
      – “Mostly inactive” users who do not have a lot of annotated resources should be neglected
        • The lower bound of tagged resources was 1000 in the case of the Delicious dataset
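
The dataset requirements listed above (complete personomies, chronological order, a lower bound on tagged resources) could be enforced with a preprocessing step along these lines. This is only a sketch under assumptions: the post layout (dicts with `user`, `timestamp` and `resource` keys) and the function name `build_personomies` are hypothetical and do not describe the crawler actually used.

```python
from collections import defaultdict


def build_personomies(posts, min_resources=1000):
    """Group posts per user, drop mostly inactive users, keep posts in time order.

    min_resources defaults to the lower bound stated for the Delicious dataset.
    """
    by_user = defaultdict(list)
    for post in posts:
        by_user[post["user"]].append(post)

    personomies = {}
    for user, user_posts in by_user.items():
        distinct_resources = {p["resource"] for p in user_posts}
        if len(distinct_resources) >= min_resources:
            # ISO-formatted timestamps sort chronologically as plain strings.
            personomies[user] = sorted(user_posts, key=lambda p: p["timestamp"])
    return personomies


# Hypothetical posts; a threshold of 2 is used so the toy data passes the filter.
posts = [
    {"user": "alice", "timestamp": "2009-05-03", "resource": "b"},
    {"user": "alice", "timestamp": "2009-05-01", "resource": "a"},
    {"user": "bob", "timestamp": "2009-05-02", "resource": "c"},
]

kept = build_personomies(posts, min_resources=2)
print({user: [p["resource"] for p in ps] for user, ps in kept.items()})
```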