Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Slides for the presentation I gave in the "Tagging" session at Hypertext 2010 in Toronto.

Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation Presentation Transcript

  • 1. TU Graz – Knowledge Management Institute Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation Christian Körner, Roman Kern, Hans-Peter Grahsl, Markus Strohmaier Knowledge Management Institute and Know-Center Graz University of Technology, Austria Hypertext 2010, June 15th, 2010 1
  • 2. TU Graz – Knowledge Management Institute Introduction There is a lot of research on folksonomies, their structure and the resulting dynamics. What we do not know are the reasons and motivations users have when they tag. Question: Why do users tag? Hypertext 2010, June 15th, 2010 2
  • 3. TU Graz – Knowledge Management Institute Motivation Knowledge about the motivations behind why users tag would help to answer a number of current research questions: What are possible improvements for tag recommendation? What are suitable search terms for items in these systems? How can we enhance ontology learning? … Models of tagging motivation already exist, such as [Nov2009] and [Heckner2009]. BUT: These models rely on expert judgements. Automatic measures for the inference of tagging motivation are important! Hypertext 2010, June 15th, 2010 3
  • 4. TU Graz – Knowledge Management Institute Presentation Overview • Research questions • Two types of tagging motivation • Approximating tagging motivation • Experiments and results – Quantitative Evaluation – Qualitative Evaluation Hypertext 2010, June 15th, 2010 4
  • 5. TU Graz – Knowledge Management Institute Questions Can tagging motivation be approximated with statistical measures? Which measures enable us to infer whether a given user has a certain motivation? Which of these measures perform best at differentiating between the types of tagging motivation? Does the distinction between the proposed tagging motivation types have an influence on the tagging process? Hypertext 2010, June 15th, 2010 5
  • 6. TU Graz – Knowledge Management Institute Types of Tagging Motivations

                              Categorizer           Describer
    Goal                      later browsing        later retrieval
    Change of vocabulary      costly                cheap
    Size of vocabulary        limited               open
    Tags                      subjective            objective
    Tag reuse                 frequent              rare
    Tag purpose               mimicking taxonomy    descriptive labels

    In the "real world" users are driven by a combination of both motivations, e.g. using tags as descriptive labels while maintaining a few categories [Körner2009]. Hypertext 2010, June 15th, 2010 6
  • 7. TU Graz – Knowledge Management Institute Terminology Folksonomies are usually represented by tripartite graphs with hyperedges. Three disjoint sets: – a set of users u ∈ U – a set of tags t ∈ T – a set of resources r ∈ R A folksonomy is defined as a set of annotations F ⊆ U × T × R. A personomy is the restriction of a folksonomy F to a single user u. A tag assignment (tas) is one specific triple of a user u, a tag t and a resource r. Hypertext 2010, June 15th, 2010 7
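The set-theoretic definitions above translate directly into code. Below is a minimal sketch in Python of a folksonomy as a set of tag assignments and of the derivation of a personomy with its tag vocabulary Tu, resource set Ru and per-tag resource sets R(t); the triples and helper names are illustrative, not from the slides.

```python
from collections import defaultdict

# A folksonomy F ⊆ U × T × R as a set of tag assignments (tas);
# the triples are made up for illustration.
F = {
    ("alice", "python", "http://python.org"),
    ("alice", "programming", "http://python.org"),
    ("alice", "python", "http://docs.python.org"),
    ("bob", "toread", "http://python.org"),
}

def personomy(F, u):
    """Restrict the folksonomy F to the tag assignments of one user u."""
    return {(user, t, r) for (user, t, r) in F if user == u}

def tags_and_resources(P):
    """Derive Tu, Ru and R(t) from a personomy P: the user's tag
    vocabulary, her annotated resources, and the resources per tag."""
    T_u, R_u = set(), set()
    R_of_t = defaultdict(set)
    for _, t, r in P:
        T_u.add(t)
        R_u.add(r)
        R_of_t[t].add(r)
    return T_u, R_u, R_of_t

T_u, R_u, R_of_t = tags_and_resources(personomy(F, "alice"))
print(len(T_u), len(R_u))  # 2 tags, 2 resources
```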
  • 8. TU Graz – Knowledge Management Institute Approximating Tagging Motivation / 1 Based on these intuitions, several measures for differentiating the two types of tagging motivation were developed:
    • Tag/Resource Ratio (trr) – How many tags does a user use? Relates the size of the tag vocabulary to the total number of annotated resources: trr(u) = |Tu| / |Ru|
    • Orphaned Tag Ratio – How many tags of a user's vocabulary are attached to only few resources? Orphaned tags are used infrequently and introduce noise into a personal taxonomy: orphan(u) = |{t : |R(t)| ≤ n}| / |Tu|, with the threshold n = ⌈|R(tmax)| / 100⌉ derived from the user's most frequently applied tag tmax. Categorizers are expected to score closer to 0, describers closer to 1.
    • Conditional Tag Entropy (cte) – How well does a user "encode" resources with her tags? If a user employs tags to encode resources, the conditional entropy reflects the effectiveness of this encoding: H(R|T) = −Σr∈R Σt∈T p(r,t) log2 p(r|t). The entropy is put in relation to that of an ideal categorizer: cte(u) = (H(R|T) − Hopt(R|T)) / Hopt(R|T)
    Hypertext 2010, June 15th, 2010 8
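A minimal sketch of the three measures above, assuming a personomy is given as a list of (tag, resource) assignments; the function name is hypothetical, and only the un-normalized conditional entropy H(R|T) is computed, without the ideal-categorizer baseline Hopt(R|T).

```python
import math
from collections import defaultdict

def measures_part_one(tas):
    """Compute trr, orphaned tag ratio and conditional tag entropy for
    one personomy, given as a list of (tag, resource) assignments."""
    tags = {t for t, _ in tas}
    resources = {r for _, r in tas}
    r_of_t = defaultdict(set)
    for t, r in tas:
        r_of_t[t].add(r)

    # Tag/Resource Ratio: trr(u) = |Tu| / |Ru|.
    trr = len(tags) / len(resources)

    # Orphaned Tag Ratio with n derived from the most-used tag t_max:
    # n = ceil(|R(t_max)| / 100).
    n = math.ceil(max(len(rs) for rs in r_of_t.values()) / 100)
    orphan = sum(1 for t in tags if len(r_of_t[t]) <= n) / len(tags)

    # Conditional tag entropy H(R|T) = -sum p(r,t) * log2 p(r|t),
    # with probabilities estimated from assignment counts.
    total = len(tas)
    tag_count = defaultdict(int)
    pair_count = defaultdict(int)
    for t, r in tas:
        tag_count[t] += 1
        pair_count[(t, r)] += 1
    cte = -sum((c / total) * math.log2(c / tag_count[t])
               for (t, _), c in pair_count.items())
    return trr, orphan, cte

tas = [("python", "r1"), ("python", "r2"), ("toread", "r1")]
print(measures_part_one(tas))  # (1.0, 0.5, ~0.67)
```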
  • 9. TU Graz – Knowledge Management Institute Approximating Tagging Motivation / 2
    • Overlap Factor – Are tags used as discriminatively as categories? When users assign more than one tag per resource on average, the resource sets of the corresponding tags overlap: overlap(u) = 1 − |Ru| / |TASu|. Categorizers can be expected to keep this overlap low in order to obtain discriminative categories; describers do not use tags for navigation and need not care about a high overlap.
    • Tag/Title Intersection Ratio (ttr) – How likely does a user choose words from the title as tags? Addresses the objectiveness or subjectiveness of tags, i.e. how likely users pick tags from a resource's title (e.g. the title of a web page): ttr(u) = |Tu ∩ TWu| / |TWu|, where TWu is the set of title words of the user's resources (tokenized, stop-word filtered and stemmed with the Snowball stemmer).
    Each measure targets one intuition from the table on slide 6: Size of vocabulary → Tag/Resource Ratio, Tags subjective/objective → Tag/Title Intersection Ratio, Tag reuse → Orphaned Tag Ratio and Conditional Tag Entropy, Tag purpose → Overlap Factor.
    Hypertext 2010, June 15th, 2010 9
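The remaining two measures follow the same pattern; a minimal sketch under the same assumptions as above (preprocessing of title words into the set TWu is assumed to have happened already).

```python
def overlap_factor(tas):
    """overlap(u) = 1 - |Ru| / |TASu|: 0 when every resource carries a
    single tag, approaching 1 as more tags are assigned per resource."""
    resources = {r for _, r in tas}
    return 1 - len(resources) / len(tas)

def tag_title_intersection(tags, title_words):
    """ttr(u) = |Tu ∩ TWu| / |TWu|; both sets are assumed to be
    tokenized, stop-word filtered and stemmed already."""
    return len(tags & title_words) / len(title_words)

tas = [("python", "r1"), ("dev", "r1"), ("python", "r2")]
print(overlap_factor(tas))  # 1 - 2/3 ≈ 0.33
print(tag_title_intersection({"python", "dev"},
                             {"python", "tutori"}))  # 1/2 = 0.5
```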
  • 10. TU Graz – Knowledge Management Institute Approximating Tagging Motivation / 3 Properties of the developed measures: • Agnostic to the semantics of the language used • Evaluate the behavior of a single user (as opposed to the complete folksonomy) – no comparison with the complete folksonomy necessary • Inspect the usage of tags and NOT their semantic meaning – How often are tags used? – How many tags are used on average to annotate a resource? – How well does a user "encode" her resources with tags? Hypertext 2010, June 15th, 2010 10
  • 11. TU Graz – Knowledge Management Institute Experimental Setup Delicious dataset – part of a collection of tagging datasets which we crawled from May to June 2009 – The captured folksonomy consists of: • 896 users • 184,746 tags • 1,089,653 resources Requirements for the dataset – Complete personomies • all tags and resources which were publicly available – The chronological order of the posts should be preserved • to capture changes in tagging behavior – "Mostly inactive" users who do not have a lot of annotated resources were excluded • the lower bound of tagged resources was 1,000 in the case of the Delicious dataset Hypertext 2010, June 15th, 2010 11
  • 12. TU Graz – Knowledge Management Institute Correlation Between Measures Although all measures were developed based on different intuitions with regard to the underlying motivation, some of them correlate to a great extent. Highest correlations: – Conditional Tag Entropy and Orphaned Tags (0.89) – no surprise, because both measures are based on the tag distribution – Tag/Title Intersection Ratio and Tag/Resource Ratio (0.94) – interesting, because the measures have different intuitions in mind [Figure: matrix of pairwise scatter plots of the five measures with their correlation coefficients] Hypertext 2010, June 15th, 2010 12
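Such a pairwise correlation matrix can be computed directly from per-user measure scores, e.g. with NumPy; in the sketch below the score matrix is made up purely for illustration.

```python
import numpy as np

# One row per user, one column per measure (values made up).
# Columns: trr, orphaned tag ratio, cond. tag entropy, ttr, overlap.
scores = np.array([
    [0.12, 0.40, 1.10, 0.30, 0.55],
    [0.45, 0.10, 0.20, 0.70, 0.80],
    [0.08, 0.55, 1.30, 0.25, 0.40],
    [0.60, 0.05, 0.15, 0.85, 0.90],
])

# Pairwise Pearson correlations between the five measures.
corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 2))
```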
  • 13. TU Graz – Knowledge Management Institute Qualitative Evaluation / 1 For the qualitative evaluation we conducted a human-subject study with 6 participants. The participants were given personomies and asked to classify each personomy into one of the two tagging motivation types – categorizer – describer Result – moderate agreement, Cohen's kappa = 0.51 [Cohen1960] Reasons – users were only shown a fraction of each personomy – the task is quite subjective and complex Hypertext 2010, June 15th, 2010 13
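Cohen's kappa corrects the raw agreement of two raters for the agreement expected by chance from their label distributions. A minimal sketch with hypothetical labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa [Cohen1960]: observed agreement of two raters,
    corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical judgements of two raters over four personomies.
a = ["categorizer", "describer", "describer", "categorizer"]
b = ["categorizer", "describer", "categorizer", "categorizer"]
print(cohens_kappa(a, b))  # 0.5 -> moderate agreement
```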
  • 14. TU Graz – Knowledge Management Institute Qualitative Evaluation / 2 Highest agreement with human judgement – Tag/Resource Ratio – Tag/Title Intersection Ratio – Overlap Factor Lowest agreement – Orphaned Tag Ratio – Conditional Tag Entropy --> General agreement between measures and human judgement, with accuracy improvements between +48.4% and +77.4% over a random baseline [Figure: accuracy of the evaluated measures compared to a random baseline] Hypertext 2010, June 15th, 2010 14
  • 15. TU Graz – Knowledge Management Institute Quantitative Evaluation / 1 Goal: to assess whether the distinction between describers and categorizers has an observable impact during tagging. For this purpose: "Recommender Evaluation" Intuition: – A user who is motivated by categorization prefers tag recommendation algorithms which resort to tags from her personal tag vocabulary – A user who is motivated by description favors algorithms which suggest tags that are most descriptive for the resource she is tagging Hypertext 2010, June 15th, 2010 15
  • 16. TU Graz – Knowledge Management Institute Quantitative Evaluation / 2 1. Build basic recommenders which recommend tags based on the personomy and the folksonomy 2. Evenly split the users into two groups according to each measure 3. Hold-out evaluation: predict which tags a user will assign to a resource, based on her tagging motivation/behavior (see the sketch below) Hypertext 2010, June 15th, 2010 16
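A schematic sketch of this protocol: users are split at the median of a measure, and a recommender (treated here as an opaque function from a resource to a ranked tag list) is scored on held-out posts. All names are hypothetical; the actual recommenders and split details are described in the paper.

```python
def split_users_by_measure(users, measure):
    """Evenly split users at the median value of a measure: the low half
    as one group, the high half as the other (e.g. categorizers vs.
    describers for the Tag/Resource Ratio)."""
    ranked = sorted(users, key=measure)
    mid = len(ranked) // 2
    return ranked[:mid], ranked[mid:]

def holdout_precision(posts, recommend, k=5):
    """Hold-out evaluation: for each held-out post (resource, true tags),
    check how many of the top-k recommended tags the user actually
    assigned, and average the precision@k over all posts."""
    scores = []
    for resource, true_tags in posts:
        predicted = recommend(resource)[:k]
        hits = sum(1 for tag in predicted if tag in true_tags)
        scores.append(hits / k)
    return sum(scores) / len(scores)
```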
  • 17. TU Graz – Knowledge Management Institute Quantitative Evaluation / 3 According to the recommender evaluation, two important findings were made: – Users who prefer folksonomy-based recommendation (describers) can be best identified by a high Tag/Title Intersection Ratio (improvements of up to +38.9% in mean average precision over a random split) – Users who prefer personomy-based recommendation (categorizers) can be best identified by a low Tag/Resource Ratio (up to +19.0% over a random split) --> The motivation behind tagging has a significant impact on tagging behavior! [Figure: mean average precision of the folksonomy-based and personomy-based recommenders for each measure-based user split] Hypertext 2010, June 15th, 2010 17
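Mean average precision, the metric reported above, rewards recommenders that rank truly assigned tags early. A minimal sketch with made-up example data:

```python
def average_precision(predicted, relevant):
    """AP for one post: precision at each rank where a truly assigned
    tag appears, averaged over all truly assigned tags."""
    hits, score = 0, 0.0
    for rank, tag in enumerate(predicted, start=1):
        if tag in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked prediction, true tag set) pairs."""
    return sum(average_precision(p, r) for p, r in runs) / len(runs)

# Two posts scored against a recommender's ranked suggestions.
runs = [(["python", "web", "dev"], {"python", "dev"}),
        (["news", "toread"], {"toread"})]
print(mean_average_precision(runs))  # (5/6 + 1/2) / 2 ≈ 0.667
```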
  • 18. TU Graz – Knowledge Management Institute Outlook The findings can be a starting point for the analysis of questions such as: – How does the motivation behind tagging influence the performance of current folksonomy search algorithms? – How can recommenders explicitly consider tagging motivation to improve recommendations? – To what extent are existing algorithms for the acquisition of semantic relations from folksonomies affected by tagging motivation? Hypertext 2010, June 15th, 2010 18
  • 19. TU Graz – Knowledge Management Institute Conclusion Can tagging motivation be approximated by statistical measures? – Yes, as demonstrated by the introduced measures. Which measures enable us to infer whether a given user is a categorizer or a describer? – Conditional Tag Entropy, Orphaned Tag Ratio, Tag/Title Intersection Ratio etc. Which of these measures perform best at differentiating these two types of tagging motivation? – The Tag/Resource Ratio has the highest agreement with human judgement – The Tag/Resource Ratio is the best predictor of categorization behavior – The Tag/Title Intersection Ratio is the best predictor of description behavior Does the distinction between the proposed tagging motivation types have an influence on the tagging process? – Yes. It influences a user's selection of tags. Hypertext 2010, June 15th, 2010 19
  • 20. TU Graz – Knowledge Management Institute References [Ames2007] Ames, M. & Naaman, M. (2007), 'Why we tag: motivations for annotation in mobile and online media', in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '07), ACM, New York, NY, USA, pp. 971–980. [Cohen1960] Cohen, J. (1960), 'A coefficient of agreement for nominal scales', Educational and Psychological Measurement 20(1), pp. 37–46. [Heckner2009] Heckner, M.; Heilemann, M. & Wolff, C. (2009), 'Personal Information Management vs. Resource Sharing: Towards a Model of Information Behavior in Social Tagging Systems', in International AAAI Conference on Weblogs and Social Media (ICWSM). [Körner2009] Körner, C. (2009), 'Understanding the Motivation Behind Tagging', Student Research Competition, Hypertext 2009. [Nov2009] Nov, O.; Naaman, M. & Ye, C. (2010), 'Analysis of participation in an online photo-sharing community: A multidimensional perspective', JASIST 61(3), pp. 555–566. Hypertext 2010, June 15th, 2010 20
  • 21. TU Graz – Knowledge Management Institute Thanks for your attention! Questions? christian.koerner@tugraz.at Are you a categorizer or a describer? Hypertext 2010, June 15th, 2010 21